Here’s something I’d like to share with you… It might sound a bit silly to some, but perhaps there are some points you recognize from your own experience.
It all started about 10 years ago, when I was still in university. I was writing some basic graphics-related code in my spare time, as a hobby. I wanted libraries for loading images in formats such as GIF and JPG. These were built into Java (which I used a lot at the time), but for my native DOS/Windows code I relied only on BMP, which was rather inconvenient. I figured it would be a good learning experience to write my own decompressors for these file formats.
I decided to start with GIF, because it was the simpler of the two. After some initial struggles (partly from a misinterpretation of the direction in which the bits were packed into the bytes), I came up with some working code. I then proceeded to optimize it, and found some tricks that most implementations seemed to lack. The result was a pretty small and elegant piece of code which was very fast (I think the entire decoder could fit in 2k).
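For the curious: the bit-order detail that tripped me up is that GIF packs its LZW codes least-significant-bit first, so the first code sits in the low bits of the first byte. A minimal sketch of such a bit reader (the names and structure here are illustrative, not from my old code):

```c
#include <stddef.h>
#include <stdint.h>

/* GIF packs LZW codes LSB-first: the first code occupies the low bits
 * of the first byte. This reader accumulates bytes into a 32-bit
 * buffer and extracts 'length' bits from the bottom. */
typedef struct {
    const uint8_t *data;
    size_t pos;
    uint32_t buffer;   /* bits not yet consumed, LSB-aligned */
    int bits;          /* number of valid bits in buffer */
} BitReader;

static uint32_t read_code(BitReader *br, int length)
{
    while (br->bits < length) {
        br->buffer |= (uint32_t)br->data[br->pos++] << br->bits;
        br->bits += 8;
    }
    uint32_t code = br->buffer & ((1u << length) - 1u);
    br->buffer >>= length;
    br->bits -= length;
    return code;
}
```

Reading this "from the wrong end" (MSB-first, as JPG does it) is exactly the kind of misinterpretation that produces plausible-looking but wrong codes.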
After that, I started on the JPG format. It started out reasonably well… Parsing the Huffman tables, the quantization tables and all that wasn’t too difficult. I also used a cool optimization for the Huffman decoding: a brute-force lookup table. But somewhere between the iDCT and the YCbCr-to-RGB conversion, something went wrong… or so it seemed.
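The brute-force lookup trick works like this: instead of walking the Huffman tree bit by bit, you peek a fixed number of bits from the stream and index a table that was pre-filled so that every bit pattern *starting with* a given code maps to that code’s symbol and length. A hedged sketch of the table-building step (types, names and the 8-bit table width are illustrative; JPG codes can be up to 16 bits):

```c
#include <stdint.h>

#define LOOKUP_BITS 8   /* handles codes up to 8 bits in this sketch */

typedef struct {
    uint8_t symbol;  /* decoded symbol */
    uint8_t length;  /* actual code length in bits (0 = invalid slot) */
} HuffEntry;

/* Fill every table slot whose top 'length' bits equal 'code'.
 * After this, decoding is one table index (peek LOOKUP_BITS bits,
 * look up, consume entry.length bits) instead of a bit-by-bit walk. */
static void fill_entry(HuffEntry table[1 << LOOKUP_BITS],
                       uint16_t code, int length, uint8_t symbol)
{
    int shift = LOOKUP_BITS - length;
    int first = code << shift;
    for (int i = 0; i < (1 << shift); i++) {
        table[first + i].symbol = symbol;
        table[first + i].length = (uint8_t)length;
    }
}
```

The table costs memory (2^16 entries for full-length JPG codes, or a two-level scheme), but it turns the inner loop of the decoder into a single lookup.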
Now the thing with JPG is… as soon as something goes wrong, it REALLY goes wrong. One wrong bit somewhere can turn everything that comes next into completely unrecognizable junk. I had quite a few unknowns in my routine as well. For example, what range of values am I *supposed* to be getting after the dequantization stage? And how do the iDCT and the YCbCr conversion affect this? What input/output ranges do they expect? You’re no longer sure where things are right and where things go wrong. I hate code like that; I like to take it one step at a time, and verify each step as I go along. I had the same thing with the Marching Cubes implementation… You need to implement all the steps of the algorithm completely, and if it doesn’t work in one go, chances are you’re getting… unrecognizable junk. Which is very hard to debug.
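For the record, the ranges I was unsure about are well-defined for baseline 8-bit JPG: the iDCT output gets a +128 level shift and a clamp into 0..255 per component, and the JFIF color conversion after that is fixed (R = Y + 1.402·(Cr−128), and so on). A fixed-point sketch of that last stage, with 16.16 constants I scaled myself (it assumes arithmetic right shift for negative products, which mainstream compilers provide):

```c
#include <stdint.h>

static uint8_t clamp(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* JFIF YCbCr -> RGB, fixed-point scaled by 2^16:
 *   R = Y + 1.402    * (Cr - 128)
 *   G = Y - 0.344136 * (Cb - 128) - 0.714136 * (Cr - 128)
 *   B = Y + 1.772    * (Cb - 128)
 * Inputs are the 0..255 values produced after the iDCT's +128 level shift. */
static void ycbcr_to_rgb(int y, int cb, int cr, uint8_t rgb[3])
{
    cb -= 128;
    cr -= 128;
    rgb[0] = clamp(y + ((91881 * cr) >> 16));               /* 1.402    */
    rgb[1] = clamp(y - ((22554 * cb + 46802 * cr) >> 16));  /* see above */
    rgb[2] = clamp(y + ((116130 * cb) >> 16));              /* 1.772    */
}
```

Knowing these expected ranges in advance would have let me verify each stage in isolation, which is exactly what I was missing at the time.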
Since I got stuck with my own code, I figured I’d download the source code of the Independent JPEG Group implementation and study that. Thing is, that source code was incredibly terse, and it was a drag to even get it to compile into something useful on a Windows machine in the first place… let alone to modify it to output data in a format that could be compared to my own decompressor, so I could try to pinpoint the problem in my own code.
After fighting with the code for a while, I got demotivated, and lost interest. But somehow the experience was rather depressing. I started to doubt myself. Was I not smart enough to understand the JPG format and write a decompressor? Was this a bridge too far? In a way it’s silly, when you’re in university… that fact alone should mean that you have the required intellectual capacity for this sort of thing… but well, when you can’t pull it off… In a way I could never really get rid of that experience, this project was always “the one that got away”, the project that “beat me”.
Some years later, when I was still in university, JPG crossed my path yet again. A friend of mine had bought a Canon PowerShot digital camera. These cameras had a “homebrew” scene, since they ran ROMDOS on an 80186-compatible processor, and custom applications could be installed on them. One of these custom applications was a histogram feature, which was not present on the cheaper models (the histogram shows the distribution of brightness levels in an image, allowing you to check for under- or over-exposure). The histogram was rather slow: it took about 10 seconds to analyze a single image.
So my friend decided to study the sourcecode and see if he could make it go faster. Since the application basically opened the last JPG image on the camera, decompressed it, and analyzed it, the performance was largely dictated by the JPG decompressor. He asked me to take a look at it as well, as I had been working on JPG before.
We changed two things about the JPG decoder… The first was to use a brute-force lookup table for the Huffman codes, which greatly improved performance. The second was to fine-tune the iDCT mathematics so it would fit entirely in 16×16-bit multiply operations (the 80186 being a 16-bit CPU, which can’t natively handle 32-bit numbers). In the end, the time went down to about 2-3 seconds per image, a huge improvement.
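The 16-bit trick, in essence: represent each iDCT cosine constant as an integer scaled by a power of two small enough that both operands fit in 16 bits, do the 16×16 multiply (which the 80186 handles natively, producing a 32-bit result in DX:AX), and shift the scale back out afterwards. A toy illustration of the principle; the constant and scale factor here are my own choices, not the ones we used on the camera:

```c
#include <stdint.h>

#define FIX_BITS 13
#define C_0_7071 5793   /* round(0.70710678 * 2^13), fits in int16_t */

/* One fixed-point multiply: a 16x16 multiply into a 32-bit product,
 * then shift the 2^13 scale back out. On a 16-bit CPU this is one
 * native MUL plus a shift, with no 32-bit arithmetic elsewhere. */
static int16_t fix_mul(int16_t a, int16_t c)
{
    return (int16_t)(((int32_t)a * c) >> FIX_BITS);
}
```

The care is in choosing scales so that no intermediate sum overflows 16 bits between multiplies; that is where the "fine-tuning" went.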
It gave me back some confidence. At least I knew that I did understand most of the JPG algorithm correctly, and apparently some of the code I’d developed was much better than this particular code. Also, with the histogram I now had some actual working JPG code (albeit only for the Y channel; it skipped the Cb and Cr components, since the histogram only needed luminance information). I gave it another try, but I still couldn’t find the problem, so I gave up again after a while… and hadn’t gone back to the code since…
Until a few days ago, that is. I wanted to give it one more shot. After all these years, with all the experience I now have… perhaps I would finally be able to pull it off. It was fun to go back through this old code… It was actually remarkably easy to follow, even though I’d written most of it about 10 years ago and hadn’t touched it in at least 6 years, I think. Funnily enough it was still vanilla C code, no C++. Back in those days I didn’t do C++ yet, I suppose. Going back from C++/C#/Java to C, with its limited variable declarations and such, is a bit annoying. There were some things that I probably wouldn’t do today, but I figured I’d keep the code as-is for the most part, and just try to fix it first.
I used the histogram code alongside my own, and this time I finally found what the problem was… It wasn’t even a very big problem, actually. I had simply forgotten to handle a special case where the Huffman decoder outputs a code with length 0. Apparently it had never occurred to me that this was possible. So I tried to read and decode 0 bits from the bitstream… which didn’t work, as my code used a shift right by (32 - length). Shifting a 32-bit integer by 32 is undefined in C, and on x86 it doesn’t do what you’d hope: the shift count is masked to its low 5 bits, so a shift by 32 becomes a shift by 0.
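The failure mode is worth spelling out, because it looks so innocent. A small sketch of the guarded version (my old code’s actual names differ):

```c
#include <stdint.h>

/* Extract the top 'length' bits of a 32-bit accumulator.
 * The naive version is: bits >> (32 - length). For length == 0 that
 * shifts by 32, which is undefined in C; on x86 the hardware masks
 * the count to 5 bits, so the shift becomes >> 0 and the function
 * returns the whole word instead of 0. Guarding the zero-length
 * case fixes it. */
static uint32_t peek_bits(uint32_t bits, int length)
{
    if (length == 0)
        return 0;   /* the special case that was missing */
    return bits >> (32 - length);
}
```

One branch, ten years of mystery.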
Okay, there were some other issues with the code that came after it, but some of them were quite easy to fix… and before long, I was able to decode the Y channel of the JPG and produce a correct grayscale image. It took a bit of puzzling to also get the full RGB image, but I got it in the end. It’s silly really… Apparently I did understand the JPG algorithm correctly; the actual bug that messed things up wasn’t in any of the math-related stuff… no, it was an esoteric special case in some bit-fiddling code.
So to be honest, I have mixed feelings about it. I feel relieved that most of my code was actually correct, and my understanding of JPG had been good all along. And JPG didn’t beat me in the end, I beat the JPG bug. But on the other hand it’s a bit sad that this one bug has caused me so much trouble, and it’s taken so long to get it resolved. I suppose I should remind myself that the Marching Cubes implementation was quite difficult to get right as well, but I didn’t give up there. I suppose it’s easy when you know how. Sadly, I no longer need the JPG loading routine. So I don’t think I’ll spend a lot of time on perfecting the routine. But it’s a mental victory to have it working at last.
I had always thought that JPG was a very nice compression scheme, and I still do… and at the time of its release I don’t think there was anything like it. Even today, many audio and video compression schemes apply quite a few of the JPG ideas.