Larry’s Personal & Tech ramblings

Having fun with JPEG decompression

An odd title, considering that JPEG is a cryptic image compression standard.  My idea of fun is optimizing code until there’s nothing left to improve.  A few weeks ago I decided to take the plunge and rewrite the three core JPEG decode routines to speed up my imaging code.  One reason was that the great majority of cell phones today are based on the TI OMAP architecture, typically running at around 200 MHz.  These devices seem slow at working with images, so I thought I could help by speeding things up, improving both battery life and the user experience.

The important “inner loop” routines of JPEG image decoding are the Huffman decoding of the MCU (minimum coded unit), the IDCT (inverse discrete cosine transform), and the output stage (turning the YCbCr pixels into RGB pixels).  All three routines together turned out to be only a couple hundred lines of ARM code, but the result of rewriting them from C was quite dramatic.  The original C code had been optimized and tested over a long period of time and was in good shape to begin with, but C isn’t so great at bit manipulation or at squeezing the most use out of register variables.  It took several iterations to get down to the bare minimum of code, but I’m quite happy with the results.  I used ARMv5 instructions, but made sure that the code performs well on both OMAP and XScale CPUs (unlike Intel’s Integrated Performance Primitives).  Luckily, my previous performance testing of the multiply instructions helped me shave a few clock cycles off several routines.  The purpose of this work is threefold:
1) I’m readying a new version of my imaging application (PQV - Pocket QuickView) for Windows Mobile and need it to be competitive with other products.  I pride myself on having the fastest viewer available.
2) I have been staring at the C code for a long time and wondering how much better it could perform if written in optimized ARM asm.
3) I believe this code has value to anyone doing imaging or video on ARM-based devices.  Web browsers, image viewers, camera applications, and video players can all benefit from this code.
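
For readers who haven’t seen the output stage mentioned above, here is a minimal C sketch of a YCbCr to RGB conversion for a single pixel.  It is not the optimized ARM code described in this post.  It assumes the full-range JFIF conversion coefficients scaled to 16.16 fixed point, and the names fixed_to_u8 and ycbcr_to_rgb are made up for illustration.

```c
#include <stdint.h>

/* Convert a 16.16 fixed-point value to an 8-bit sample, clamping to 0..255.
 * Negative values are caught before the shift, so only non-negative
 * values are ever shifted right. */
static uint8_t fixed_to_u8(int32_t v)
{
    if (v < 0)
        return 0;
    v >>= 16;
    return (v > 255) ? 255 : (uint8_t)v;
}

/* Hypothetical helper: convert one full-range YCbCr pixel to RGB.
 * Assumed coefficients (JFIF convention), each scaled by 65536:
 *   R = Y + 1.402    * (Cr - 128)
 *   G = Y - 0.344136 * (Cb - 128) - 0.714136 * (Cr - 128)
 *   B = Y + 1.772    * (Cb - 128)
 */
void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    int32_t c_b = (int32_t)cb - 128;
    int32_t c_r = (int32_t)cr - 128;

    /* Y in 16.16 fixed point, plus 0.5 so the final shift rounds. */
    int32_t y_fixed = ((int32_t)y << 16) + 32768;

    rgb[0] = fixed_to_u8(y_fixed + 91881 * c_r);
    rgb[1] = fixed_to_u8(y_fixed - 22554 * c_b - 46802 * c_r);
    rgb[2] = fixed_to_u8(y_fixed + 116130 * c_b);
}
```

Even in this simple form, each pixel costs four multiplies plus three clamps, which gives a sense of why the multiply timings and register usage matter so much in an inner loop that runs once per pixel.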

I’ve been searching for the past week or so for customers for this code, but the typical response is a “not invented here” attitude, which stands in the way of improving products.

I will post some sample images and benchmarks shortly to back up my claims of fast JPEG decoding.

Anyone interested in licensing object or source code should contact me directly (bitbank@pobox.com).

June 21, 2007 | Posted by bitbank | arm, arm9, asm, assembly language, benchmark, jpeg, omap, optimization, performance, pocket pc, smartphone, wince, xscale