ARM JPEG Benchmarks
The great thing about the ARM architecture is that the more I look at a piece of code, the more ways I find to optimize it. Conditional execution, the barrel shifter and the optional setting of the processor flags create many opportunities for optimization. I’ve spent some more time optimizing my ARM asm JPEG code and now have some hard numbers to publish.
I used an HP iPAQ h2210 Pocket PC (400 MHz PXA255) and an HTC Hurricane SmartPhone (195 MHz OMAP 850) for the testing. I was able to load the file from RAM on the Pocket PC (to reduce file I/O delays), but not on the SmartPhone, whose file system does not use RAM for file storage. Reading from the miniSD card is slow enough to swamp the processing time, so the only test run on the SmartPhone was decompressing a 160×120 thumbnail image already in RAM.
All tests decompress the image to an RGB565 bitmap. The thumbnail test decompresses the 160×120 EXIF thumbnail image. The “DC only” test creates a single pixel from each MCU (the 3072×2304 image is loaded as 384×288). The “Full res” test decompresses every pixel of the image.
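To make the output format and the “DC only” test more concrete, here is a minimal C sketch. It is illustrative only and not the optimized ARM asm code benchmarked here; the function names are hypothetical. It relies on two standard facts: RGB565 stores 5 bits of red, 6 of green and 5 of blue in 16 bits, and the dequantized DC coefficient of an 8×8 JPEG block is 8 times the block's mean sample value (before the +128 level shift).

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into one RGB565 pixel (5 bits red, 6 green, 5 blue). */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Output size along one axis at 1/8 scale, rounding up for partial blocks. */
int dc_only_dim(int full)
{
    assert(full > 0);
    return (full + 7) / 8;
}

/* Turn a quantized DC coefficient into one 8-bit sample.
 * Dequantize, divide by 8 to get the block mean, undo the level shift
 * and clamp to 0..255. Integer division truncates toward zero here;
 * a real decoder would likely round. */
uint8_t dc_to_sample(int dc_coeff, int quant)
{
    int v = (dc_coeff * quant) / 8 + 128;
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return (uint8_t)v;
}
```

With these helpers, `dc_only_dim(3072)` and `dc_only_dim(2304)` give the 384×288 size quoted above.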
PPC (Pocket PC): thumbnail: 8.8 milliseconds, DC only: 830 milliseconds, full res: 2700 milliseconds.
SP (SmartPhone): thumbnail: 15.1 milliseconds
The speed difference between the two devices is to be expected considering the different processor and memory bus speeds. The “DC only” test is useful because it shows the relative speed of Huffman decoding: every MCU still has to be entropy-decoded, but very little work is left after that. The file size is 4.3 MB, so in 830 milliseconds the code was able to decode all of the MCUs and produce a single pixel from each one.
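The Pocket PC timings can be turned into throughput figures with some simple arithmetic. The helpers below are a hypothetical sketch for that calculation. They assume the CPU runs at its nominal 400 MHz for the whole test, and that the 4.3 MB file size is only approximate.

```c
#include <assert.h>

/* Units processed per second, given a total amount and elapsed milliseconds. */
double per_second(double amount, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return amount * 1000.0 / elapsed_ms;
}

/* Average CPU clock cycles spent per unit of work. */
double cycles_per_unit(double clock_hz, double elapsed_ms, double units)
{
    assert(units > 0.0);
    return clock_hz * (elapsed_ms / 1000.0) / units;
}
```

Plugging in the numbers above: `per_second(4.3, 830)` is about 5.2 MB of compressed data per second for the “DC only” test, `per_second(3072.0 * 2304.0, 2700)` is about 2.6 million pixels per second for the full res test, and `cycles_per_unit(400e6, 2700, 3072.0 * 2304.0)` works out to roughly 153 clock cycles per output pixel.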
I’ve uploaded the sample image to my web server here: CIMG2209.JPG
The image was taken with a Casio EX-Z750 and depicts a relatively complex scene with many fine details. Like most cameras, the Exilim series saves JPEG images with 2:1 horizontal color subsampling (even when set to maximum quality). It’s not unreasonable for a point-and-shoot camera like the Z750 to throw away color resolution this way, because the image coming off the CCD isn’t that great to begin with. What irks me is that cameras like the Canon 20D do the same thing. With a good SLR lens and imager, the Canon should allow you to save JPEG images with full-resolution color.
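To show what 2:1 horizontal subsampling means for the decoder, here is a small sketch of the MCU geometry. It assumes the usual JPEG layout for this mode, which is not confirmed for this particular file: each MCU covers 16×8 pixels and holds two 8×8 luma blocks plus one Cb and one Cr block. The function names are hypothetical.

```c
#include <assert.h>

/* Integer division rounding up. */
int ceil_div(int a, int b)
{
    assert(a > 0 && b > 0);
    return (a + b - 1) / b;
}

/* Number of MCUs in an image, given the luma sampling factors.
 * h_factor = 2, v_factor = 1 corresponds to 2:1 horizontal subsampling;
 * h_factor = 1, v_factor = 1 means no subsampling (8x8 MCUs). */
long mcu_count(int width, int height, int h_factor, int v_factor)
{
    int mcu_w = 8 * h_factor;  /* MCU width in pixels */
    int mcu_h = 8 * v_factor;  /* MCU height in pixels */
    return (long)ceil_div(width, mcu_w) * ceil_div(height, mcu_h);
}
```

Under that assumption the 3072×2304 sample image has 192 × 288 = 55,296 MCUs and stores only 1536 chroma samples per row. The 384×288 “DC only” size then corresponds to one output pixel per 8×8 luma block. With full-resolution color the same image would have 384 × 288 = 110,592 MCUs of 8×8 pixels each.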
Comments?
Dear Mr. Larry,
I have written a JPEG decoder on ARM and it decodes a 320×240 pixel image in 1400 msecs. I tried to decrease the decoding time by using functions written in assembly language for the IDCT part, and it dropped to 1260 msecs, but the clarity of the image decreased. That is to say, the image was still clearly visible, but the contrast and sharpness decreased slightly, and at places like the edge of the nose or the edge of the trousers there is a black or white line respectively. Could you please tell me what I am doing wrong and how I can improve the speed? Would you be willing to look at my code for the IDCT routine?
My C code can decode a 320×240 image in less than 200 ms on a 195 MHz ARM, so obviously the 1400 ms is not down to C vs. optimized ASM. There are many trade secret algorithmic inventions in my code which allow it to work so quickly. Since I’m in the business of selling code, giving away my secrets would not be a good business move. Please contact me if you would like to discuss licensing my code.
195 MHz ARM what? ARM926, and with or without the MAC hardware (EJ-S)? I am trying to find code benchmarked on the Cortex core, or figure out how many MIPS are required for that.
db
As stated in my article, I used a 195 MHz TI OMAP 850 for the testing. Please refer to TI’s documentation to find out which type of ARM core the OMAP 850 uses.