I was told by a customer that my game emulators did not display correctly on the Motorola Q9H smartphone. I got hold of a loaner device and discovered the problem. For some reason, the video memory is mapped in an odd way (reminiscent of the old Apple II). This odd RAM mapping means that many games/multimedia applications will not run correctly on the Q9H. I’ve written a workaround for my software, but the bigger question is how many devices/programs this affects. My best guess is that this is the fault of the new TI OMAP 2420 CPU. This chip has many 2D/3D capabilities, so it makes sense that the designers didn’t care much about how the video memory was mapped since most software would use the advanced capabilities of the chip and not write to VRAM directly. I’m in contact with my friend at TI and will report more information about this problem as soon as I know.
December 8, 2007
Posted by
bitbank |
arm, omap, smartphone, wince |
|
No Comments
“Micro_View” is a product I created several years ago for a client. It’s a simple imaging library for Win32 and WinCE which lets you load BMP, GIF and JPEG images into an HBITMAP or display them in a window. The code is fast and small (the Win32 lib file is 96K). I created a stand-alone, command-line driven executable which displays an image in a borderless window, and a link library which defines 3 functions:
int APIENTRY MicroView(TCHAR *filename, int iOptions);
int APIENTRY MVLoadBitmap(TCHAR *filename, HBITMAP *, HPALETTE *);
int APIENTRY MVLoadResource(HINSTANCE hInst, TCHAR *rname, HBITMAP *pBitmap, HPALETTE *pPalette);
If you need to add simple image handling to your application, it doesn’t get much easier than this. This is something that’s been collecting dust on my hard drive for quite a while, but would probably make a pretty good retail product. I will see if I can package this up into a reasonably priced product in the next few days. Please email me (bitbank@pobox.com) if you’re in need of such a library.
September 19, 2007
Posted by
bitbank |
arm, arm9, jpeg, omap, photo, tech, viewer, wince |
|
No Comments
WordPress (the company which hosts this site) collects some interesting statistical data on the people who visit the blog. To me, the most interesting data is a list of the search words which direct people to this site. Since I started including the JPEG and ARM keywords in my posts, I’ve seen a steady stream of people searching for basically the same thing: Free optimized source code for decoding JPEG/MPEG images on ARM devices. I’ve done such searches myself and have come to the conclusion that it’s not available. For anyone who has done research and invested tons of time and energy into writing optimized code, it is unlikely that they will be willing to give it away for free. There are plenty of open-source and free projects on the internet that are valuable and professionally done, but there usually comes a point in a project’s lifetime when the author commercializes it to get compensated for the time invested.
I try to share my knowledge and experience with the developer community; I understand the frustration of wasting precious time locating resources or coming up with workarounds for problems outside of (or within) my code. I also make a living writing software, and so I must write code which is worth compensation from my customers and maintain innovative solutions which compare well with my competition. The geek in me would love to have an open discussion about the fastest way to decode Huffman encoded data or minimize the calculations in the IDCT, but as a consultant, that would be self-defeating.
The “trade secrets” are visible in the source code, but hidden in the object code, so licensing object code will incur less risk to me and therefore cost considerably less. I’ve licensed my code to various companies for values ranging from several hundred dollars to tens of thousands. The price varies according to the risk and time required. Companies needing help with ARM optimization issues are encouraged to contact me. The amount I charge for my time or code is usually far more economical than having other programmers spend time trying to invent what I’ve already got working.
July 31, 2007
Posted by
bitbank |
arm, arm9, asm, assembly language, jpeg, omap, optimization, pocket pc, smartphone, xscale |
|
5 Comments
I thought it would be useful to re-run the tests with the C version of my JPEG code. From the results, it appears that memory bandwidth is the real limiting factor for speed, and that the pixel colorspace conversion gets the most benefit from my optimized ARM assembly language. It also appears that the OMAP gains more from optimized ASM than the XScale does. Here are the numbers:
C-Code:
PPC: thumbnail: 10.7 milliseconds, DC only: 968 milliseconds, full res: 3734 milliseconds.
SP: thumbnail: 25.1 milliseconds
Mixed C and ASM:
PPC: thumbnail: 8.8 milliseconds, DC only: 830 milliseconds, full res: 2700 milliseconds.
SP: thumbnail: 15.1 milliseconds
The load times for the “DC only” and “full res” tests include the time taken to read 4.3MB of data from RAM through the WinCE file system.
These results make sense in that the real benefit of optimization comes from fixing the algorithms and reducing memory usage. The optimized ARM assembly code is certainly helpful in speeding things up, but won’t offer an order of magnitude improvement over what the compiler generates.
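To put the two sets of numbers side by side, here is a minimal sketch that turns a pair of timings into a speedup factor. The function names are mine and purely illustrative. Applied to the figures above, it gives roughly 1.22x for the PPC thumbnail, 1.17x for "DC only", 1.38x for full res, and 1.66x for the SP thumbnail, which matches the observation that the OMAP gains more than the XScale and that nothing approaches an order of magnitude.

```c
/* Illustrative helpers (not part of the JPEG code): compare a C-only
 * timing against a mixed C/ASM timing, both in milliseconds. */

/* How many times faster the mixed C/ASM build is than the C-only build. */
double speedup(double c_ms, double asm_ms)
{
    return c_ms / asm_ms;
}

/* Percentage of the C-only run time that the ASM version saves. */
double percent_saved(double c_ms, double asm_ms)
{
    return 100.0 * (c_ms - asm_ms) / c_ms;
}
```

For example, `speedup(25.1, 15.1)` is the SmartPhone thumbnail case and `speedup(3734, 2700)` is the Pocket PC full res case.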
July 11, 2007
Posted by
bitbank |
arm, arm9, asm, assembly language, benchmark, jpeg, omap, optimization, performance, pocket pc, smartphone, wince, xscale |
|
No Comments
The great thing about the ARM architecture is that the more I look at a piece of code, the more ways I find to optimize it. The conditional execution, barrel shifter and optional setting of the processor flags create many opportunities for optimization. I’ve spent some more time optimizing my ARM asm JPEG code and now have some hard numbers to publish. I used an HP iPAQ h2210 Pocket PC (400MHz PXA255) and an HTC Hurricane SmartPhone (195MHz OMAP 850) to do the testing. I was able to load the file from RAM on the Pocket PC (to reduce file I/O delays), but not on the SmartPhone, because the SmartPhone file system does not use RAM for file storage. The time spent reading from the miniSD card dwarfs the processing time in the tests, so the only test run on the SmartPhone was decompressing a 160×120 thumbnail image in RAM. All tests decompress the image to an RGB565 bitmap. The thumbnail test decompresses the 160×120 EXIF thumbnail image. The “DC only” test creates a single pixel from each MCU (the 3072×2304 image is loaded as 384×288). The “Full res” test decompresses every pixel of the image.
PPC: thumbnail: 8.8 milliseconds, DC only: 830 milliseconds, full res: 2700 milliseconds.
SP: thumbnail: 15.1 milliseconds
The speed difference between the two devices is to be expected considering the different processor and memory bus speeds. The “DC only” test is useful because it shows the relative speed of Huffman decoding. The file size is 4.3MB, so in 830 milliseconds the code was able to decode all of the MCUs and produce a single pixel from each one.
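For readers unfamiliar with the "DC only" trick, here is a minimal sketch of the idea, not my production code. The helper names are hypothetical. It relies on the standard JPEG definitions: the inverse DCT of a block containing only a DC coefficient is a flat block whose value is the dequantized DC divided by 8, plus the level shift of 128. The 384×288 figure quoted above corresponds to one output pixel per 8×8 block of the 3072×2304 image.

```c
/* Illustrative sketch of "DC only" decoding; names are hypothetical. */

/* Clamp an intermediate value to the 0..255 range of an 8-bit sample. */
int clamp_u8(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

/* Reconstruct the single sample that represents a whole 8x8 block.
 * dc_coeff is the Huffman-decoded DC coefficient (after the DC prediction
 * has been added back) and quant is the first entry of the block's
 * quantization table. */
int dc_only_sample(int dc_coeff, int quant)
{
    return clamp_u8((dc_coeff * quant) / 8 + 128);
}

/* Size of one dimension of the DC-only image: one sample per 8 pixels,
 * rounded up so partial blocks at the edge still produce a sample. */
int dc_only_dim(int full_size)
{
    return (full_size + 7) / 8;
}
```

No IDCT is performed at all in this mode, which is why the test mostly measures how quickly the Huffman decoder can walk through the file.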
I’ve uploaded the sample image to my web server here: CIMG2209.JPG
The image was taken with a Casio EX-Z750 and depicts a relatively complex scene with many fine details. Like most cameras, the Exilim series saves JPEG images with 2:1 horizontal color subsampling (even when set to maximum quality). It’s not unreasonable for a point-and-shoot camera like the Z750 to save images with less than optimal color resolution, because the image coming off the CCD isn’t that great to begin with. What irks me is that cameras like the Canon 20D do the same thing. With a good SLR lens and imager, the Canon should allow you to save full res color JPEG images.
Comments?
July 7, 2007
Posted by
bitbank |
arm, arm9, asm, assembly language, benchmark, jpeg, omap, optimization, performance, photo, pocket pc, smartphone, viewer, wince, xscale |
|
4 Comments
An odd title considering that JPEG is a cryptic image compression standard. My idea of fun is optimizing code until there’s nothing left to improve. I decided a few weeks ago to take the plunge and rewrite the 3 core JPEG decode routines to speed up my imaging code. One reason was that the great majority of cell phones today are based around the TI OMAP architecture, typically running at around 200MHz. These devices seem slow at working with images, so I thought I could help that situation by speeding things up to improve both battery usage and the user experience.
The important “inner loop” routines of JPEG image decoding are the Huffman decoding of the MCU (minimum coded unit), the IDCT (inverse discrete cosine transform), and the output stage (turning the YCrCb pixels into RGB pixels). All 3 routines together turned out to be only a couple hundred lines of ARM code, but the result of rewriting them from C was quite dramatic. The original C code had been optimized and tested over a long period of time and was in good shape to begin with, but C isn’t so great at bit manipulation and squeezing the most use out of register variables. It took several iterations to get down to the bare minimum of code, but I’m quite happy with the results. I used ARMv5 instructions, but made sure that the code performs well on both OMAP and XScale CPUs (unlike Intel’s Integrated Performance Primitives). Luckily, my previous performance testing of the multiply instructions helped me shave a few clock cycles off several routines. The purpose of this work is threefold:
1) I’m readying a new version of my imaging application (PQV - Pocket QuickView) for Windows Mobile and need it to be competitive with other products. I pride myself on having the fastest viewer available.
2) I have been staring at the C code for a long time and wondering how much better it could perform if written in optimized ARM asm.
3) I believe this code has value to anyone doing imaging or video on ARM based devices. Web browsers, image viewers, camera applications, video players can all benefit from this code.
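To give a feel for what the output stage mentioned above has to do, here is a plain C sketch of converting one pixel from YCbCr to RGB565. This is not my optimized routine; it is the textbook JFIF conversion with the constants scaled by 65536 so that everything stays in integer math. The function names are hypothetical.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of an 8-bit channel. */
int clamp255(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

/* Convert one full-range (JFIF) YCbCr pixel to a 16-bit RGB565 pixel.
 * The constants are the usual 1.402, 0.344136, 0.714136 and 1.772
 * multiplied by 65536; dividing by 65536 afterwards keeps the result
 * well defined for negative intermediate values. */
uint16_t ycc_to_rgb565(int y, int cb, int cr)
{
    int cbd = cb - 128;   /* chroma samples are stored offset by 128 */
    int crd = cr - 128;

    int r = clamp255(y + (91881 * crd) / 65536);
    int g = clamp255(y - (22554 * cbd + 46802 * crd) / 65536);
    int b = clamp255(y + (116130 * cbd) / 65536);

    /* Pack as 5 bits red, 6 bits green, 5 bits blue. */
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Even this simple version needs several multiplies, clamps and shifts per pixel, and it touches every pixel in the image, which is why the output stage is a natural target for hand-written assembly.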
I’ve been searching for the past week or so for customers of this code, but the typical response is the “not invented here” attitude standing in the way of improving products.
I will post some sample images and benchmarks shortly to back up my claims of fast JPEG decoding.
Anyone interested in licensing object or source code should contact me directly (bitbank@pobox.com).
June 21, 2007
Posted by
bitbank |
arm, arm9, asm, assembly language, benchmark, jpeg, omap, optimization, performance, pocket pc, smartphone, wince, xscale |
|
No Comments
Typically when writing code for Windows (desktop or mobile), you will encounter a strange bug in the operating system which requires a workaround. This has been occurring since the very first version of Windows and is still happening today. Lately I have been having odd timing problems with my games on some devices and was trying to discover the reason behind them. On OMAP devices (both PPC and SP) there is no real performance counter. Under the covers it uses the system tick counter, so QueryPerformanceFrequency() reports a frequency of 1000, that is, a resolution of 1 millisecond. This would be reasonable if it worked as expected, but for some reason the performance counter advances in stops and starts, which causes games that depend on it to run alternately too fast and too slow. The solution is to check the timer frequency with QueryPerformanceFrequency() and, if it comes back as 1000, use the GetTickCount() function instead of the PerformanceCounter functions.
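Here is a minimal sketch of that workaround. The enum and the two function names are hypothetical; only QueryPerformanceFrequency(), QueryPerformanceCounter() and GetTickCount() are real Windows APIs. Falling back to GetTickCount() when the frequency query fails outright is my own assumption, not something I have needed on a real device. The decision is kept in a separate function so it can be exercised without a device.

```c
#include <stdint.h>

typedef enum { TIMER_PERF_COUNTER, TIMER_TICK_COUNT } timer_source;

/* Decide which timer to trust.
 * qpf_ok:    non-zero if QueryPerformanceFrequency() succeeded
 * frequency: the counts-per-second value it reported
 * A frequency of exactly 1000 means the "performance counter" is really
 * the 1 ms system tick in disguise, so use GetTickCount() directly. */
timer_source choose_timer_source(int qpf_ok, int64_t frequency)
{
    if (!qpf_ok || frequency == 1000)
        return TIMER_TICK_COUNT;
    return TIMER_PERF_COUNTER;
}

#ifdef _WIN32
#include <windows.h>

/* Current time in milliseconds from whichever source was chosen. */
uint32_t game_time_ms(void)
{
    LARGE_INTEGER freq, now;
    BOOL ok;

    freq.QuadPart = 0;
    ok = QueryPerformanceFrequency(&freq);
    if (choose_timer_source(ok != 0, freq.QuadPart) == TIMER_TICK_COUNT)
        return (uint32_t)GetTickCount();

    QueryPerformanceCounter(&now);
    return (uint32_t)(now.QuadPart * 1000 / freq.QuadPart);
}
#endif
```

In a real game the frequency check would be done once at startup and the result cached, rather than repeated on every call as it is here.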
May 23, 2007
Posted by
bitbank |
arm, omap, pocket pc, smartphone, wince, xscale |
|
1 Comment