Larry’s Personal & Tech ramblings


ARM Multiply Performance

Someone asked me to do some testing of the performance of the ARM multiply instruction.  I hadn’t included it in my previous performance tests because it didn’t occur to me; I don’t use it in the inner loops of game emulators.

I decided to see whether performance differs with different data values (e.g. multiplying by zero) and between the XScale and OMAP CPUs.  The first test showed no difference in performance between zero and non-zero data.  The second test showed that the XScale has a much faster multiply implementation than the OMAP.  On my 400 MHz PXA255 handheld, my tests showed that the unsigned multiply instruction (MUL) takes just 1 clock cycle, but on the OMAP 850 (used in many SmartPhones) it takes 2 clocks.  I haven’t tested the 32×32 multiply because it’s in the ARM5 instruction set and the VS2005 C compiler generates ARM4-compatible code.
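For anyone who wants to try a similar comparison, here is a minimal sketch in portable C. It is not the harness used for the numbers above; the function names (`multiply_chain`, `time_multiply_chain`, `cycles_per_iteration`) are made up for illustration, and the loop also pays for an add, a volatile load, and the branch, so it only gives an upper bound on the multiply cost. Running it once with a factor of zero and once with a non-zero factor gives a rough check of whether the data value matters.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: run a chain of dependent 32-bit multiplies.
 * Each iteration needs the previous result, so the multiplies cannot
 * overlap. The factor is read through a volatile so the compiler cannot
 * fold the multiply away, even when the factor is 0 or 1. */
static uint32_t multiply_chain(uint32_t seed, uint32_t factor, size_t iterations)
{
    volatile uint32_t f = factor;
    uint32_t acc = seed;
    size_t i;

    for (i = 0; i < iterations; i++) {
        acc = acc * f + 1u;   /* may compile to MLA or to MUL + ADD on ARM */
    }
    return acc;
}

/* Hypothetical helper: time one run of the chain with clock().
 * Returns elapsed CPU time in seconds; the chain result is stored in
 * *result_out (if non-NULL) so the work is not optimized out. */
static double time_multiply_chain(uint32_t factor, size_t iterations,
                                  uint32_t *result_out)
{
    clock_t start = clock();
    uint32_t result = multiply_chain(1u, factor, iterations);
    clock_t end = clock();

    if (result_out != NULL) {
        *result_out = result;
    }
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Convert a measured time into clock cycles per loop iteration,
 * given the CPU frequency in Hz (e.g. 400e6 for a 400 MHz part). */
static double cycles_per_iteration(double seconds, double clock_hz,
                                   size_t iterations)
{
    return seconds * clock_hz / (double)iterations;
}
```

To use it, call `time_multiply_chain` with the same iteration count for a factor of 0 and for a non-zero factor, then pass each time to `cycles_per_iteration` along with the device's clock rate. Subtracting the time of an otherwise identical loop without the multiply would get closer to the cost of the instruction itself.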

March 26, 2007 Posted by bitbank | arm, arm9, asm, assembly language, benchmark, optimization, optimização, performance, pocket pc, smartphone, tech