Larry’s Personal & Tech ramblings

Just another WordPress.com weblog

ARM Multiply Performance

Someone asked me to do some testing of the performance of the ARM multiply instruction.  I hadn’t included it in my previous performance tests because it didn’t occur to me; I don’t use it in the inner loops of game emulators.

I decided to see if there was a difference in performance when working with different data values (e.g. multiplying by zero) and on the XScale vs. OMAP CPUs.  The first test showed that there is no difference in performance between zero and non-zero data.  The second test showed that the XScale has a much faster implementation of multiply than the OMAP.  On my 400MHz PXA255 handheld, my tests showed that the unsigned multiply instruction (MUL) takes just 1 clock cycle, but on the OMAP 850 (used in many SmartPhones) it takes 2 clocks.  I haven’t tested the 32×32 multiply because it’s in the ARM5 instruction set and the VS2005 C compiler generates ARM4-compatible code.
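For anyone who wants to try a similar comparison, here is a rough sketch in C of the kind of timing loop described above. It is not the harness behind the numbers in this post (those tests were written in assembly language); the function names `mul_chain`, `time_mul_chain` and `cycles_per_iteration` are made up for illustration. It assumes the compiler emits one multiply instruction per loop iteration, which is worth confirming in the disassembly, and the figure it produces includes loop overhead, so it is only an upper bound on the cost of the multiply itself.

```c
#include <stdint.h>
#include <time.h>

/* Run a chain of dependent 32-bit multiplies. Each result feeds the next
 * multiply, so the loop time reflects multiply latency plus loop overhead.
 * The factor is read once through a volatile pointer so the compiler
 * cannot treat it as a compile-time constant. */
uint32_t mul_chain(uint32_t seed, uint32_t factor, uint32_t iterations)
{
    uint32_t f = *(volatile uint32_t *)&factor;
    uint32_t x = seed;
    uint32_t i;

    for (i = 0; i < iterations; i++) {
        x = x * f;
    }
    return x;
}

/* Time mul_chain() with clock() and return the elapsed CPU time in seconds.
 * The final product is written to *result_out so the work is not
 * optimized away. */
double time_mul_chain(uint32_t seed, uint32_t factor, uint32_t iterations,
                      uint32_t *result_out)
{
    clock_t start = clock();
    uint32_t r = mul_chain(seed, factor, iterations);
    clock_t end = clock();

    if (result_out != NULL) {
        *result_out = r;
    }
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Convert elapsed seconds into an approximate clocks-per-iteration figure.
 * cpu_hz is whatever clock rate the device is assumed to run at,
 * e.g. 400e6 for a 400MHz part. */
double cycles_per_iteration(double seconds, double cpu_hz, uint32_t iterations)
{
    if (iterations == 0) {
        return 0.0;
    }
    return seconds * cpu_hz / (double)iterations;
}
```

To compare zero and non-zero data, call `time_mul_chain` once with a factor of 0 and once with an odd factor such as 3 (an odd factor keeps the running product from collapsing to zero), using the same iteration count, then pass each time to `cycles_per_iteration`. Running the same loop with the multiply replaced by a cheap operation gives a baseline to subtract for the loop overhead.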

March 26, 2007 - Posted by bitbank | arm, arm9, asm, assembly language, benchmark, optimization, optimização, performance, pocket pc, smartphone, tech

2 Comments »

  1. VS 2005 can generate code for ARM5; the switch is under C/C++ -> Advanced -> Compile for Architecture. Just select ARM5 or ARM5T. It would be great if you could compare the performance based on ARM5 code.

    Comment by Bill | April 1, 2007

  2. Bill,
    I used assembly language to test the multiply instruction, not C. The ARM5 uses the same multiply as ARM2-4, so it would not make any difference in performance.

    L.B.

    Comment by bitbank | April 1, 2007
