Larry’s Personal & Tech ramblings

ARM Multiply performance pt. 2

I wanted to revisit the multiply test because I hadn’t tested the difference between 32×32 and 16×16 multiplies. On the XScale PXA255 and above, both 32×32 and 16×16 multiplies take 1 clock cycle. On the OMAP 850 (and probably other OMAPs based on the ARM9 core), the 16×16 multiply takes 1 clock cycle and the 32×32 takes 2. That is useful to know if your code will be running on an OMAP and you really only need a 16×16 multiply.
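For anyone working in C rather than assembly, here is a rough sketch of how the two cases might look. The function names (mul16, mul32, dot16) are made up for illustration and are not from my benchmark. The idea is that declaring the operands as 16-bit gives the compiler the chance to pick a halfword multiply such as SMULBB (available from ARMv5TE on) instead of a full MUL. Whether it actually does depends on the compiler and target flags, so check the generated assembly.

```c
#include <stdint.h>

/* 16x16 -> 32 multiply. The operands are 16-bit, so a compiler
   targeting ARMv5TE or later may (not must) emit SMULBB here. */
int32_t mul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* 32x32 -> 32 multiply. This typically becomes a plain MUL.
   The caller must keep the product within int32_t range. */
int32_t mul32(int32_t a, int32_t b)
{
    return a * b;
}

/* Dot product of 16-bit samples, the kind of inner loop where
   a cheaper 16x16 multiply could add up on an ARM9-based OMAP.
   The accumulator is 32-bit; the caller must avoid overflow. */
int32_t dot16(const int16_t *x, const int16_t *y, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += mul16(x[i], y[i]);
    return sum;
}
```

If your data really fits in 16 bits, keeping it in int16_t arrays (as in dot16) is the simplest way to give the compiler that option.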

L.B.

May 18, 2007 | Posted by bitbank | arm, arm9, asm, assembly language, benchmark, optimization, performance, pocket pc, smartphone, tech, xscale