ARM Multiply performance pt. 2
I wanted to revisit the multiply test because I hadn’t yet compared 32×32 multiplies against 16×16 multiplies. On the XScale PXA255 and later parts, both take 1 clock cycle. On the OMAP 850 (and probably other OMAPs built around the ARM9 core), a 16×16 multiply takes 1 clock and a 32×32 multiply takes 2. That’s worth knowing if your code will run on an OMAP and you really only need a 16×16 multiply.
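As a rough illustration, here is one way to make it clear to the compiler that only a 16×16 multiply is needed: declare both operands as 16-bit types. The function names below are hypothetical, and whether a given compiler actually turns `mul16x16` into a single-cycle 16×16 instruction (such as SMULBB on ARMv5TE-class cores) depends on the compiler and the target flags, so check the generated assembly rather than assuming it.

```c
#include <stdint.h>

/* Hypothetical helper: 16x16 -> 32-bit signed multiply.
 * Both operands are 16-bit, so the product always fits in 32 bits
 * ((-32768) * (-32768) = 2^30). A compiler targeting an ARM core with
 * 16-bit multiply instructions may emit one of those here instead of a
 * general 32x32 MUL; this is not guaranteed and should be verified. */
static inline int32_t mul16x16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Hypothetical helper: general 32x32 multiply, keeping the low 32 bits.
 * Unsigned types are used so that overflow wraps modulo 2^32 instead of
 * being undefined behavior. This is the case that took 2 clocks in the
 * OMAP 850 test. */
static inline uint32_t mul32x32(uint32_t a, uint32_t b)
{
    return a * b;
}
```

The point of the 16-bit signature is simply to avoid asking for a wider multiply than the data needs; on a core where both widths cost 1 clock (like the PXA255 above) it should make no difference.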
L.B.