ARM Graphics optimization
Over the next few days (or weeks), I’m going to profile various graphics routines running on the ARM to see which give the best performance. Specifically, I’m going to compare the speed of converting planar data into chunky data using lookup tables versus brute-force calculation. I will also measure the time to render graphics from planar sources (e.g. 2 or 4 bpp) versus pre-converted 8bpp chunky sources, which use 4 or 2 times the memory, respectively. The goal is to help speed up Nintendo game emulation. The GBC, NES, and SNES all store their sprite and tile data in planar format, which must be converted to chunky format to work with most display memory. If anyone already has some insight on this, please feel free to comment.
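To make the comparison concrete, here is a minimal sketch of the two approaches for one 8-pixel row of 2bpp planar data. It is only an illustration, not the code being profiled. The function and table names are made up, and it assumes that bit 7 of each plane byte is the leftmost pixel and that the target is little-endian, so lane 0 of a 32-bit word lands at the lowest address.

```c
#include <stdint.h>

/* Brute force: pull one bit out of each plane for every pixel.
   Assumption: bit 7 of a plane byte is the leftmost pixel. */
static void row_2bpp_brute(uint8_t plane0, uint8_t plane1, uint8_t out[8])
{
    for (int x = 0; x < 8; x++) {
        int shift = 7 - x;
        out[x] = (uint8_t)(((plane0 >> shift) & 1u) |
                           (((plane1 >> shift) & 1u) << 1));
    }
}

/* Lookup table: spread_lut[b][0] holds pixels 0..3 of plane byte b and
   spread_lut[b][1] holds pixels 4..7, one pixel per 8-bit lane
   (lane 0 = lowest 8 bits), each lane being 0 or 1. Size: 2 KB. */
static uint32_t spread_lut[256][2];

static void init_spread_lut(void)
{
    for (int b = 0; b < 256; b++) {
        spread_lut[b][0] = 0;
        spread_lut[b][1] = 0;
        for (int x = 0; x < 8; x++) {
            uint32_t bit = ((uint32_t)b >> (7 - x)) & 1u;
            spread_lut[b][x >> 2] |= bit << (8 * (x & 3));
        }
    }
}

/* Two table reads, one shift and one OR per word replace the 8-step loop.
   Call init_spread_lut() once before using this. */
static void row_2bpp_lut(uint8_t plane0, uint8_t plane1, uint32_t out[2])
{
    out[0] = spread_lut[plane0][0] | (spread_lut[plane1][0] << 1);
    out[1] = spread_lut[plane0][1] | (spread_lut[plane1][1] << 1);
}

/* Helper for checking results: read pixel x (0..7) back out of the
   two packed words produced by row_2bpp_lut(). */
static uint8_t lut_pixel(const uint32_t w[2], int x)
{
    return (uint8_t)((w[x >> 2] >> (8 * (x & 3))) & 0xFFu);
}
```

The table version does less arithmetic, but its 2 KB table competes with everything else for the data cache, so which one wins is exactly the kind of thing that has to be measured rather than assumed.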
In my experience, 8bpp is still a waste of precious cache for the NES. What I did in PocketNester was convert the planar data to 4-bit pixels, i.e. 2 pixels per byte.
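A packed 4-bit layout like the one described could look roughly like the sketch below. This is not PocketNester's actual code. The function name, the nibble order (left pixel in the high nibble), and the use of the upper two bits of each nibble for a palette number are all assumptions made for illustration.

```c
#include <stdint.h>

/* Convert one 8-pixel row of 2bpp planar data into 4 bytes holding
   two 4-bit pixels each.
   Assumptions (hypothetical layout):
     - bit 7 of each plane byte is the leftmost pixel
     - the left pixel of each pair goes in the high nibble
     - each 4-bit pixel is (palette << 2) | color, with palette in 0..3 */
static void row_2bpp_to_packed4(uint8_t plane0, uint8_t plane1,
                                uint8_t palette, uint8_t out[4])
{
    uint8_t pal_bits = (uint8_t)((palette & 3u) << 2);

    for (int i = 0; i < 4; i++) {
        int sl = 7 - 2 * i;   /* bit position of the left pixel  */
        int sr = 6 - 2 * i;   /* bit position of the right pixel */

        uint8_t left  = (uint8_t)(((plane0 >> sl) & 1u) |
                                  (((plane1 >> sl) & 1u) << 1) | pal_bits);
        uint8_t right = (uint8_t)(((plane0 >> sr) & 1u) |
                                  (((plane1 >> sr) & 1u) << 1) | pal_bits);

        out[i] = (uint8_t)((left << 4) | right);
    }
}
```

Compared with 8bpp, a converted tile takes half the space, at the cost of a shift and a mask whenever a single pixel is read back.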
For my NES code, I convert the planar graphics to 8bpp on the fly. I use 8bpp because of sprite priority, so that I can draw everything together.
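One way the spare bits of an 8bpp pixel could carry priority information is sketched below. The comment does not say how it is actually done, so the flag values, bit layout, and function names here are purely hypothetical. The only NES behaviour relied on is that color 0 is transparent and that a "behind background" sprite shows only where the background pixel is transparent.

```c
#include <stdint.h>

/* Hypothetical 8bpp pixel layout:
     bits 0-1  color within palette (0 = transparent)
     bits 2-3  palette number
     bit  4    set for sprite pixels
     bit  7    set when an opaque background pixel was drawn here */
#define PIX_SPRITE    0x10u
#define PIX_BG_OPAQUE 0x80u

/* Build a background pixel, flagging it when it is not transparent. */
static uint8_t make_bg_pixel(uint8_t palette, uint8_t color)
{
    uint8_t p = (uint8_t)(((palette & 3u) << 2) | (color & 3u));
    return (color & 3u) ? (uint8_t)(p | PIX_BG_OPAQUE) : p;
}

/* Draw one sprite pixel over whatever is already in *dst.
   behind_bg is the sprite's "behind background" priority flag. */
static void put_sprite_pixel(uint8_t *dst, uint8_t palette, uint8_t color,
                             int behind_bg)
{
    if ((color & 3u) == 0)
        return;                       /* transparent sprite pixel */
    if (behind_bg && (*dst & PIX_BG_OPAQUE))
        return;                       /* hidden by opaque background */
    *dst = (uint8_t)(PIX_SPRITE | ((palette & 3u) << 2) | (color & 3u));
}
```

With the flag stored in the pixel itself, background and sprites can share one buffer and one pass, which is harder to arrange when two pixels are packed into each byte.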