After reading the Ars Technica article on the next-generation Intel Core I was thinking about branch prediction. Jump to the page on branch prediction in that article.

One of the interesting new additions to the Intel Core is the ability of the branch prediction engine to detect loops and handle them appropriately. While it is interesting that they've added hardware to count the number of times a branch goes a particular direction and notice when one direction is dominant, it isn't really a complete solution. It should work well for relatively static branches, but those aren't all the cases we need to worry about.
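I don't know the details of Intel's hardware, so treat this as a rough illustration: the counting scheme behaves something like a saturating counter per branch, and a loop detector can remember a loop's trip count so it can predict the final not-taken iteration. A minimal software sketch of those two ideas, with made-up names:

    #include <stdint.h>
    #include <stdbool.h>

    /* Classic two-bit saturating counter: predict "taken" when the counter
     * is in the upper half, and nudge it toward each actual outcome. */
    typedef struct { uint8_t state; /* 0..3 */ } twobit_t;

    static bool twobit_predict(const twobit_t *c) { return c->state >= 2; }

    static void twobit_update(twobit_t *c, bool taken)
    {
        if (taken && c->state < 3)  c->state++;
        if (!taken && c->state > 0) c->state--;
    }

    /* Hypothetical loop predictor: remember how many consecutive times the
     * backward branch was taken on the previous run of the loop, and predict
     * "not taken" exactly when that trip count is reached again. */
    typedef struct {
        uint32_t last_trip;  /* trip count observed on the previous run */
        uint32_t seen;       /* taken count so far in the current run  */
    } loop_pred_t;

    static bool loop_predict(const loop_pred_t *p)
    {
        return p->last_trip == 0 || p->seen < p->last_trip;
    }

    static void loop_update(loop_pred_t *p, bool taken)
    {
        if (taken) {
            p->seen++;
        } else {
            p->last_trip = p->seen;  /* loop exited: remember the trip count */
            p->seen = 0;
        }
    }

The loop predictor only helps when the trip count repeats from run to run, which is exactly the "relatively static" case.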

Data-based branch prediction

In almost all loops, the exit condition comes from comparing values that are known at least a few instructions before the branch executes. A truly accurate branch predictor should be able to determine which values are significant in that comparison and start watching them ahead of time. Of course, it's tricky: they could be return values from functions, or counters, or a global memory address written to by other processes. Not all of these can be tracked, but the predictor should learn more as those branches are executed more and more.
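To make that concrete, here is the kind of loop I have in mind (the function and names are just made up for the example): the value that decides the branch is produced well before the compare, so a predictor that watched that value could, in principle, resolve the branch early.

    #include <stddef.h>

    /* The exit condition depends on 'limit', which the caller computed long
     * before the compare-and-branch at the bottom of the loop executes. */
    size_t sum_until(const int *data, size_t n, int limit)
    {
        size_t total = 0;
        for (size_t i = 0; i < n; i++) {
            total += data[i];
            /* A predictor that tracked 'limit' and 'total' could know the
             * outcome of this branch before reaching it. */
            if ((int)total > limit)
                break;
        }
        return total;
    }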

I imagine this type of optimization is largely what Transmeta's Code Morphing software does.

With the cache sizes and die areas available to microprocessor designers today, this type of optimization should be possible, perhaps implemented in the instruction decode phase of the processor. That would allow the amount of optimization to scale with how the back end is handling instructions: if the queue is backed up, optimize more; if we're running behind, use less-optimized code.
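As a software analogy for that decode-stage policy (this is purely a sketch of the idea, not real hardware), the decision could look something like picking an optimization level from the current queue depth:

    #include <stddef.h>

    /* Hypothetical knob: choose how aggressively to optimize based on how
     * backed up the execution queue is. */
    typedef enum { OPT_NONE, OPT_LIGHT, OPT_HEAVY } opt_level_t;

    static opt_level_t choose_opt_level(size_t queue_depth, size_t queue_capacity)
    {
        /* Queue nearly full: the back end has plenty of work, so decode can
         * afford to spend extra time optimizing what it emits. */
        if (queue_depth > (queue_capacity * 3) / 4)
            return OPT_HEAVY;
        /* Queue nearly empty: the back end is starving, so emit instructions
         * as fast as possible with little or no optimization. */
        if (queue_depth < queue_capacity / 4)
            return OPT_NONE;
        return OPT_LIGHT;
    }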

I think the last interesting thing data-based code optimization would allow is performance improvements for scripting engines. Most scripting engines turn the code into a data structure and then operate on that data structure. But the processor is going to optimize the branches of the scripting engine, not the script it's running. That helps, but the script drives the direction of most of those branches far more than their placement in the engine's code does. If an optimization could look at the data the scripting engine is pulling from when doing branch prediction, the script itself would effectively be optimized.
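To show why the script, not the engine, decides the branches: the core of most scripting engines looks roughly like the dispatch loop below (a generic toy, not any particular engine). The hardware sees one hot switch branch in run(), but which way it goes on each iteration is determined entirely by the script's data.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    /* Toy bytecode interpreter: the branch target each time around the loop
     * comes from the 'code' array, i.e. from the script being run. */
    static void run(const uint8_t *code)
    {
        int stack[64];
        int sp = 0;
        for (size_t pc = 0; ; ) {
            switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++];          break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);     break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void)
    {
        const uint8_t script[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        run(script);  /* prints 5 */
        return 0;
    }

A predictor that only watches the engine's program counter keeps mispredicting that switch; one that could watch the bytecode stream would be predicting the script itself.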

In the end, we have to find uses for all this silicon that Moore's law is giving us.


posted Apr 15, 2006 | permanent link