Loop Unrolling Optimization Question
I have two versions of an iterative procedure, one that has rolled-up loops and one that is unrolled. The computation involves three nested loops. By my calculations, the rolled-up version requires 90,460 cycles, whereas the unrolled loop version has a calculated latency of 56,400 cycles.
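However, in real execution the rolled version runs 2 to 2.5 times faster, even though the cycle counts predict that the unrolled version should be roughly 1.6 times faster (90,460 / 56,400). Why is the rolled version, with the higher calculated latency, executing faster? Am I getting 'beaten' by instruction caching?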
My reasoning is that the rolled version has much shorter code, and the instructions are identical on every pass through the loops; only the register contents change. In the unrolled version, each memory reference is written out as its own instruction with its own address, so the code is much longer and may no longer fit in the instruction cache.
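To make the distinction concrete, here is a minimal sketch in C. It is not the actual three-loop procedure from this post; the single summation loop and the names `sum_rolled` and `sum_unrolled4` are hypothetical and chosen only to show the two code shapes being compared.

```c
#include <stddef.h>

/* Rolled: one short loop body, executed n times.
   The same few instructions are fetched on every iteration. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4 (hypothetical factor): one loop branch per four
   elements, but the loop body is roughly four times as long. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)      /* remainder when n is not a multiple of 4 */
        s += a[i];
    return s;
}
```

Both functions return the same result. Unrolling all three nested loops completely would take this trade-off to its extreme: no loop branches at all, but one copy of the body per iteration.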
By the way, this is running on a 1200 MHz Athlon (Thunderbird) with an ABIT KT7E mainboard.
Any comments regarding this optimization issue are welcome. If it is a caching issue, how much 'unrolling' can be done before the caching penalty outweighs the benefit of unrolling?
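A rough upper bound on the "how much" part can be estimated from code size alone. The sketch below is only that: `max_unroll_factor` is a hypothetical helper, the 64 KB figure assumes the Thunderbird's L1 instruction cache size, and the per-copy body size and overhead are made-up numbers that would have to be measured from the actual object code.

```c
#include <stddef.h>

/* Largest number of copies of a loop body that fit in the instruction
   cache next to overhead_bytes of surrounding code. Returns 0 if the
   inputs leave no room. This is an upper bound on the unroll factor,
   not a prediction of the fastest one. */
size_t max_unroll_factor(size_t icache_bytes,
                         size_t body_bytes,
                         size_t overhead_bytes)
{
    if (body_bytes == 0 || overhead_bytes >= icache_bytes)
        return 0;
    return (icache_bytes - overhead_bytes) / body_bytes;
}
```

For example, `max_unroll_factor(65536, 48, 256)` gives 1360 copies for an assumed 48-byte body with 256 bytes of other code. The practical limit is likely lower, since other code competes for the cache and fetch and decode costs also grow with code size, so timing a few unroll factors on the real hardware is the more reliable test.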