Slow floating point on Pentium 4


I have performance issues with the following code:

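.Lloop:
mulsd %xmm1, %xmm0      # a = a * b   (a in xmm0, b in xmm1)
addsd %xmm2, %xmm0      # a = a + c   (c in xmm2)
add $1, %eax            # ++i
cmp $10000000, %eax     # i < 10000000 ?
jl .Lloop               # repeat while i is below the limit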

Basically: for (i = 0; i < 10000000; ++i) { a = a * b + c; }

When c != 0.0 it runs 20 times faster than when c == 0.0.

Does anybody have any ideas on what the problem is?




  • I'm not familiar with those specific instructions, but there are a couple of general things to keep in mind.

    On some hardware, multiplication and especially division are processed iteratively and can use "exit early" algorithms, so the time an operation takes may vary depending on exactly what the input values are. In addition, floating point results are rounded, so a value that displays as "0.0" may not be exactly 0.0. I don't know what your particular application involves, but if you're worried about speed you should not use floating point numbers when integers will work just as well.

    In other words, I'm not surprised that the timing is variable, and a factor of 20 may not be all that unreasonable.