AGP Writes and Compiler Optimizations
Whenever you write into a vertex or index buffer, it is very
likely that you are directly accessing AGP memory. You
probably know that you should write in sequential order.
This is truly important: even exchanging two DWORDs can halve
your performance.
I had to find that out the hard way when I wrote an inner
loop like this:
sInt x,z;
sU32 col;
sF32 *fp;
BlaVertex *v;
[...]
for(z=0;z<=LS_BATCHVERTS;z++)
{
  for(x=0;x<=LS_BATCHVERTS;x++)
  {
    fp[0] = px+x*sx;
    fp[1] = (v->HD+br->Data->Base)*ScaleH;
    fp[2] = pz+z*sz;
    ((sU32 *) fp)[3] = col;
    fp+=4;
    v++;
  }
}
I expected this to perform well, but it didn't. When I looked at the
assembly code, I found that the compiler (VC++) had decided to reschedule
the writes: it wrote the color before the z component to save a cycle
somewhere.
Fortunately, declaring the write pointer as volatile
solved the problem. This tells the compiler that every read or
write access to the memory the pointer points to must occur exactly
as specified, in order with respect to other volatile accesses. This
does not mean that the pointer variable itself is excluded from
optimization; things like fp+=4; work as before.
sInt x,z;
sU32 col;
volatile sF32 *fp;
BlaVertex *v;
[...]
for(z=0;z<=LS_BATCHVERTS;z++)
{
  for(x=0;x<=LS_BATCHVERTS;x++)
  {
    fp[0] = px+x*sx;
    fp[1] = (v->HD+br->Data->Base)*ScaleH;
    fp[2] = pz+z*sz;
    ((volatile sU32 *) fp)[3] = col;
    fp+=4;
    v++;
  }
}