itix is right. Algorithm optimization is far more important than assembly optimization. Sure, assembly optimizations matter too, but only where the compiler doesn't do a good job: for example, with PPC instructions like cntlzw that have no C equivalent (i.e. no C statement or group of statements gets translated to that asm instruction), or when the compiler messes up instruction scheduling (not very often, but it happens). Otherwise, it's a waste of time just to achieve a 2-5% speed increase.
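To illustrate the cntlzw point, here is a rough sketch of what counting leading zeros looks like in plain C. The names clz32_portable and clz32 are made up for this example. __builtin_clz is a GCC/Clang extension, not standard C, and whether it actually becomes a single cntlzw depends on the compiler and target, so treat this as an assumption and check the generated asm.

```c
#include <stdint.h>

/* Portable count-leading-zeros for a 32-bit word.
 * Returns 32 for x == 0, which is what PPC cntlzw gives for a zero input.
 * This loop is the kind of code a compiler typically will not turn into
 * one instruction on its own. */
static unsigned clz32_portable(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;

    /* Shift left until the top bit is set, counting the steps. */
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical wrapper: use the compiler builtin where available,
 * fall back to the portable loop otherwise. */
static unsigned clz32(uint32_t x)
{
#if defined(__GNUC__)
    /* __builtin_clz is undefined for 0, so handle that case by hand. */
    return x ? (unsigned)__builtin_clz(x) : 32u;
#else
    return clz32_portable(x);
#endif
}
```

Even here the win only matters if the count sits in a hot loop; a better algorithm around it usually buys far more than the instruction itself.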