Has Intel documented this flaw?
Recently an Intel spokesman claimed that this performance flaw was documented 'years ago', and that it can be found on page 43 (section 3.5.3) of Intel's Architecture Optimizations Manual.
On this page, Intel talks about the line fill (=burst) order of its processors. The only somewhat relevant thing there is the middle paragraph, which states that on a burst the requested word (8 bytes) is returned first and the other 3 words follow afterwards. That is exactly what we said Intel has in its data sheets.
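The "requested word first" behaviour can be sketched in a few lines of C. The text above only says that the requested 8-byte word comes first and the other 3 follow; the exact order of those other 3 words is an assumption here (an interleaved, XOR-style order), and the function name burst_order and the constants are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES     32u                       /* 4 words of 8 bytes, per the text */
#define WORD_BYTES     8u
#define WORDS_PER_LINE (LINE_BYTES / WORD_BYTES)

/* Fill out[] with the byte offsets (within the 32-byte line) of the words
   in the order a line fill would return them for a read at addr.
   out[0] is always the requested word, as the text describes.
   ASSUMPTION: the remaining three words follow in interleaved (XOR) order. */
void burst_order(uint32_t addr, uint32_t out[WORDS_PER_LINE])
{
    assert(out != NULL);

    /* Index (0..3) of the requested word inside its line. */
    uint32_t first = (addr % LINE_BYTES) / WORD_BYTES;

    for (uint32_t i = 0; i < WORDS_PER_LINE; i++)
        out[i] = (first ^ i) * WORD_BYTES;
}
```

Under that assumption, a read at address 0x1010 (the third word of its line) gives the offsets 0x10, 0x18, 0x00, 0x08.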
BUT the same paragraph also states that "it is preferable to access memory in sequential order". We strongly disagree, because that is exactly what triggers the flaw: when memory or the secondary cache is accessed sequentially, the processor's read buffer imposes a penalty that leaves the bus under-utilized. By accessing memory in a special non-sequential order instead, we obtain up to a 71% increase in main memory bandwidth!
What Intel does not reveal on this page, or anywhere else, is what happens when a second read request arrives while a burst has not yet finished. If that request is for another word in the same burst line, the execution unit waits for the entire burst to finish (documented) and then suffers a considerable additional time penalty (the undocumented flaw). If, instead, the request is for a word in a different burst line, there is no penalty: a new burst is generated immediately after the current one. Consequently, the workaround is to rearrange the read requests so that they are made in a special, non-sequential order.
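To make the idea of reordering read requests concrete, here is a minimal sketch in C. It is not the special order referred to above, which this page does not spell out; it only illustrates one way to make each read fall in a different burst line from the read before it. The function names are hypothetical, and the sketch assumes 32-byte lines of four 8-byte words, an array aligned to a line boundary, and a compiler that leaves the loads in the order written.

```c
#include <assert.h>
#include <stddef.h>

#define WORDS_PER_LINE 4   /* ASSUMPTION: 32-byte line / 8-byte word */

/* Sequential sum: each read usually targets the same line as the
   previous read, which is the case the text says is penalized. */
double sum_sequential(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Reordered sum: process two adjacent lines together and alternate
   between them, so consecutive reads always target different lines.
   ASSUMPTION: a is aligned to a line boundary and n is a multiple of
   two lines (8 words); a real routine would also handle the remainder. */
double sum_interleaved(const double *a, size_t n)
{
    assert(n % (2 * WORDS_PER_LINE) == 0);

    double s = 0.0;
    for (size_t base = 0; base < n; base += 2 * WORDS_PER_LINE) {
        const double *line0 = a + base;
        const double *line1 = a + base + WORDS_PER_LINE;
        for (size_t w = 0; w < WORDS_PER_LINE; w++) {
            s += line0[w];   /* read from the first line ...      */
            s += line1[w];   /* ... then from the other line      */
        }
    }
    return s;
}
```

Both functions read every element exactly once, so they add up the same values; only the order of the reads differs. Whether the reordered version is actually faster depends on the processor and has to be measured; nothing in this sketch demonstrates the bandwidth figures quoted above.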
For most people, this performance flaw/issue/'design decision'/imperfection/etc. is more important than all the Pentium bugs that were found in the past 3 years. For instance, the worst of them, the FDIV bug, occurred in only 1 in 9 billion double-precision divisions, and even then returned only a slightly wrong result. Intel correctly stated that this bug affected only a small minority (at most 1-2%) of its customers. This performance issue, by contrast, affects everybody all the time: everybody would have a faster processor if the read buffer didn't have this penalty, or if programmers, compiler makers, etc. were aware of it.

We wish to state that we have nothing against Intel: we have always admired Intel's new processors, and we use mostly Intel processors ourselves. We just wish to show programmers how they can make their programs much faster by working around an undisclosed flaw (or, worse, a 'design decision') which exists in all manufacturers' processors. In general, though, we believe that there should be no secrecy: whatever Intel, as the market leader, decides to put in its new processors, everybody who is expected to buy, use, or program them should be allowed to know.

Click here for a schematic demonstration of the read buffer flaw.


Intel and Pentium are registered trademarks of Intel Corporation.
All other trademarks are those of their respective owners.

Everything at this web site is the property of Intelligent Firmware Ltd. You may not repost/publish this information without our explicit permission.