Modern processors read from memory or the secondary cache in bursts. A burst (also known as a line fill) is a series of 4 successive, contiguous reads. Processors read in bursts rather than in single accesses because bursts are quicker: if a single read takes 5 cycles, a burst read takes 5+2+2+2 = 11 cycles, that is 11/4 = 2.75 cycles per read. Throughout this page we assume that a burst takes 5+2+2+2 = 11 cycles.
The size of each burst is fixed at 4 transfers of 8 bytes each, 32 bytes in total.
All cycle counts on this page are for a Pentium with a VX/HX/TX chipset and EDO memory; slightly different timings apply to other configurations. (The read accesses on this page have been simplified to 64-bit ones.)
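The burst arithmetic above can be checked with a few lines of Python. This is only a sketch of the timing model stated in the text (5 cycles for the first transfer, 2 for each of the next three, 8 bytes per transfer); the constant and function names are ours and purely illustrative.

```python
# Timing model taken from the text; all names here are illustrative.
FIRST_TRANSFER_CYCLES = 5   # first read of a burst
NEXT_TRANSFER_CYCLES = 2    # each of the following reads in the same burst
TRANSFERS_PER_BURST = 4
BYTES_PER_TRANSFER = 8      # reads are simplified to 64-bit ones


def burst_cycles():
    """Cycles for one full burst (line fill): 5+2+2+2."""
    return FIRST_TRANSFER_CYCLES + NEXT_TRANSFER_CYCLES * (TRANSFERS_PER_BURST - 1)


def single_access_cycles():
    """Cycles if the same four reads were made as separate 5-cycle accesses."""
    return FIRST_TRANSFER_CYCLES * TRANSFERS_PER_BURST


def burst_bytes():
    """Bytes moved by one burst."""
    return TRANSFERS_PER_BURST * BYTES_PER_TRANSFER


if __name__ == "__main__":
    print(burst_cycles())                        # 11
    print(burst_cycles() / TRANSFERS_PER_BURST)  # 2.75
    print(single_access_cycles())                # 20
    print(burst_bytes())                         # 32
```

Under this model, four single accesses would cost 20 cycles against 11 for one burst.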
A' Conventional reading:
The conventional (normal) method of reading, searching or transferring data is the sequential one, which is also the method recommended by Intel.
1' First read:
The first read is performed. It takes 5 cycles.
The total time taken by the conventional method is 5+2+2+2 cycles, plus penalty cycles, plus a few cycles for the program's instructions to execute. This comes to about 17 cycles per burst, which is about 117 Mbytes per second.
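As a rough check of the conventional figures, the sketch below converts cycles per burst into Mbytes per second. The bus clock is not stated on this page; the 62.5 MHz value used here is an assumption, chosen only because it reproduces the quoted throughput with 1 Mbyte taken as 10^6 bytes. The penalty and instruction cycles are not itemized in the text either, so the code only derives their combined size by subtraction.

```python
BURST_BYTES = 32             # 4 transfers of 8 bytes, from the text
BUS_CLOCK_HZ = 62.5e6        # ASSUMPTION: not given in the text


def mbytes_per_second(cycles_per_burst, bus_clock_hz=BUS_CLOCK_HZ):
    """Throughput in Mbytes/s (1 Mbyte = 10**6 bytes) for a given burst cost."""
    return BURST_BYTES * bus_clock_hz / cycles_per_burst / 1e6


TRANSFER_CYCLES = 5 + 2 + 2 + 2        # the burst itself
CONVENTIONAL_CYCLES = 17               # "about 17 cycles per burst" in the text
# Penalty plus instruction cycles, derived by subtraction only.
OVERHEAD_CYCLES = CONVENTIONAL_CYCLES - TRANSFER_CYCLES

if __name__ == "__main__":
    print(OVERHEAD_CYCLES)                                   # 6
    print(round(mbytes_per_second(CONVENTIONAL_CYCLES), 1))  # 117.6
```

With a different bus clock the absolute numbers change, but the ratio between methods depends only on the cycles per burst.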
B' Innovative reading:
The unconventional method we discovered is non-sequential.
1' First read:
As in the conventional method, the first read is performed. It takes 5 cycles.
2' Second read:
The second read is not made at address 8 but at address 32, that is, at the start of the next burst. This second read can of course only be served after the whole current burst has finished, that is after 2+2+2 = 6 cycles. It then takes 3 cycles instead of 5, because Intel's chipsets have a special case: when a burst is initiated immediately after another one has finished, it is treated as an extension of the previous one and takes only 3 cycles. With the conventional method this cannot happen because of the processor's penalty. Here there is no penalty, because no access is made to the current burst. Total delay: 6+3 = 9 cycles.
3' Third, fourth and fifth reads:
Because the whole first burst is already loaded (and stored in the primary cache), the remaining 3 reads incur no delay. Note that while the processor makes these 3 reads it continues to load the second burst, so no bus time is lost. This method exploits the processor's parallelism to the maximum: all the instructions of the program run while the processor loads bursts from outside, so the bus operates at its maximum bandwidth.
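The read order described in steps 1 to 3 can be written out as a list of byte addresses. The sketch below is illustrative only: it generates the order in which addresses are touched and does not perform any real memory access. The function names are ours, and continuing the pattern beyond the second burst (touching the start of burst 3 before finishing burst 2, and so on) is our assumption about how the method extends.

```python
BURST_BYTES = 32   # one burst (line fill)
READ_BYTES = 8     # one 64-bit read


def sequential_order(n_bursts):
    """Conventional order: every address in ascending sequence."""
    return [b * BURST_BYTES + i * READ_BYTES
            for b in range(n_bursts)
            for i in range(BURST_BYTES // READ_BYTES)]


def next_burst_first_order(n_bursts):
    """Non-sequential order: start the next burst, then finish the current one."""
    order = [0] if n_bursts > 0 else []
    for b in range(n_bursts):
        base = b * BURST_BYTES
        if b + 1 < n_bursts:
            # Touch the first address of the next burst before anything else.
            order.append(base + BURST_BYTES)
        # The rest of the current burst is then read at no extra delay.
        order.extend(base + i * READ_BYTES
                     for i in range(1, BURST_BYTES // READ_BYTES))
    return order


if __name__ == "__main__":
    print(sequential_order(2))        # [0, 8, 16, 24, 32, 40, 48, 56]
    print(next_burst_first_order(2))  # [0, 32, 8, 16, 24, 40, 48, 56]
```

Both orders visit exactly the same addresses; only the sequence differs.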
The total time taken by the innovative method is 5+2+2+2+3 = 14 cycles for one full burst and the start of the next one. Burst 1 takes 5+2+2+2 = 11 cycles and burst 2 takes 3+2+2+2 = 9 cycles, that is 10 cycles per burst on average, which is exactly 200 Mbytes per second.
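The averages above follow directly from the per-burst costs. In the sketch below, the first burst costs 11 cycles and each chained burst costs 9. Extending this beyond two bursts is our extrapolation, not something the text states. The 62.5 MHz bus clock is again an assumption, chosen because it matches the quoted 200 Mbytes per second with 1 Mbyte taken as 10^6 bytes.

```python
BURST_BYTES = 32
BUS_CLOCK_HZ = 62.5e6          # ASSUMPTION: not given in the text

FIRST_BURST_CYCLES = 5 + 2 + 2 + 2     # 11, normal burst
CHAINED_BURST_CYCLES = 3 + 2 + 2 + 2   # 9, burst started right after another


def total_cycles(n_bursts):
    """Bus cycles for n back-to-back bursts under the model in the text."""
    if n_bursts <= 0:
        return 0
    return FIRST_BURST_CYCLES + CHAINED_BURST_CYCLES * (n_bursts - 1)


def average_cycles(n_bursts):
    """Average cycles per burst over n bursts."""
    return total_cycles(n_bursts) / n_bursts


def mbytes_per_second(cycles_per_burst, bus_clock_hz=BUS_CLOCK_HZ):
    """Throughput in Mbytes/s (1 Mbyte = 10**6 bytes)."""
    return BURST_BYTES * bus_clock_hz / cycles_per_burst / 1e6


if __name__ == "__main__":
    print(average_cycles(2))                     # 10.0
    print(mbytes_per_second(average_cycles(2)))  # 200.0
    print(average_cycles(1000))                  # 9.002
```

If chaining really continues at 9 cycles per burst, the long-run average would approach 9 rather than 10 cycles, so the two-burst figure would be a conservative one.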
Everything at this web site is the property of Intelligent Firmware Ltd. You may not repost/publish this information without our explicit permission.