Today's News: November 21, 2012
No news
Read Section 3.5
3.5: Dynamic Scheduling Algorithm
Tomasulo Algorithm Details
The 7 fields of a reservation station:
- Op: The operation to be performed
- Qj and Qk: The reservation stations that will produce the corresponding source operands. A value of 0 means the operand is already available in the corresponding V field.
- Vj and Vk: The values of the source operands. For loads, the Vk field holds the offset.
- A: Holds information for the memory address calculation of a load or store. It is initially set to the immediate field and, after the address calculation, holds the effective address.
- Busy: Set when the reservation station and its functional unit are occupied.
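The seven fields can be pictured as a small record. The following is a minimal Python sketch, not the hardware or the textbook's notation: the class name `ReservationStation`, the methods `operands_ready` and `capture`, and the choice to number stations from 1 (so that 0 can mean "value already present") are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """One reservation station with the seven fields listed above."""
    op: Optional[str] = None      # Op: operation to perform
    qj: int = 0                   # Qj: producing station for operand j (0 = value is in vj)
    qk: int = 0                   # Qk: producing station for operand k (0 = value is in vk)
    vj: Optional[float] = None    # Vj: value of operand j
    vk: Optional[float] = None    # Vk: value of operand k (the offset, for loads)
    a: Optional[int] = None       # A: immediate, later the effective address
    busy: bool = False            # Busy: station is occupied

    def operands_ready(self) -> bool:
        """Both operands are present once neither Q field names a producer."""
        return self.busy and self.qj == 0 and self.qk == 0

    def capture(self, tag: int, value: float) -> None:
        """Snoop a CDB broadcast from station `tag` (assumed to be nonzero)."""
        if tag == 0:
            return  # 0 is reserved for "no producer", so it is never a valid tag
        if self.qj == tag:
            self.vj, self.qj = value, 0
        if self.qk == tag:
            self.vk, self.qk = value, 0


# An add waiting on station 2 for its first operand.
rs = ReservationStation(op="ADD.D", qj=2, vk=1.5, busy=True)
print(rs.operands_ready())   # False: still waiting on station 2
rs.capture(2, 4.0)
print(rs.operands_ready())   # True: Vj now holds the broadcast value
```

Keeping a tag in Q and a value in V for each operand is what lets a waiting instruction pick its operand directly off the CDB instead of reading it from the register file.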
Today's News: November 26, 2012
Here is the form we filled in last time.
Figure 3.7 (empty) shows a blank form that can be filled in.
Figure 3.7 (filled) shows the result filled with the second load instruction waiting for memory.
Figure 3.7 (completed) shows the result filled after cycle 13 (with additional values in parentheses).
Today's News: November 28, 2012
A Loop Example
Consider the unrolled (but unscheduled) loop from November 12.
This has 4 iterations of the loop and takes 27 cycles.
How would this do under Tomasulo's algorithm?
Tomasulo Loop Form (empty) shows a blank form that can be filled in.
Here is an HTML version.
Assumptions:
- Enough reservation stations
- Load and store execution consists of the address calculation, which takes one cycle
- All cache hits: Memory access takes one cycle
- Floating-point add takes 2 cycles of execution
- Floating-point units are ready on the cycle after execution ends
- Integer add takes one cycle of execution
- Priority for the CDB is based on issue time.
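When filling in the form it can help to write the latencies down as a table. The sketch below is only an illustration: the opcode names (`L.D`, `S.D`, `ADD.D`, `DADDUI`), the function `earliest_cdb_cycle`, and the convention that a result is broadcast on the cycle after its last execution or memory cycle are assumptions, and it ignores contention for the CDB. Where it disagrees with the completed form, the form is authoritative.

```python
from typing import Optional

# Execution cycles per operation, taken from the assumptions above.
# The opcode spellings are hypothetical labels for this sketch.
EXEC_CYCLES = {"L.D": 1, "S.D": 1, "ADD.D": 2, "DADDUI": 1}

# All cache hits: one memory-access cycle for loads and stores.
MEMORY_CYCLES = {"L.D": 1, "S.D": 1}

# Stores produce no register result, so they never use the CDB.
WRITES_CDB = {"L.D", "ADD.D", "DADDUI"}


def earliest_cdb_cycle(op: str, exec_start: int) -> Optional[int]:
    """Earliest cycle on which `op` could broadcast, ignoring CDB contention.

    Assumed convention: the broadcast happens on the cycle after the
    last execution (or memory-access) cycle.
    """
    if op not in WRITES_CDB:
        return None
    exec_end = exec_start + EXEC_CYCLES[op] - 1
    mem_end = exec_end + MEMORY_CYCLES.get(op, 0)
    return mem_end + 1


print(earliest_cdb_cycle("L.D", 2))     # 4: address in 2, memory in 3
print(earliest_cdb_cycle("ADD.D", 5))   # 7: executes in 5 and 6
print(earliest_cdb_cycle("DADDUI", 3))  # 4: executes in 3
```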
Here is a completed solution.
The unscheduled unrolled loop took 27 cycles.
The scheduled unrolled loop took 14 cycles.
What advantage does Tomasulo's algorithm have over the scheduled unrolled loop?
Tomasulo Algorithm Summary
- Instruction issue requires only an available reservation station
- Starting execution requires that:
  - the instruction has issued but not started execution
  - a functional unit is available
  - its operands are available
  - all previous branches have completed
- Memory access on loads may take multiple cycles (depending on whether the access hits in the cache)
- Results are written to the CDB when available (at most one per cycle)
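The start-of-execution conditions and the one-result-per-cycle CDB rule can be sketched as two small functions. This is an illustrative Python sketch under assumptions: the `Instr` record and its flag names are hypothetical, and breaking CDB ties by earliest issue cycle follows the priority rule assumed for the loop example rather than anything inherent to the algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """Hypothetical per-instruction status flags for this sketch."""
    name: str
    issue_cycle: int
    issued: bool = True
    started: bool = False
    unit_free: bool = True
    operands_ready: bool = True
    branches_done: bool = True


def can_start(i: Instr) -> bool:
    """All four conditions from the summary must hold at once."""
    return (i.issued and not i.started and i.unit_free
            and i.operands_ready and i.branches_done)


def pick_cdb_writer(finished: List[Instr]) -> Optional[Instr]:
    """At most one result per cycle: the earliest-issued instruction wins."""
    if not finished:
        return None
    return min(finished, key=lambda i: i.issue_cycle)


add = Instr("ADD.D", issue_cycle=2, operands_ready=False)
load = Instr("L.D", issue_cycle=6)
print(can_start(add))                      # False: still waiting on an operand
print(pick_cdb_writer([load, add]).name)   # ADD.D: it issued first
```

An instruction that loses the CDB arbitration simply holds its result and competes again on the next cycle.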