CS 3853 Computer Architecture Chapter 3 Section 5 Loop Example



Instruction             Reservation  Execution  Issue  Ex Start  Ex End  Memory    CDB    Write
                        Station      Unit       Cycle  Cycle     Cycle   Cycle     Cycle  Dest
-------------------------------------------------------------------------------------------------
1.  L.D    F0,0(R1)     Load1        ALU        1      2         2       3         4      F0
2.  ADD.D  F4,F0,F2     Add1         fadd       2      5         6       -         7      F4
3.  S.D    F4,0(R1)     Store1       ALU        3      4         4       8         -      0(R1)
4.  L.D    F6,-8(R1)    Load2        ALU        4      5         5       6         8      F6
5.  ADD.D  F8,F6,F2     Add2         fadd       5      9         10      -         11     F8
6.  S.D    F8,-8(R1)    Store2       ALU        6      7         7       12 or 13  -      -8(R1)
7.  L.D    F10,-16(R1)  Load3        ALU        7      8         8       9         10     F10
8.  ADD.D  F12,F10,F2   Add3         fadd       8      11        12      -         13     F12
9.  S.D    F12,-16(R1)  Store3       ALU        9      10        10      14        -      -16(R1)
10. L.D    F14,-24(R1)  Load4        ALU        10     11        11      12 or 13  14     F14
11. ADD.D  F16,F14,F2   Add4         fadd       11     15        16      -         17     F16
12. S.D    F16,-24(R1)  Store4       ALU        12     13        13      18        -      -24(R1)
13. DADDUI R1,R1,#32    Iadd1        ALU        13     14        14      -         15     R1
14. BNE    R1,R2,loop   Branch       ALU        14     16        16      -         -      -
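The timing table can be sanity-checked mechanically. The Python sketch below is an illustration, not part of the original analysis. It assumes a single CDB, a single data-memory port, and the rule visible in the table that a waiting instruction starts executing (or, for a store, accesses memory) no earlier than the cycle after its operand appears on the CDB. The names `Row`, `TABLE`, `DEPS` and `check_table` are hypothetical, and the two "12 or 13" entries are resolved in issue order (store 6 in cycle 12, load 10 in cycle 13).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encoding of the loop timing table (not from the original notes).
# Assumption: the "12 or 13" memory cycles are resolved in issue order,
# so store 6 uses data memory in cycle 12 and load 10 in cycle 13.


@dataclass
class Row:
    ex_start: int
    ex_end: int
    mem: Optional[int]  # data-memory cycle (loads and stores only)
    cdb: Optional[int]  # CDB broadcast cycle (None for stores and the branch)


TABLE = {
    1: Row(2, 2, 3, 4),  # L.D    F0,0(R1)
    2: Row(5, 6, None, 7),  # ADD.D  F4,F0,F2
    3: Row(4, 4, 8, None),  # S.D    F4,0(R1)
    4: Row(5, 5, 6, 8),  # L.D    F6,-8(R1)
    5: Row(9, 10, None, 11),  # ADD.D  F8,F6,F2
    6: Row(7, 7, 12, None),  # S.D    F8,-8(R1)
    7: Row(8, 8, 9, 10),  # L.D    F10,-16(R1)
    8: Row(11, 12, None, 13),  # ADD.D  F12,F10,F2
    9: Row(10, 10, 14, None),  # S.D    F12,-16(R1)
    10: Row(11, 11, 13, 14),  # L.D    F14,-24(R1)
    11: Row(15, 16, None, 17),  # ADD.D  F16,F14,F2
    12: Row(13, 13, 18, None),  # S.D    F16,-24(R1)
    13: Row(14, 14, None, 15),  # DADDUI R1,R1,#32
    14: Row(16, 16, None, None),  # BNE    R1,R2,loop
}

# (producer, consumer, consumer_waits_in_memory): a store only needs its data
# by its memory cycle; every other consumer needs it before execution starts.
DEPS = [
    (1, 2, False), (2, 3, True),
    (4, 5, False), (5, 6, True),
    (7, 8, False), (8, 9, True),
    (10, 11, False), (11, 12, True),
    (13, 14, False),  # BNE waits for R1 from DADDUI
]


def check_table(table, deps):
    """Return a list of violated rules (empty if the table is consistent)."""
    problems = []
    for prod, cons, uses_mem in deps:
        need = table[prod].cdb + 1  # first cycle the CDB value can be used
        got = table[cons].mem if uses_mem else table[cons].ex_start
        if got < need:
            problems.append(f"{cons} uses result of {prod} at {got}, before {need}")
    cdb_cycles = [r.cdb for r in table.values() if r.cdb is not None]
    if len(cdb_cycles) != len(set(cdb_cycles)):
        problems.append("two results on the CDB in the same cycle")
    mem_cycles = [r.mem for r in table.values() if r.mem is not None]
    if len(mem_cycles) != len(set(mem_cycles)):
        problems.append("two data-memory accesses in the same cycle")
    return problems


print(check_table(TABLE, DEPS))  # prints []
```

Every dependent instruction in the table starts exactly one cycle after its producer's CDB cycle, which is why the check finds no violations; shifting any consumer one cycle earlier would be reported.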
Note 1: Instruction 4 is ready with its result on cycle 7, but the CDB is busy.
Note 1a: Instruction 10 is ready with its result on cycle 13, but the CDB is busy.
Note 1b: Instructions 6 and 10 both want to access data memory in cycle 12, so one of them must wait until cycle 13.
Whether we give preference by issue time (store 6 first) or to reads over writes (load 10 first), instruction 10 puts its result on the CDB in cycle 14, since the CDB is not available in cycle 13.
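The arbitration argument in Notes 1, 1a and 1b can be replayed with a small model. This is a hypothetical sketch: it assumes one CDB granted each cycle to the earliest-issued ready instruction (consistent with instruction 2 beating instruction 4 in cycle 7), one data-memory access per cycle, and "ready" cycles read off the table (memory cycle + 1 for loads, Ex End + 1 for the adds and DADDUI). `BASE_READY`, `arbitrate_memory` and `schedule_cdb` are illustrative names.

```python
# Hypothetical model of the two shared resources in Notes 1-1b:
# one data-memory port and one CDB. Assumption: when several results are
# ready, the CDB goes to the earliest-issued (lowest-numbered) instruction.

# First cycle each result could use the CDB, read off the table
# (loads: memory cycle + 1; FP adds and DADDUI: Ex End + 1).
# Load 10 is left out because it depends on the memory arbitration below.
BASE_READY = {1: 4, 2: 7, 4: 7, 5: 11, 7: 10, 8: 13, 11: 17, 13: 15}


def arbitrate_memory(requests, reads_first):
    """requests: list of (instr, kind, wanted_cycle) with kind "load" or "store".
    Grants one access per cycle, serving higher-priority requests first;
    a request whose cycle is taken slides to the next free cycle."""
    def priority(req):
        instr, kind, _ = req
        rank = 0 if (not reads_first or kind == "load") else 1
        return (rank, instr)  # ties broken by issue order

    granted, busy = {}, set()
    for instr, _kind, wanted in sorted(requests, key=priority):
        cycle = wanted
        while cycle in busy:
            cycle += 1
        busy.add(cycle)
        granted[instr] = cycle
    return granted


def schedule_cdb(ready):
    """ready: instr -> first cycle its result could be broadcast.
    Returns instr -> cycle in which it actually gets the single CDB."""
    waiting = dict(ready)
    granted = {}
    cycle = min(waiting.values())
    while waiting:
        candidates = [i for i, r in waiting.items() if r <= cycle]
        if candidates:
            winner = min(candidates)  # oldest ready instruction wins
            granted[winner] = cycle
            del waiting[winner]
        cycle += 1
    return granted


for reads_first in (False, True):
    mem = arbitrate_memory([(6, "store", 12), (10, "load", 12)], reads_first)
    cdb = schedule_cdb({**BASE_READY, 10: mem[10] + 1})
    print(f"reads_first={reads_first}: store 6 mem {mem[6]}, load 10 mem {mem[10]}, "
          f"CDB for 4 = {cdb[4]}, CDB for 10 = {cdb[10]}")
```

Under either memory policy the model gives instruction 4 the CDB in cycle 8 and instruction 10 the CDB in cycle 14, matching the table: with issue-order priority the load is not ready until 14, and with reads first it is ready in 13 but loses the CDB to instruction 8.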
Note 2: The branch cannot execute until R1 is available, after cycle 15.
Note 3: We delay executing instructions until the branch completes, so the next instruction can start execution at cycle 17.
This is equivalent to issuing at cycle 16, so one pass through the unrolled loop effectively takes 15 cycles, or 15/4 = 3.75 cycles per iteration.
If the branch is taken, we need to refetch the next instruction in cycle 17, so the 4 iterations take one cycle longer, for an average of 16/4 = 4.00 cycles per iteration.
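The cycles-per-iteration figures are the issue-to-issue distance of one unrolled pass divided by the unroll factor of 4. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
# Hypothetical helper: cycles per original iteration for a loop unrolled
# `unroll_factor` times, measured from the issue cycle of the first
# instruction in one pass to the (effective) issue cycle of the next pass.
def cycles_per_iteration(first_issue, next_issue, unroll_factor):
    return (next_issue - first_issue) / unroll_factor


UNROLL = 4
# The next pass can start executing at 17, i.e. it effectively issues at 16.
print(cycles_per_iteration(1, 16, UNROLL))  # 3.75
# Taken branch: refetching costs one more cycle, effective issue at 17.
print(cycles_per_iteration(1, 17, UNROLL))  # 4.0
```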
Compare this with the summary from section 2:
Description             Cycles per iteration
ideal                   5
original                9
scheduled               7
unrolled                6.75
unrolled and scheduled  3.5