CS 3853 Computer Architecture Chapter 3 Section 5 Loop Example



Instruction             Reservation  Execution  Issue  Ex Start  Ex End  Memory  CDB    Write
                        Station      Unit       Cycle  Cycle     Cycle   Cycle   Cycle  Dest
1.  L.D    F0,0(R1)     Load1        ALU        1      2         2       3       4      F0
2.  ADD.D  F4,F0,F2     Add1         fadd       2      5         6       -       7      F4
3.  S.D    F4,0(R1)     Store1       ALU        3      4         4       -       8      0(R1)
4.  L.D    F6,-8(R1)    Load2        ALU        4      5         5       6       9      F6
5.  ADD.D  F8,F6,F2     Add2         fadd       5      10        11      -       12     F8
6.  S.D    F8,-8(R1)    Store2       ALU        6      7         7       -       13     -8(R1)
7.  L.D    F10,-16(R1)  Load3        ALU        7      8         8       9       10     F10
8.  ADD.D  F12,F10,F2   Add3         fadd       8      11        12      -       14     F12
9.  S.D    F12,-16(R1)  Store3       ALU        9      10        10      -       15     -16(R1)
10. L.D    F14,-24(R1)  Load4        ALU        10     11        11      12      16     F14
11. ADD.D  F16,F14,F2   Add4         fadd       11     17        18      -       19     F16
12. S.D    F16,-24(R1)  Store4       ALU        12     13        13      -       20     -24(R1)
13. DADDUI R1,R1,#32    Iadd1        ALU        13     14        14      -       17     R1
14. BNE    R1,R2,loop   Branch       ALU        14     18        18      -       -      -
Note 1: Instruction 4 has its result ready in cycle 7, but the CDB is busy with the result of instruction 2. In cycle 8, instruction 3 has its result ready and puts it on the CDB. Instruction 4 has to wait until cycle 9.
Note 2: The branch cannot execute until R1 is available. Instruction 13 puts R1 on the CDB in cycle 17, so the branch executes in cycle 18.
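Note 3: We delay executing instructions until the branch completes, so the next instruction can start execution at cycle 19.
This is equivalent to issuing the first instruction of the next pass at cycle 18, 17 cycles after instruction 1 issued, so the loop effectively takes 17 cycles. The unrolled body covers four iterations of the original loop, giving 17/4 = 4.25 cycles per iteration.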
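
The CDB cycles and the 4.25 figure can be checked with a short simulation. The Python sketch below is not the course's simulator; it is a minimal model that assumes the timing rules the table appears to follow: one instruction issues per cycle, address calculation and integer add take one cycle, ADD.D takes two cycles, a load needs one memory cycle, a store takes a CDB slot once its data has been broadcast, and when several results are ready the oldest instruction gets the CDB. The names PROGRAM, ready_cycle, schedule_cdb, and cycles_per_iteration are illustrative only.

```python
# Each entry is (kind, producer). "producer" is the number of the earlier
# instruction whose CDB broadcast this one must wait for, or None.
PROGRAM = [
    ("load", None),   # 1.  L.D    F0,0(R1)
    ("fadd", 1),      # 2.  ADD.D  F4,F0,F2
    ("store", 2),     # 3.  S.D    F4,0(R1)
    ("load", None),   # 4.  L.D    F6,-8(R1)
    ("fadd", 4),      # 5.  ADD.D  F8,F6,F2
    ("store", 5),     # 6.  S.D    F8,-8(R1)
    ("load", None),   # 7.  L.D    F10,-16(R1)
    ("fadd", 7),      # 8.  ADD.D  F12,F10,F2
    ("store", 8),     # 9.  S.D    F12,-16(R1)
    ("load", None),   # 10. L.D    F14,-24(R1)
    ("fadd", 10),     # 11. ADD.D  F16,F14,F2
    ("store", 11),    # 12. S.D    F16,-24(R1)
    ("iadd", None),   # 13. DADDUI R1,R1,#32
]


def ready_cycle(num, kind, producer, cdb):
    """First cycle in which instruction `num` could use the CDB, or None
    if the instruction it depends on has not broadcast yet."""
    if producer is not None and producer not in cdb:
        return None
    issue = num  # assumption: instruction n issues in cycle n
    # An operand is usable in the cycle after its producer's CDB cycle.
    operand = cdb[producer] + 1 if producer is not None else 0
    if kind == "load":    # execute at issue+1, memory at issue+2
        return issue + 3
    if kind == "iadd":    # execute at issue+1
        return issue + 2
    if kind == "fadd":    # two execution cycles once the operand arrives
        return max(issue + 1, operand) + 2
    if kind == "store":   # address at issue+1, then wait for the data
        return max(issue + 2, operand)
    raise ValueError(kind)


def schedule_cdb(program):
    """Assign one CDB slot per cycle, oldest ready instruction first."""
    cdb = {}
    cycle = 0
    while len(cdb) < len(program):
        cycle += 1
        for num, (kind, producer) in enumerate(program, start=1):
            if num in cdb:
                continue
            ready = ready_cycle(num, kind, producer, cdb)
            if ready is not None and ready <= cycle:
                cdb[num] = cycle
                break  # the CDB carries one result per cycle
    return cdb


def cycles_per_iteration(cdb, branch_issue=14, r1_producer=13, iterations=4):
    """The branch executes the cycle after R1 is broadcast; the next pass
    behaves as if its first instruction issued in that same cycle."""
    branch_ex = max(branch_issue + 1, cdb[r1_producer] + 1)
    loop_cycles = branch_ex - 1  # the first instruction issued in cycle 1
    return loop_cycles / iterations


if __name__ == "__main__":
    slots = schedule_cdb(PROGRAM)
    print(slots)
    print(cycles_per_iteration(slots))
```

Under these assumptions the sketch gives CDB cycles 4, 7, 8, 9, 12, 13, 10, 14, 15, 16, 19, 20, 17 for instructions 1 through 13, a branch execution in cycle 18, and 4.25 cycles per iteration. It ignores structural hazards on the execution units, which do not arise in this example.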
Compare the figure of 4.25 cycles per iteration with the summary from section 2:
Description              Cycles per iteration
ideal                    5
original                 9
scheduled                7
unrolled                 6.75
unrolled and scheduled   3.5
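
To put the comparison in numbers, the sketch below divides each section 2 figure by the 4.25 cycles per iteration obtained here. It is only an illustration of the arithmetic; the names SECTION_2_SUMMARY, DYNAMIC_CPI, and relative_to_dynamic are made up for this example.

```python
# Cycles per iteration from the section 2 summary.
SECTION_2_SUMMARY = {
    "ideal": 5,
    "original": 9,
    "scheduled": 7,
    "unrolled": 6.75,
    "unrolled and scheduled": 3.5,
}

# 17 cycles for the unrolled body, which covers 4 original iterations.
DYNAMIC_CPI = 17 / 4


def relative_to_dynamic(summary, dynamic_cpi):
    """Return each version's cycles per iteration divided by the
    dynamically scheduled figure."""
    return {name: cpi / dynamic_cpi for name, cpi in summary.items()}


if __name__ == "__main__":
    for name, ratio in relative_to_dynamic(SECTION_2_SUMMARY, DYNAMIC_CPI).items():
        print(f"{name:<25}{ratio:.2f}")
```

A ratio above 1 means the dynamically scheduled loop needs fewer cycles per iteration than that version. Only the unrolled and scheduled loop, at 3.5 cycles per iteration, comes out below 1.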