if Branch and cond: PC ← ALUOutput
otherwise PC ← PC + 4
Load:
LMD ← Mem[ALUOutput]
Store:
Mem[ALUOutput] ← B
WB
Load:
Regs[rt] ← LMD
RR ALU:
Regs[rd] ← ALUOutput
R-Imm ALU:
Regs[rt] ← ALUOutput
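The MEM and WB steps above can be sketched as a small Python model. This is a sketch under assumptions, not the datapath itself. The class `Unpipelined`, the string tags for instruction kinds, and modeling memory as a dict are all illustrative. The sketch uses NPC for "PC + 4", since NPC holds PC + 4 after IF.

```python
from dataclasses import dataclass, field

# Illustrative instruction-kind tags (hypothetical, not part of the datapath).
BRANCH, LOAD, STORE, RR_ALU, RI_ALU = "branch", "load", "store", "rr_alu", "ri_alu"

@dataclass
class Unpipelined:
    npc: int = 0           # PC + 4, written in IF
    alu_output: int = 0    # written in EX
    cond: bool = False     # branch condition, written in EX
    b: int = 0             # Regs[rt], read in ID (store data)
    lmd: int = 0           # load memory data, written in MEM
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    mem: dict = field(default_factory=dict)   # address -> value (assumed model)

def mem_stage(s: Unpipelined, kind: str) -> None:
    """MEM: complete a branch, or access memory for a load or store."""
    if kind == BRANCH and s.cond:
        s.pc = s.alu_output
    else:
        s.pc = s.npc              # NPC already holds PC + 4
    if kind == LOAD:
        s.lmd = s.mem.get(s.alu_output, 0)
    elif kind == STORE:
        s.mem[s.alu_output] = s.b

def wb_stage(s: Unpipelined, kind: str, rt: int, rd: int) -> None:
    """WB: write the result; stores and branches write nothing."""
    if kind == LOAD:
        s.regs[rt] = s.lmd
    elif kind == RR_ALU:
        s.regs[rd] = s.alu_output
    elif kind == RI_ALU:
        s.regs[rt] = s.alu_output
```

For example, a load whose effective address (ALUOutput) is 100, with 42 stored at that address, leaves LMD = 42 after MEM and Regs[rt] = 42 after WB. The PC advances to NPC because the instruction is not a taken branch.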
Question:
Figure C.21 Instruction Encoding
The RR instruction is described as:
RR ALU: Regs[rd] ← Regs[rs] funct Regs[rt]
What would have to change if instead it were:
RR ALU: Regs[rs] ← Regs[rt] funct Regs[rd]
Answer:
?
Figure C.21 Values: DADD R1, R2, R3
Question:
Suppose we have:
DADD R1, R2, R3
with R1=5, R2=7, and R3=9.
What is the top input to the Registers register file?
What is the second input (from the top) of the Registers register file?
What is the third input (from the top) of the Registers register file?
What is the fourth input (from the top) of the Registers register file?
What value is stored in the A register?
What value is stored in the B register?
What value is stored in the Imm register?
Which input is selected for the A Mux?
Which input is selected for the B Mux?
What is stored in ALU Output?
Which input is selected for the C Mux?
Which input is selected for the D Mux?
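As a check on the register and ALU questions, the ID, EX, and WB steps for this instruction can be traced in a few lines of Python. The sketch assumes the RR ALU form Regs[rd] ← Regs[rs] funct Regs[rt], with DADD R1, R2, R3 meaning rd = R1, rs = R2, rt = R3 and funct = add. It does not model the mux selections or the Imm register, because those depend on the labels in the figure.

```python
# Register file from the question: R1=5, R2=7, R3=9 (all others assumed 0).
regs = [0] * 32
regs[1], regs[2], regs[3] = 5, 7, 9

# DADD R1, R2, R3 in RR ALU form: rd=R1, rs=R2, rt=R3.
rd, rs, rt = 1, 2, 3

# ID: read both source registers into A and B.
A = regs[rs]
B = regs[rt]

# EX: ALUOutput <- A funct B, where funct is addition for DADD.
alu_output = A + B

# WB: Regs[rd] <- ALUOutput (the old value 5 in R1 is never read).
regs[rd] = alu_output

print(A, B, alu_output)   # 7 9 16
```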
Pipelined Implementation
Figure C.22 shows a corresponding pipeline implementation.
The registers NPC, IR, A, B, Imm, cond, ALUOutput and LMD are now contained in the pipeline registers. Examples:
NPC is contained in which pipeline register? Answer:
NPC is created in IF, so it is stored in the IF/ID register.
It is needed in EX, so it must also be in ID/EX.
IR is stored in which pipeline registers? Answer:
Parts of the IR are needed in every stage, so for simplicity the entire IR is propagated to each pipeline
register. This is somewhat inefficient.
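The following Python sketch shows how NPC and IR travel through the first two pipeline registers. The class names `IFID` and `IDEX`, the dict used as instruction memory, and the helper functions are hypothetical. The field positions assume the standard MIPS I-type layout: rs in bits 25..21, rt in bits 20..16, and the immediate in bits 15..0.

```python
from dataclasses import dataclass

@dataclass
class IFID:            # IF/ID pipeline register
    ir: int = 0
    npc: int = 0

@dataclass
class IDEX:            # ID/EX pipeline register
    ir: int = 0
    npc: int = 0
    a: int = 0
    b: int = 0
    imm: int = 0

def fetch(pc: int, imem: dict) -> tuple:
    """IF: latch the instruction and PC + 4 into IF/ID; return (IF/ID, next PC)."""
    return IFID(ir=imem.get(pc, 0), npc=pc + 4), pc + 4

def decode(if_id: IFID, regs: list) -> IDEX:
    """ID: copy IR and NPC forward, read rs/rt, and sign-extend the immediate."""
    ir = if_id.ir
    rs = (ir >> 21) & 0x1F          # bits 25..21
    rt = (ir >> 16) & 0x1F          # bits 20..16
    imm = ir & 0xFFFF               # bits 15..0
    if imm & 0x8000:                # sign-extend a negative 16-bit value
        imm -= 0x10000
    return IDEX(ir=ir, npc=if_id.npc, a=regs[rs], b=regs[rt], imm=imm)
```

For example, an instruction word with rs = 2, rt = 1, and immediate 45 mirrors the fields of LD R1, 45(R2), with the opcode left at 0. Fetching it from address 8 gives IF/ID.NPC = 12, and decode copies that NPC and the whole IR into ID/EX.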
Examples:
Figure C.23 shows the details of the pipelined execution for each type of instruction.
Below is a comparison for the RR ALU instruction (see Figures C.21 and C.22).
Operations that are performed, but not needed for this instruction are shown this way:
operation.
Stage | Unpipelined | Pipelined
IF | IR ← Mem[PC]; NPC ← PC + 4 | IF/ID.IR ← Mem[PC]; PC ← PC + 4; IF/ID.NPC ← PC + 4
ID | A ← Regs[IR.rs]; B ← Regs[IR.rt]; Imm ← sign-extended(IR.Immediate) |
The problem is that we might not know the branch address or if the branch is taken
until one or more additional instructions have been fetched, and possibly executed.
We are saved by the fact that the external state (what programs see) is not changed until MEM or WB.
In the unpipelined architecture shown in Figure C.21:
the NPC stores the potential new PC during IF
the branch address and whether the branch is taken are computed in EX
the PC is updated in MEM
for a branch, the instruction is complete after the MEM cycle.
The pipelined architecture shown in Figure C.22 has a 3-cycle stall when a branch is taken.
Suppose the instruction stream looks like:
instruction (not branch)
instruction (not branch)
instruction (not branch)
instruction A: taken branch
instruction B
instruction C
instruction D
...
instruction X: branch target
The PC is set at the end of IF to either PC+4 (normally) or, if the Zero? field of EX/MEM is not 0, to the ALU result stored in EX/MEM.
The Zero? field of EX/MEM stays 0 until the branch instruction is executed.
If the branch instruction is fetched in cycle i:
cycle i:
IF of A: taken branch is fetched
IF of A: branch instruction stored in IF/ID
IF of A: PC + 4 stored in PC (address of instruction i+1)
cycle i+1:
IF of B: instruction B at i+1 is fetched
IF of B: PC + 4 is stored in PC (address of instruction i+2)
ID of A: branch base register stored in ID/EX
ID of A: branch destination offset is stored in ID/EX
ID of A: branch instruction is stored in ID/EX (from IF/ID)
cycle i+2:
IF of C: instruction C at i+2 is fetched
IF of C: PC + 4 is stored in PC (address of instruction i+3)
ID of B: instruction B at i+1 is decoded
EX of A: branch instruction Zero? stored in EX/MEM (this is 1 since the branch is taken)
EX of A: branch destination stored in EX/MEM
cycle i+3:
IF of D: instruction D at i+3 is fetched
IF of D: branch destination is stored in PC (since Zero? field of EX/MEM is now set)
ID of C: instruction C at i+2 is decoded
EX of B: instruction B at i+1 is executed
Note that even if B is itself a branch, it must not be allowed to set Zero? in EX/MEM.
MEM of A: nothing (for branch instruction)
cycle i+4:
IF of X: branch destination X is fetched
Examples:
The timing diagram looks like this:
instruction | cycle i | cycle i+1 | cycle i+2 | cycle i+3 | cycle i+4 | cycle i+5 | cycle i+6 | cycle i+7 | cycle i+8
instruction A (taken branch) | IF | ID | EX | MEM | WB | | | |
instruction B | | IF | ID | EX | MEM | WB | | |
instruction C | | | IF | ID | EX | MEM | WB | |
instruction D | | | | IF | ID | EX | MEM | WB |
instruction X (branch destination) | | | | | IF | ID | EX | MEM | WB
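The PC is changed at the end of each cycle and is either PC+4 or the ALU output, depending on the contents of the EX/MEM register, which was written at the end of the previous cycle by the EX stage.
Branch A is in EX in cycle i+2 and in MEM in cycle i+3, so the PC receives the branch target at the end of cycle i+3 and the fetch in cycle i+4 is from the branch destination.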
We inhibit the MEM and WB actions of the next 3 instructions (B, C, and D), so in effect they are not executed.
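The taken-branch timing can be reproduced with a small cycle-by-cycle Python model. The model makes several simplifying assumptions. Each instruction is tagged in advance as taken or not taken, instead of computing Zero?. Addresses advance by 4. The `simulate` function and its tuple format are hypothetical. At the end of each cycle the model chooses the next PC from the instruction currently in MEM, whose cond and ALUOutput sit in EX/MEM. On a taken branch it marks the instructions in IF, ID, and EX as squashed.

```python
def simulate(program: dict, start_pc: int, cycles: int) -> list:
    """Return (fetch_cycle, address, squashed) for every instruction fetched.

    program maps an address to ("op",) or ("branch", taken, target).
    """
    pc = start_pc
    pipe = [None] * 5                # slots for IF, ID, EX, MEM, WB
    fetched = []
    for cycle in range(cycles):
        pipe = [None] + pipe[:-1]    # every instruction advances one stage
        if pc in program:
            entry = {"addr": pc, "instr": program[pc],
                     "cycle": cycle, "squashed": False}
            pipe[0] = entry
            fetched.append(entry)
        # End of cycle: the instruction in MEM owns the EX/MEM contents.
        # A squashed branch must not redirect the PC.
        mem = pipe[3]
        if (mem is not None and not mem["squashed"]
                and mem["instr"][0] == "branch" and mem["instr"][1]):
            pc = mem["instr"][2]     # PC <- EX/MEM.ALUOutput
            for e in pipe[:3]:       # inhibit the instructions in IF, ID, EX
                if e is not None:
                    e["squashed"] = True
        else:
            pc += 4                  # PC <- PC + 4
    return [(e["cycle"], e["addr"], e["squashed"]) for e in fetched]
```

Take a taken branch at address 0 with target 100, and let cycle 0 play the role of cycle i. Instructions B, C, and D (addresses 4, 8, 12) are fetched in cycles i+1 to i+3 and squashed. The target is fetched in cycle i+4, which matches the 3-cycle stall in the timing diagram.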
Recall that there are 3 types of hazards: structural, data, and control.
Structural hazards will not occur because we included enough hardware.
The above discussion showed how to handle control hazards.
When a data hazard occurs, we need to either stall the pipeline or eliminate the hazard by using forwarding.
forwarding hardware
Examples:
The following requires a stall of the DADD instruction:
LD R1, 45(R2)
DADD R5, R1, R7
This can be detected in the ID stage of the DADD instruction by comparing rt
of the LD instruction to rs and rt of the DADD instruction.
During the ID stage of DADD, its rs and rt fields are in IF/ID.IR.rs and IF/ID.IR.rt.
During the ID stage of DADD, rt of LD is in ID/EX.IR.rt
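The load-interlock check just described can be sketched as a single comparison in Python. The `Instr` record and the `load_interlock` helper are hypothetical. `id_ex` stands for the instruction held in ID/EX.IR (the load) and `if_id` for the one in IF/ID.IR. Following the text, the check compares the load's rt with both rs and rt of the later instruction.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str      # mnemonic, e.g. "LD" or "DADD"
    rs: int      # first source register (base register for a load)
    rt: int      # second source, or destination of a load / immediate op
    rd: int = 0  # destination of a register-register ALU op

def load_interlock(id_ex: Instr, if_id: Instr) -> bool:
    """Stall the instruction in ID if the load in EX writes one of its sources."""
    if id_ex.op != "LD":
        return False
    return id_ex.rt in (if_id.rs, if_id.rt)
```

For LD R1, 45(R2), followed by DADD R5, R1, R7, the load is Instr("LD", rs=2, rt=1) and the check returns True, so DADD stalls. The offset 45 plays no part in the check. When the later instruction uses rt as its destination, as a load or an immediate ALU instruction does, a fuller check would compare only rs.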
The following data hazard in the DSUB instruction can be removed by forwarding:
LD R1, 45(R2)
DADD R5, R6, R7
DSUB R8, R1, R7
This can be detected in the EX stage of DSUB by comparing rt of the LD (now in MEM/WB.IR.rt) to rs or rt of DSUB (in ID/EX.IR).
In this case, in the EX stage of DSUB the ALU must be fed not from the ID/EX register but from the load result in MEM/WB.
Figure C.27 shows the new data paths needed and the new muxes for the ALU.
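The ALU-input selection for this LD-to-DSUB case can be sketched in Python. The `Instr` record and the `alu_operands` helper are hypothetical. `a` and `b` are the possibly stale values latched in ID/EX, and `lmd` is the load result held in MEM/WB. The sketch covers only forwarding a load result from MEM/WB.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Instr:
    op: str
    rs: int
    rt: int
    rd: int = 0

def alu_operands(id_ex: Instr, mem_wb: Optional[Instr],
                 a: int, b: int, lmd: int) -> Tuple[int, int]:
    """Choose the two ALU inputs in EX, forwarding MEM/WB.LMD when needed."""
    first, second = a, b
    if mem_wb is not None and mem_wb.op == "LD":
        if mem_wb.rt == id_ex.rs:
            first = lmd        # forward the load result to the first ALU input
        if mem_wb.rt == id_ex.rt:
            second = lmd       # forward the load result to the second ALU input
    return first, second
```

For DSUB R8, R1, R7 in EX with LD R1, 45(R2) in WB, the first ALU input takes the loaded value instead of the stale R1 read in ID. A complete forwarding unit would also forward ALU results from EX/MEM and MEM/WB, giving priority to EX/MEM as the newer result.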