CS 3853 Computer Architecture Notes on Appendix C Section 3

Read Appendix C.3

C3: Pipeline Implementation

We start with a simple unpipelined implementation of a subset of the MIPS instructions.

Unpipelined Implementation

We consider the following 5 types of instructions:
The following information is from Figure A.22.
All instructions are 32 bits and these instructions have one of 2 formats:
I-type:
Figure A.22-I
Used for:
R-type:
Figure A.22-R
Used for:

Today's News: February 12, 2013
No news yet.

Examples:
Figure C.21 shows the hardware needed to implement these instructions in 5 or fewer cycles.
Here is what happens at each cycle:
Question:
The RR instruction is described as:
RR ALU: Regs[rd] ← Regs[rs] funct Regs[rt]
What would have to change if instead it were:
RR ALU: Regs[rs] ← Regs[rt] funct Regs[rd]
Answer:
?

Pipelined Implementation

Figure C.22 shows a corresponding pipeline implementation.
The registers NPC, IR, A, B, Imm, Cond, ALUOutput and LMD are now contained in the pipeline registers.
Examples:
  1. NPC is contained in which pipeline register?
    Answer:
    NPC is created in IF, so it is stored in the IF/ID register.
    It is needed in EX and MEM, so it must be in all pipeline registers up to MEM, so it is also stored in ID/EX and EX/MEM.
  2. IR is stored in which pipeline registers?
    Answer:
    Parts of the IR register are needed in each cycle, so for simplicity, the entire IR is propagated to each pipeline register. This is somewhat inefficient.
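
The rule used in example 1 (a value must sit in every pipeline register between the stage that creates it and the last stage that needs it) can be sketched in a few lines of Python. This is only an illustration: the function name `pipeline_registers_for` and the `STAGES` list are made up for these notes and do not come from the book.

```python
# Stage names of the 5-stage pipeline, in order.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_registers_for(produced_in, last_needed_in):
    """List the pipeline registers that must carry a value.

    produced_in:    stage that creates the value (e.g. "IF" for NPC)
    last_needed_in: last stage that reads the value (e.g. "MEM" for NPC)
    """
    start = STAGES.index(produced_in)
    end = STAGES.index(last_needed_in)
    if end < start:
        raise ValueError("a value cannot be needed before it is produced")
    # The register between stage k and stage k+1 is named "k/k+1".
    return [f"{STAGES[k]}/{STAGES[k + 1]}" for k in range(start, end)]


# NPC is created in IF and last needed in MEM.
print(pipeline_registers_for("IF", "MEM"))  # ['IF/ID', 'ID/EX', 'EX/MEM']
```

Calling it with `("IF", "WB")` gives all four pipeline registers, which matches the choice made for IR in example 2.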

Today's News: February 14, 2013
No news yet.

Examples: Figure C.23 shows the details of the pipelined execution for each type of instruction.

Below is a comparison for the RR ALU instruction. See Figures C.21 and C.22.
Operations that are performed, but not needed for this instruction are shown this way: operation.
Stage  Unpipelined                          Pipelined
IF     IR ← Mem[PC]                         IF/ID.IR ← Mem[PC]
       NPC ← PC + 4                         PC ← PC + 4
                                            IF/ID.NPC ← PC + 4
ID     A ← Regs[IR.rs]                      ID/EX.A ← Regs[IF/ID.IR.rs]
       B ← Regs[IR.rt]                      ID/EX.B ← Regs[IF/ID.IR.rt]
       Imm ← sign-extended(IR.Immediate)    ID/EX.NPC ← IF/ID.NPC
                                            ID/EX.IR ← IF/ID.IR
                                            ID/EX.Imm ← sign-extended(IF/ID.IR.Immediate)
EX     ALUOutput ← A funct B                EX/MEM.IR ← ID/EX.IR
                                            EX/MEM.ALUOutput ← ID/EX.A funct ID/EX.B
MEM    PC ← PC + 4                          MEM/WB.IR ← EX/MEM.IR
                                            MEM/WB.ALUOutput ← EX/MEM.ALUOutput
WB     Regs[IR.rd] ← ALUOutput              Regs[MEM/WB.IR.rd] ← MEM/WB.ALUOutput
Note: My notation is slightly different from that of the book.
For the unpipelined case I use IR.rs instead of just rs, etc.
For the pipelined case I use XX/XX.IR.rs instead of XX/XX.IR[rs]
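
The pipelined column of the comparison can be traced in code. The sketch below is a minimal illustration, not the hardware of Figure C.22: it pushes a single RR ALU instruction through the four pipeline registers one stage at a time. The decoded-instruction dictionary (keys `rs`, `rt`, `rd`, `imm`), the helper `sign_extend16`, and the function `run_rr_alu` are assumptions made for this example.

```python
import operator


def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int."""
    return value - 0x10000 if value & 0x8000 else value


def run_rr_alu(regs, mem, pc, funct=operator.add):
    """Trace one RR ALU instruction through IF, ID, EX, MEM, WB.

    regs: list of register values (modified in place by WB)
    mem:  dict mapping an address to a decoded instruction (assumed format)
    Returns the PC value after the IF stage.
    """
    # IF: fetch the instruction and compute the next PC.
    if_id = {"IR": mem[pc], "NPC": pc + 4}
    new_pc = pc + 4

    # ID: read both source registers; NPC and Imm are carried but unused here.
    ir = if_id["IR"]
    id_ex = {
        "A": regs[ir["rs"]],
        "B": regs[ir["rt"]],
        "NPC": if_id["NPC"],
        "IR": ir,
        "Imm": sign_extend16(ir["imm"]),
    }

    # EX: apply the ALU function to A and B.
    ex_mem = {"IR": id_ex["IR"], "ALUOutput": funct(id_ex["A"], id_ex["B"])}

    # MEM: nothing to do for an ALU instruction except pass values along.
    mem_wb = {"IR": ex_mem["IR"], "ALUOutput": ex_mem["ALUOutput"]}

    # WB: write the result to the destination register rd.
    regs[mem_wb["IR"]["rd"]] = mem_wb["ALUOutput"]
    return new_pc


# DADD R5, R6, R7 with R6 = 10 and R7 = 32.
regs = [0] * 32
regs[6], regs[7] = 10, 32
mem = {0: {"rs": 6, "rt": 7, "rd": 5, "imm": 0}}
print(run_rr_alu(regs, mem, pc=0), regs[5])  # 4 42
```

In real hardware the five stages of different instructions overlap; the sketch runs them back to back only to show which pipeline register each value lives in.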

How Branches Work

Branches are hard.
In the unpipelined architecture shown in Figure C.21:
The pipelined architecture shown in Figure C.22 has a 3-cycle stall when a branch is taken:
Suppose the instruction stream looks like:
instruction (not branch)
instruction (not branch)
instruction (not branch)
instruction A: taken branch
instruction B
instruction C
instruction D
...
instruction X: branch target
The PC is set at the end of IF to PC+4 (normally) or, if the Zero? field of EX/MEM is not 0, to the ALU result.
The Zero? field of EX/MEM stays 0 until the branch instruction is executed.
If the branch instruction is fetched in cycle i:
The timing diagram looks like this:
instruction                         cycle i    cycle i+1  cycle i+2  cycle i+3  cycle i+4  cycle i+5  cycle i+6  cycle i+7  cycle i+8
instruction A (taken branch)        IF         ID         EX         MEM        WB
instruction B                                  IF         ID         EX         MEM        WB
instruction C                                             IF         ID         EX         MEM        WB
instruction D                                                        IF         ID         EX         MEM        WB
instruction X (branch destination)                                              IF         ID         EX         MEM        WB

Reducing the branch penalty

Figure C.28 shows how to reduce the branch taken penalty from 3 to 1.
Figures C.22 and C.28 compared
The timing diagram now looks like this:
instruction                         cycle i    cycle i+1  cycle i+2  cycle i+3  cycle i+4  cycle i+5  cycle i+6
instruction A (taken branch)        IF         ID         EX         MEM        WB
instruction B                                  IF         ID         EX         MEM        WB
instruction X (branch destination)                        IF         ID         EX         MEM        WB
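
The two timing diagrams above differ only in the number of fall-through instructions fetched before the branch target. The sketch below generates the rows for any taken-branch penalty. It is an illustration under simple assumptions: the function names are invented for these notes, every fetched instruction is shown with all five stages, and no attempt is made to mark which stages are cancelled.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def taken_branch_timing(penalty):
    """Return (name, first_cycle_offset) pairs for a taken branch.

    penalty: number of instructions fetched after the branch before the
             target can be fetched (3 for Figure C.22, 1 for Figure C.28).
    """
    fall_through = [chr(ord("B") + k) for k in range(penalty)]
    names = ["A (taken branch)"] + fall_through + ["X (branch target)"]
    # Each instruction is fetched one cycle after the previous one.
    return list(zip(names, range(len(names))))


def format_timing(rows):
    """Render the rows as a plain-text timing diagram."""
    last_cycle = rows[-1][1] + len(STAGES) - 1
    cycles = ["i" if c == 0 else f"i+{c}" for c in range(last_cycle + 1)]
    lines = ["instruction".ljust(20) + "".join(c.ljust(5) for c in cycles)]
    for name, start in rows:
        cells = [""] * start + STAGES
        lines.append(name.ljust(20) + "".join(c.ljust(5) for c in cells))
    return "\n".join(line.rstrip() for line in lines)


print(format_timing(taken_branch_timing(3)))
print(format_timing(taken_branch_timing(1)))
```

With a penalty of 3 the target X is fetched in cycle i+4; with a penalty of 1 it is fetched in cycle i+2.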

Questions:
  1. Why do we not strike out the ID and EX of instruction B?
    Answer:
    We do not have to since they do not change the external state.
  2. Why don't we strike out the MEM and WB for instruction A?
    Answer:
    A branch instruction does not do anything in these stages.

Today's News: February 19, 2013
Exam on Thursday.

Examples:

Dealing with data hazards

Recall that there are 3 types of hazards: structural, data, and control.
Structural hazards will not occur because we included enough hardware.
The above discussion showed how to handle control hazards.
When a data hazard occurs, we need to either stall the pipeline or eliminate the hazard by using forwarding.

Examples:
  1. The following requires a stall of the DADD instruction:
         LD    R1, 45(R2)
         DADD  R5, R1, R7
    
    • This can be detected in the ID stage of the DADD instruction by comparing rt of the LD instruction to rs and rt of the DADD instruction.
    • During the ID stage of DADD, rs is in IF/ID.IR.rs and rt is in IF/ID.IR.rt
    • During the ID stage of DADD, rt of LD is in ID/EX.IR.rt
  2. The following data hazard in the DSUB instruction can be removed by forwarding:
         LD    R1, 45(R2)
         DADD  R5, R6, R7
         DSUB  R8, R1, R7
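
The comparison described in example 1 can be written as a small check. The sketch below is a simplified illustration, not the full hazard logic from the book: it assumes the instruction in ID is an RR ALU instruction that reads both rs and rt, it ignores the special case of register R0, and the `Instr` class and `load_use_stall` function are names invented for these notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    op: str                   # mnemonic, e.g. "LD" or "DADD"
    rs: int                   # first source (base register for LD)
    rt: int                   # second source (destination for LD)
    rd: Optional[int] = None  # destination for R-type instructions


def load_use_stall(if_id_ir, id_ex_ir):
    """Decide in ID whether the instruction in IF/ID must stall.

    if_id_ir: instruction currently in ID (held in IF/ID.IR)
    id_ex_ir: instruction currently in EX (held in ID/EX.IR)
    """
    if id_ex_ir.op != "LD":
        return False  # only a load delivers its result too late for EX
    # The load writes rt; stall if the next instruction reads that register.
    return id_ex_ir.rt in (if_id_ir.rs, if_id_ir.rt)


# Example 1: LD R1, 45(R2) followed directly by DADD R5, R1, R7.
ld = Instr("LD", rs=2, rt=1)
dadd_dependent = Instr("DADD", rs=1, rt=7, rd=5)
print(load_use_stall(dadd_dependent, ld))  # True

# Example 2: an independent DADD sits between the LD and the DSUB.
dadd = Instr("DADD", rs=6, rt=7, rd=5)
dsub = Instr("DSUB", rs=1, rt=7, rd=8)
print(load_use_stall(dadd, ld))    # False
print(load_use_stall(dsub, dadd))  # False
```

In example 2 the check never fires: when DSUB is in ID, the instruction in EX is DADD, and the LD is already in MEM, so its result can be forwarded to DSUB's EX stage.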
    
