CS 3853 Computer Architecture Notes on Appendix C Section 1



C1: Introduction

Consider a traditional processor in which instructions are executed as follows:
  1. Instruction fetch (IF): read the instruction from memory and update the PC
  2. Instruction decode (ID): decode the instruction and read the source registers
  3. Execution (EX): execute, e.g. perform ALU operation (may be effective address calculation)
  4. Memory access (MEM): if the instruction is a load, read from memory; if it is a store, write to memory
  5. Write-back (WB): Write the result to the destination register
If we execute one instruction per cycle, the cycle time needs to be long enough to perform all of these steps for the slowest instruction.
Alternatively, we can execute one step per cycle, which makes the cycle time shorter, but some instructions will require as many as 5 cycles.
In that case the cycle time only needs to be long enough for the slowest of these steps.
A 5-cycle instruction will execute more slowly (5 times the slowest step is at least the sum of all the steps), but some instructions will take fewer cycles.
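
To make the trade-off concrete, here is a minimal sketch in Python. The step latencies are made-up values chosen only for illustration; they do not come from the text, and the 2-cycle case simply stands in for an instruction that skips steps.

```python
# Hypothetical latency of each step in ns (illustrative values, not from the text).
STEP_LATENCY_NS = {"IF": 0.8, "ID": 0.5, "EX": 0.7, "MEM": 1.0, "WB": 0.5}

# Approach 1: one instruction per cycle, so the cycle must fit all five steps.
single_cycle_time = sum(STEP_LATENCY_NS.values())

# Approach 2: one step per cycle, so the cycle must fit only the slowest step.
multi_cycle_time = max(STEP_LATENCY_NS.values())


def multi_cycle_instruction_time(cycles):
    """Time for an instruction that needs the given number of cycles."""
    return cycles * multi_cycle_time


print(f"approach 1, any instruction:     {single_cycle_time:.1f} ns")
print(f"approach 2, 5-cycle instruction: {multi_cycle_instruction_time(5):.1f} ns")
print(f"approach 2, 2-cycle instruction: {multi_cycle_instruction_time(2):.1f} ns")
```

With these assumed numbers the 5-cycle instruction is slower under the second approach, while the 2-cycle instruction is faster.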

Today's News: August 31, 2015
Recitation and quiz today.


Pipeline Timing 1

We will consider this second approach for now.
What fraction of the time is the ALU being used?
How can we improve the performance?
The idea of pipelining: Fetch the next instruction while decoding the previous instruction.
Instead of a throughput of 1 instruction every 5 cycles, we could get one per cycle after an initial delay.
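
The throughput claim can be checked with a small cycle count. This is a sketch under two assumptions that go beyond the text: every instruction uses all 5 cycles in the unpipelined case, and the pipeline never stalls. The function names are hypothetical.

```python
STAGES = 5  # IF, ID, EX, MEM, WB


def unpipelined_cycles(n_instructions):
    """Each instruction finishes all 5 steps before the next one starts."""
    return STAGES * n_instructions


def pipelined_cycles(n_instructions):
    """The first instruction takes 5 cycles; each later instruction
    completes one cycle after the previous one (ideal pipeline, no stalls)."""
    if n_instructions == 0:
        return 0
    return STAGES + (n_instructions - 1)


for n in (1, 5, 1000):
    print(n, unpipelined_cycles(n), pipelined_cycles(n))
```

For a long run of instructions the pipelined count approaches one cycle per instruction, which is the "one per cycle after an initial delay" described above.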

Important:
  • You must know the above 5 steps
  • You must be able to give them in order using the 2- or 3-letter abbreviations:
    IF, ID, EX, MEM, WB
  • You must know the names of each step:
    Instruction fetch, Instruction decode, Execution, Memory access, Write-back
  • You must be able to describe in general what each step does
  • For each of the following major types of 4-byte RISC instructions,
    you must be able to describe in detail what happens at each step:
    • register-register ALU (result in another register)
    • register-immediate ALU (result in another register)
    • load (register with displacement addressing)
    • store (register with displacement addressing)
    • conditional branch instruction that compares 2 registers
    See pages C5 and C6 of the text.

The MIPS instruction set


Example
Describe the execution of
DADDIU R1, R2, #3
at each of the 5 execution stages.
Solution:
  1. IF:
    • Send the PC to the instruction memory and fetch the next instruction.
    • Add 4 to the PC (length of the instruction)
  2. ID:
    • decode the instruction
    • get R2 from the register file
    • sign-extend the immediate value from the instruction
  3. EX:
    • send the value of R2 and the sign-extended immediate value to the ALU to perform the add
  4. MEM:
    • nothing to do here
  5. WB:
    • write the result to R1 in the register file
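
The five steps of this solution can be traced in a few lines of Python. This is only a sketch: the instruction is passed in as already-separated fields rather than fetched from memory, the register file is a plain list, and the starting value of R2 and the PC are hypothetical.

```python
def sign_extend_16(value):
    """Interpret a 16-bit immediate field as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_daddiu(regs, pc, rt, rs, imm16):
    """Walk DADDIU rt, rs, #imm through the five steps; return the new PC."""
    # IF: fetch the instruction (given here as fields) and add 4 to the PC.
    next_pc = pc + 4
    # ID: read the source register and sign-extend the immediate.
    a = regs[rs]
    imm = sign_extend_16(imm16)
    # EX: the ALU adds the two values, keeping the low 64 bits.
    alu_out = (a + imm) & 0xFFFFFFFFFFFFFFFF
    # MEM: nothing to do for an ALU instruction.
    # WB: write the result to the destination register.
    regs[rt] = alu_out
    return next_pc


regs = [0] * 32
regs[2] = 10  # hypothetical starting value of R2
new_pc = run_daddiu(regs, pc=0x100, rt=1, rs=2, imm16=3)
print(regs[1], hex(new_pc))  # prints: 13 0x104
```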

Problems
5-stage pipeline 1
  1. Describe the execution of
    LD R1, 30(R2)
    in the EX stage.
  2. In the ID step, we decode the instruction and read the source registers.
    Describe in words what is meant by decode the instruction.


The classic 5-stage pipeline

The simple 5-stage pipeline looks like this:
                    Clock number
Instruction number  1    2    3    4    5    6    7    8    9
Instruction i       IF   ID   EX   MEM  WB
Instruction i+1          IF   ID   EX   MEM  WB
Instruction i+2               IF   ID   EX   MEM  WB
Instruction i+3                    IF   ID   EX   MEM  WB
Instruction i+4                         IF   ID   EX   MEM  WB
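
The staggered layout of this diagram follows a simple rule: instruction i+k is in stage s (counting from 0) during clock k + s + 1. A minimal Python sketch that builds the diagram from that rule, assuming an ideal pipeline with no stalls (the function name is hypothetical):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return the rows of an ideal (no-stall) pipeline diagram.

    Each row is a list with one cell per clock cycle; a cell holds the
    stage name or an empty string.
    """
    total_clocks = n_instructions + len(STAGES) - 1
    rows = []
    for k in range(n_instructions):
        row = [""] * total_clocks
        for s, name in enumerate(STAGES):
            row[k + s] = name  # list index k + s is clock number k + s + 1
        rows.append(row)
    return rows


rows = pipeline_diagram(5)
clocks = range(1, len(rows[0]) + 1)
print("clock".ljust(8) + "".join(str(c).ljust(5) for c in clocks))
for k, row in enumerate(rows):
    label = "i" if k == 0 else f"i+{k}"
    print(label.ljust(8) + "".join(cell.ljust(5) for cell in row))
```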

Today's News: September 2, 2015
No news


Starting with clock number 5, one instruction completes per clock cycle.
Figure C.2 shows the hardware needed to support each stage of the pipeline.
We need to make sure that a piece of hardware does not need to do 2 things at once.
For example, an ADD will need to use the ALU in stage 3 (the EX stage) and a LD will need to compute an effective address (by adding a displacement to a register) in the same stage.
However, a branch will need to compute the branch address in stage 2 (ID), which requires an adder.
We also need an adder in stage 1 (IF) to increment the PC. Note that the register file is accessed in both ID and WB.
We assume that we can read and write in the same cycle.
In fact, we write in the first half of the cycle and read in the second half.

Implementation requirement: pipeline registers

At each stage, certain values need to be saved so they do not change during the next stage.
Example: the ALU is combinational logic, so its output changes whenever its inputs change.
Figure C.3 shows the pipeline registers required.

Questions:
Figure C.3 ALU and Pipeline
  1. In Figure C.3, how many different ALU's are shown?
    Answer:
  2. In Figure C.3, how many different pipeline registers are shown?
    Answer:

Today's News: September 4, 2015
No recitation next week since Monday is Labor Day


Example
Describe what needs to be stored in each of the pipeline registers during the execution of
DADDIU R1, R2, #3
Solution:
It is easier to do this backwards, starting with the MEM/WB to make sure everything that is needed propagates.
Only values from the previous pipeline register and those computed at the current stage are available to be saved.
Look at the previous example describing what is needed at each stage.
  1. IF/ID: The fetched instruction
  2. ID/EX:
    • the value of R2 (from the register file)
    • the sign-extended immediate value (from the IF/ID register, with the help of some hardware)
    • the address of the R1 register (from the IF/ID register)
    • the following decoded control lines: ALU, MEM, WB (generated by hardware from the IF/ID register)
  3. EX/MEM:
    • the result of the ALU operation
    • the address of the R1 register (from ID/EX)
    • the following control lines (from ID/EX): MEM, WB
  4. MEM/WB:
    • the result of the ALU operation (from EX/MEM)
    • the address of the R1 register (from EX/MEM)
    • the control lines for WB (from EX/MEM)
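
The propagation rule in this solution (each pipeline register is filled only from the previous register and from values computed in the current stage) can be sketched in Python. The dictionary keys, the control-line encodings, and the starting value of R2 are all hypothetical, and the IF/ID register holds decoded fields here instead of a 32-bit instruction word.

```python
def sign_extend_16(value):
    """Interpret a 16-bit immediate field as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


regs = [0] * 32
regs[2] = 10  # hypothetical starting value of R2

# IF/ID: the fetched instruction, DADDIU R1, R2, #3.
if_id = {"instruction": {"op": "DADDIU", "rs": 2, "rt": 1, "imm16": 3}}

# ID/EX: built from IF/ID plus the register file read done in ID.
instr = if_id["instruction"]
id_ex = {
    "a": regs[instr["rs"]],                 # the value of R2
    "imm": sign_extend_16(instr["imm16"]),  # sign-extended immediate
    "dest": instr["rt"],                    # the address of R1
    "ctrl": {"ALU": "add", "MEM": "none", "WB": "write_reg"},
}

# EX/MEM: built from ID/EX plus the ALU result computed in EX.
ex_mem = {
    "alu_out": id_ex["a"] + id_ex["imm"],
    "dest": id_ex["dest"],
    "ctrl": {name: id_ex["ctrl"][name] for name in ("MEM", "WB")},
}

# MEM/WB: built from EX/MEM only; MEM does nothing for this instruction.
mem_wb = {
    "alu_out": ex_mem["alu_out"],
    "dest": ex_mem["dest"],
    "ctrl": {"WB": ex_mem["ctrl"]["WB"]},
}

# WB: everything needed for the register write is in MEM/WB.
if mem_wb["ctrl"]["WB"] == "write_reg":
    regs[mem_wb["dest"]] = mem_wb["alu_out"]

print(regs[1])  # prints: 13
```

Note that the destination address travels through every register after IF/ID even though it is not used until WB.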

A Note on terminology
When discussing registers, you should be careful to distinguish between
  • the contents of the register
  • the address of the register
If the register R1 contains the value 234 then
  • The value of R1 is 234
  • The address of R1 is 1
If you just say R1, it may be ambiguous what you mean.
Always either say "the value of" or "the address of" when referring to a register.
In the example above, the ID/EX pipeline register stores
  • the value of R2
  • the address of R1 (the number 1)
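
One way to keep the two meanings apart is to picture the register file as an array, where the index is the address of a register and the element is its value. A minimal Python sketch (the variable names are hypothetical):

```python
regs = [0] * 32  # register file: index = register address, element = value

R1 = 1           # the address of R1
regs[R1] = 234   # the value of R1

print("address of R1:", R1)
print("value of R1:", regs[R1])
```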

Pipeline performance

Examples
  1. An unpipelined machine has a 1 ns clock.
    All instructions take 5 cycles, except for branches which take 2 cycles. Branches are 30% of all instructions.
    What is the speedup obtained by using a pipelined design if the pipelining increases the clock cycle time to 1.5 ns?
    Solution:
    Unpipelined CPI = .7 × 5 + .3 × 2 = 4.1
    Unpipelined average instruction execution time: 1 ns × 4.1 = 4.1 ns.
    Pipelined average instruction execution time: 1.5 ns × 1 = 1.5 ns (assuming an ideal pipelined CPI of 1).
    Speedup = (Average instruction execution time unpipelined) / (Average instruction execution time pipelined) = 4.1 ns / 1.5 ns = 2.733.
  2. Why did we ignore the latency of the pipelined machine in the above solution?
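
The arithmetic in Example 1 can be checked with a short script. This is a sketch that assumes, as the solution does, that the pipelined machine completes one instruction per cycle; the function and variable names are hypothetical.

```python
def average_cpi(mix):
    """mix: list of (fraction_of_instructions, cycles) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


unpipelined_clock_ns = 1.0
pipelined_clock_ns = 1.5

# 70% of instructions take 5 cycles; 30% (branches) take 2 cycles.
unpipelined_cpi = average_cpi([(0.7, 5), (0.3, 2)])
unpipelined_time_ns = unpipelined_clock_ns * unpipelined_cpi

# Assumption: ideal pipelined CPI of 1 (no stalls).
pipelined_time_ns = pipelined_clock_ns * 1

speedup = unpipelined_time_ns / pipelined_time_ns
print(f"CPI = {unpipelined_cpi:.1f}, speedup = {speedup:.3f}")
```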
