CS 3853 Computer Architecture Notes on Chapter 3 Section 1


Read Section 3.1

3.1: Instruction Level Parallelism Concepts

ILP: the potential to overlap instruction execution.
In this chapter we will look at several ways to improve performance, including out of order execution.
When can we safely change the execution order of instructions?

Terminology

data dependence
instruction j is data dependent on instruction i if either of the following holds:
  • Instruction i produces a result that may be used by instruction j, or
  • There exists an instruction k such that instruction j is data dependent on instruction k and instruction k is data dependent on instruction i
    (transitive closure)
In either case, instruction j must be executed after instruction i.
Example 1:
1) DADD R1, R2, R3
2) DSUB R4, R5, R1
3) AND  R6, R7, R4
  • instruction 2 depends on instruction 1
  • instruction 3 depends on instruction 2
  • instruction 3 depends on instruction 1
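The three dependences in Example 1 can be found mechanically. The sketch below is only an illustration: the tuple layout (opcode, destination, sources) and the function names are assumptions made here, not part of any real tool. It records which instruction last wrote each register to find the direct dependences, then applies the transitive-closure rule from the definition.

```python
# Each instruction is (opcode, destination, source registers); this layout is
# an assumption made for the sketch, not a real assembler format.
program = [
    ("DADD", "R1", ("R2", "R3")),  # instruction 1
    ("DSUB", "R4", ("R5", "R1")),  # instruction 2
    ("AND",  "R6", ("R7", "R4")),  # instruction 3
]

def direct_dependences(instrs):
    """Return pairs (i, j), 1-based, where j reads the value i produced."""
    deps = set()
    last_writer = {}  # register -> number of the latest instruction writing it
    for j, (_, dest, sources) in enumerate(instrs, start=1):
        for reg in sources:
            if reg in last_writer:
                deps.add((last_writer[reg], j))
        last_writer[dest] = j
    return deps

def transitive_closure(deps):
    """Add (i, j) whenever a chain i -> k -> j already exists."""
    closure = set(deps)
    changed = True
    while changed:
        changed = False
        for (i, k) in list(closure):
            for (k2, j) in list(closure):
                if k == k2 and (i, j) not in closure:
                    closure.add((i, j))
                    changed = True
    return closure

direct = direct_dependences(program)
all_deps = transitive_closure(direct)
print(sorted(direct))    # [(1, 2), (2, 3)]
print(sorted(all_deps))  # [(1, 2), (1, 3), (2, 3)]
```

The pair (1, 3) appears only after the closure step: instruction 3 never reads R1 directly, but it depends on instruction 1 through instruction 2.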
Example 2:
1) S.D  R1, 8(R2)
2) L.D  R4, 16(R3)
Is instruction 2 data dependent on instruction 1?
Dependencies that flow through memory locations are difficult to detect.
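For Example 2, the answer depends on the run-time values of R2 and R3: the load depends on the store only when 8(R2) and 16(R3) name the same address. The sketch below makes that concrete. The register values are invented for illustration, and it assumes aligned 8-byte accesses, so two accesses overlap only when their addresses are equal.

```python
def effective_address(base_value, offset):
    """Address computed by displacement addressing: offset(base)."""
    return base_value + offset

def load_depends_on_store(regs):
    """True if L.D R4, 16(R3) reads the location that S.D R1, 8(R2) wrote."""
    store_addr = effective_address(regs["R2"], 8)
    load_addr = effective_address(regs["R3"], 16)
    return store_addr == load_addr

# Hypothetical register contents, chosen only for illustration.
print(load_depends_on_store({"R2": 1008, "R3": 1000}))  # True: both are 1016
print(load_depends_on_store({"R2": 1000, "R3": 1000}))  # False: 1008 vs 1016
```

Nothing in the instruction text distinguishes the two cases, which is why memory dependences are harder to detect than register dependences.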

name dependence 1: antidependence
instruction j is antidependent on instruction i:
  • Instruction j should be executed after instruction i
  • Instruction j writes to a register or memory location that instruction i reads
  • Note that data dependence can be stated as:
    Instruction i writes to a register or memory location that instruction j reads
Example 3:
1) DADD R1, R2, R3
2) DADD R2, R4, R5
  • instruction 2 is antidependent on instruction 1
  • We must make sure that instruction 1 reads from R2 before instruction 2 changes the value of R2.

name dependence 2: output dependence
output dependence between instruction i and instruction j:
Instructions i and j write to the same register or memory location, so the two writes must occur in program order to leave the correct final value.
Example 4:
1) DADD R1, R2, R3
2) DADD R5, R1, R4
3) DADD R1, R6, R7
  • there is an output dependence between instructions 1 and 3
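Both kinds of name dependence can be detected by comparing register names alone. The following is a minimal sketch under the same assumed tuple layout (opcode, destination, sources); the function name is hypothetical. It checks every ordered pair of instructions in Examples 3 and 4.

```python
def name_dependences(instrs):
    """Return (kind, i, j) triples, 1-based, for every pair with i before j."""
    found = []
    for i, (_, dest_i, srcs_i) in enumerate(instrs, start=1):
        for j, (_, dest_j, _) in enumerate(instrs[i:], start=i + 1):
            if dest_j in srcs_i:
                found.append(("anti", i, j))    # j overwrites what i reads
            if dest_j == dest_i:
                found.append(("output", i, j))  # i and j write the same place
    return found

example3 = [("DADD", "R1", ("R2", "R3")),
            ("DADD", "R2", ("R4", "R5"))]
example4 = [("DADD", "R1", ("R2", "R3")),
            ("DADD", "R5", ("R1", "R4")),
            ("DADD", "R1", ("R6", "R7"))]

print(name_dependences(example3))  # [('anti', 1, 2)]
print(name_dependences(example4))  # [('output', 1, 3), ('anti', 2, 3)]
```

Note that Example 4 also contains an antidependence: instruction 3 writes R1, which instruction 2 reads.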

name dependence 3: comments

control dependence
an instruction is control dependent on a collection of branches if whether that instruction executes is determined by the outcomes of those branches.

basic block
a straight-line code sequence with no branches in except to its entry and no branches out except at its exit.
  • For MIPS, average dynamic branch frequency is 15% to 25%
  • A typical basic block contains between 3 and 6 instructions.
  • Overlapping instructions within a single basic block is not sufficient; we must exploit ILP across multiple basic blocks.
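The two figures above are consistent with each other. As a rough check, assume branches are spread evenly through the instruction stream (an assumption made only for this estimate): a branch frequency f then gives one branch every 1/f instructions, leaving 1/f - 1 other instructions between consecutive branches.

```python
def instructions_between_branches(branch_frequency):
    """Average count of non-branch instructions separating two branches,
    assuming branches are spread evenly through the instruction stream."""
    return 1.0 / branch_frequency - 1.0

for freq in (0.15, 0.25):
    print(freq, round(instructions_between_branches(freq), 2))
# 0.15 5.67
# 0.25 3.0
```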
Example 5:
for (i=0; i<=999; i++)
   x[i] = x[i] + y[i];

Estimate the size of the basic block for the above code.
Each of the iterations is independent once the index is known.
Can write this as:
   x[0] = x[0] + y[0];
   x[1] = x[1] + y[1];
   x[2] = x[2] + y[2];
        ...
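One way to see that the iterations are independent is to run them in a different order and compare results. The sketch below is an illustration only: the data values and the helper name are invented here. It performs the same 1000 updates once in index order and once in a shuffled order.

```python
import random

def add_in_order(x, y, order):
    """Apply x[i] = x[i] + y[i] for each index i, visiting indices in `order`."""
    result = list(x)
    for i in order:
        result[i] = result[i] + y[i]
    return result

n = 1000  # same trip count as the loop above (i = 0 .. 999)
x = [float(i) for i in range(n)]   # made-up data for illustration
y = [2.0 * i for i in range(n)]

forward = list(range(n))
shuffled = forward[:]
random.Random(0).shuffle(shuffled)

print(add_in_order(x, y, forward) == add_in_order(x, y, shuffled))  # True
```

Because no iteration reads a value written by another iteration, any execution order (including overlapped execution) produces the same arrays.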

Today's News: October 19, 2015
Assignment 2 due on Friday

Dependencies and Hazards



Data Hazards

  • When there is a data dependence, we need to preserve the program order.
  • Goal: preserve program order only when it affects the outcome of the program.
  • Types of data hazards:
    • RAW (read after write)
      Most common: caused by a data dependence
    • WAW (write after write)
      caused by an output dependence
    • WAR (write after read)
      comes from an antidependence
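To see why each hazard type constrains reordering, the sketch below executes pairs of add-style instructions in both orders and compares the final register state. Everything here is illustrative: the (dest, src1, src2) layout, the function names, and the initial register values are assumptions made for the example.

```python
def run(instrs, regs):
    """Execute add-style instructions (dest = src1 + src2) in the given order."""
    state = dict(regs)
    for dest, src1, src2 in instrs:
        state[dest] = state[src1] + state[src2]
    return state

def hazards(first, second):
    """Classify the hazards that constrain `second` relative to `first`."""
    kinds = []
    if first[0] in second[1:]:
        kinds.append("RAW")   # second reads what first writes
    if first[0] == second[0]:
        kinds.append("WAW")   # both write the same register
    if second[0] in first[1:]:
        kinds.append("WAR")   # second writes what first reads
    return kinds

# Hypothetical initial register values, all distinct so changes are visible.
regs = {f"R{n}": 10 * n for n in range(1, 8)}

pairs = {
    "RAW":  (("R1", "R2", "R3"), ("R4", "R5", "R1")),
    "WAW":  (("R1", "R2", "R3"), ("R1", "R6", "R7")),
    "WAR":  (("R1", "R2", "R3"), ("R2", "R4", "R5")),
    "none": (("R1", "R2", "R3"), ("R4", "R5", "R6")),
}

for label, (a, b) in pairs.items():
    same = run([a, b], regs) == run([b, a], regs)
    print(label, hazards(a, b), "reorder safe:", same)
# RAW ['RAW'] reorder safe: False
# WAW ['WAW'] reorder safe: False
# WAR ['WAR'] reorder safe: False
# none [] reorder safe: True
```

With these values every hazard changes the outcome when the pair is swapped; with other values the results could happen to coincide, so the hazard check is a conservative test.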


Control Dependence

Example 6:
if (p1)
    S1;
if (p2)
    S2;
S1 is control dependent on p1.
S2 is control dependent on p2, but not on p1 (unless S1 affects p2).

Constraints imposed by control dependence:
  • An instruction that is control dependent on a branch cannot be moved before the branch so that it is no longer controlled by the branch.
  • An instruction that is not control dependent on a branch cannot be moved so that it is controlled by the branch.
These requirements are more strict than necessary. What we really want is to preserve exception behavior and data flow.
Example 7:
1)      DADDU  R2, R3, R4
2)      BEQZ   R4, skip
3)      LW     R1, 0(R2)
4) skip:
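Instruction 3 is not data dependent on instruction 2, but it cannot be moved above the branch: if the branch is taken, the original program never executes the load, so a hoisted load could raise a memory exception that the original program would not.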
Example 8:
1)      DADDU  R1,  R2, R3
2)      BEQZ   R12, skip
3)      DSUBU  R4,  R5, R6
4)      DADDU  R5,  R4, R9
5) skip:
6)      OR     R7,  R8, R9
Notice that R4 is used as a temporary variable.
Suppose that R4 is not used after skip. (We say that R4 is dead after skip.)
Then we can move instruction 3 above the branch (or place it in the branch delay slot).
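The claim in Example 8 can be checked with a small simulation. The interpreter below is a sketch written for this example only: it handles just the four opcodes that appear, the tuple encoding and the "LABEL" pseudo-instruction are assumptions, and the starting register values are invented. It runs the original and the reordered code for both branch outcomes and compares every register except the dead R4.

```python
MASK = (1 << 64) - 1  # registers are 64 bits wide; unsigned ops wrap around

def run(program, regs):
    """Interpret a tiny subset of MIPS: DADDU, DSUBU, OR, and BEQZ."""
    state = dict(regs)
    labels = {ins[1]: pos for pos, ins in enumerate(program) if ins[0] == "LABEL"}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "BEQZ":
            reg, target = args
            if state[reg] == 0:
                pc = labels[target]   # branch taken: jump to the label
                continue
        elif op == "DADDU":
            state[args[0]] = (state[args[1]] + state[args[2]]) & MASK
        elif op == "DSUBU":
            state[args[0]] = (state[args[1]] - state[args[2]]) & MASK
        elif op == "OR":
            state[args[0]] = state[args[1]] | state[args[2]]
        pc += 1
    return state

original = [
    ("DADDU", "R1", "R2", "R3"),
    ("BEQZ", "R12", "skip"),
    ("DSUBU", "R4", "R5", "R6"),
    ("DADDU", "R5", "R4", "R9"),
    ("LABEL", "skip"),
    ("OR", "R7", "R8", "R9"),
]
# Same code with instruction 3 moved above the branch.
hoisted = [original[0], original[2], original[1]] + original[3:]

def live(state):
    """Drop R4, which is assumed dead after skip."""
    return {reg: value for reg, value in state.items() if reg != "R4"}

start = {f"R{n}": n for n in range(1, 13)}  # hypothetical initial values
for r12 in (0, 1):
    regs = dict(start, R12=r12)
    same = live(run(original, regs)) == live(run(hoisted, regs))
    print("R12 =", r12, "same live state:", same)
# R12 = 0 same live state: True
# R12 = 1 same live state: True
```

When the branch is taken, the two versions differ only in R4, which is why the move is safe only if R4 is dead after skip. The move also preserves exception behavior because DSUBU does not trap on overflow.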

Exploiting ILP

  • Compiler-based static methods: Sections 3.2 and 3.3
  • Hardware-based dynamic approaches: Rest of Chapter 3
    Currently used for server and desktop devices (also some laptops)
    Not currently used for most personal mobile device (PMD) processors
  • Want to minimize CPI:
    Pipeline CPI is the sum of
    • ideal pipeline CPI
    • structural stalls
    • data hazard stalls
    • control stalls
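As a worked instance of the sum above, the sketch below adds the ideal CPI to the three stall contributions, each expressed as average stall cycles per instruction. The numbers are hypothetical and chosen only to show the arithmetic.

```python
def pipeline_cpi(ideal, structural, data_hazard, control):
    """Pipeline CPI = ideal CPI plus average stall cycles per instruction."""
    return ideal + structural + data_hazard + control

# Hypothetical stall rates (cycles per instruction), for illustration only.
cpi = pipeline_cpi(ideal=1.0, structural=0.05, data_hazard=0.20, control=0.15)
print(round(cpi, 2))  # 1.4
```

A lower CPI means better performance, so each stall term is a separate target for the techniques in this chapter.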
