CS 3853 Computer Architecture Notes on Chapter 1

Read Chapter 1

Introduction

What do the following statements mean? Answer in terms of execution time for a given task.
X is n times faster than Y:  $\dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$


X is n% faster than Y:  $\dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = 1 + \dfrac{n}{100}$
We will never use the word slower in this context, only faster.
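
As a minimal sketch of the two definitions above, the helper functions below (hypothetical names, not from the text) compute n from two measured execution times. The timings in the usage lines are made-up values for illustration.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (time_y / time_x)."""
    return time_y / time_x


def percent_faster(time_x: float, time_y: float) -> float:
    """Return n such that X is n% faster than Y (time_y / time_x = 1 + n/100)."""
    return (time_y / time_x - 1.0) * 100.0


# Hypothetical timings: X finishes the task in 10 s, Y in 15 s.
print(times_faster(10.0, 15.0))    # 1.5
print(percent_faster(10.0, 15.0))  # 50.0
```

Both functions take the execution time of the faster machine X first; the ratio is always the slower time over the faster time, which is why the notes only ever say "faster".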

Section 1.1: Traditional Performance Growth


Section 1.2: Classes of Computers

Types of computers:

Application Parallelism

Hardware Parallelism

Hardware Classifications


Section 1.3: Defining Computer Architecture

In CS 3843 we concentrated on the ISA.
Computer Architecture has 3 main components:
Instruction Set Architecture (ISA)
Organization (also called microarchitecture): memory systems, bus structure, internal CPU design
Hardware: detailed logic design, packaging technology
Design the organization and hardware to meet goals
Example:
The Intel 8088 (1979) and 8086 (1978) had the same ISA.
The 8088 had an 8-bit memory bus while the 8086 had a 16-bit bus.
The 8088 was very successful because it was cheap to build computers with an 8-bit bus (in 1979).

Instruction Set Architecture

This is what we used to define the architecture of the X86 and Y86 in CS 3843.

Class of ISA

usually general-purpose register architectures with operands in registers or memory.
classified as register-memory or load-store.

Memory Addressing

usually byte addressing

Addressing Modes

covered extensively in CS 3843

Types and sizes of operands

often: integers of 8, 16, 32, and 64 bits and floating point of 32 and 64 bits.

Operations

instructions: e.g. add, shift, and, branch

Control flow instructions

a subset of above, including conditional branches, call and return

ISA encoding

fixed length or variable length.
In CS 3843 you should have seen this for Y86 in detail and X86 examples.

Authors' View of Computer Architecture

ISA, organization, hardware

Organization

Also called microarchitecture
High level aspects of computer's design, e.g. memory system, memory interconnect, CPU design
Example: Intel and AMD use the same ISA, but different organizations.

Hardware

refers to detailed logic and packaging technology
Example: Intel Core i7 and Intel Xeon 7560

Some aspects of computer architectural design


Today's News: September 3
There are no recitations scheduled for the second week of class.
Recitations will start on September 10.

Section 1.4: Trends in Technology

Integrated Circuit Technology

Semiconductor RAM

Semiconductor Flash

Magnetic Disk

Network Technology

Trends in Bandwidth and Latency

Wires


Section 1.5: Trends in Power and Energy

Issues

Energy vs. Power

Microprocessor Energy and Power

Techniques for improving energy efficiency

ClassQue: Power and Energy

Section 1.6: Trends in Cost

Cost of an Integrated Circuit


Section 1.7: Dependability

Example 1:
A system has 2 disks, each with a MTTF of 1,000,000 hours and a power supply with a MTTF of 200,000 hours. What is the system MTTF?
Solution:
failure rate = 2 × 1/1,000,000 + 1/200,000 = 7/1,000,000
MTTF = 1,000,000/7 ≈ 143,000 hours.
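
The calculation in Example 1 can be sketched as follows. This assumes, as the worked solution does by adding failure rates, that the system fails as soon as any one component fails and that each component's failure rate is 1/MTTF. The function name `system_mttf` is illustrative only.

```python
def system_mttf(component_mttfs):
    """Combine component MTTFs (in hours) by summing their failure rates."""
    failure_rate = sum(1.0 / mttf for mttf in component_mttfs)
    return 1.0 / failure_rate


# Two disks at 1,000,000 hours each and one power supply at 200,000 hours.
mttf = system_mttf([1_000_000, 1_000_000, 200_000])
print(round(mttf))  # 142857
```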
Example 2:
A disk drive has a MTTF of 1,000,000 hours. What is the probability that it will last 50 years without failure?
Solution:
Hours per year = 24*365.25 = 8766.
Probability of dying in a given year: 8766/1,000,000 = .008766
Probability of not dying in a given year: 1 - .008766 = .991234
Probability of not dying in 50 years: $(.991234)^{50} = .6439$
What is wrong with this solution?
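
For checking the numbers, the arithmetic of the proposed solution to Example 2 can be reproduced as below. This sketch only repeats the steps as stated; it does not settle whether the reasoning behind them is sound, which is the question being asked. The variable names are illustrative.

```python
hours_per_year = 24 * 365.25                  # 8766 hours
p_fail_per_year = hours_per_year / 1_000_000  # 0.008766
p_survive_per_year = 1 - p_fail_per_year      # 0.991234
p_survive_50_years = p_survive_per_year ** 50

print(round(p_survive_50_years, 4))  # 0.6439
```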
ClassQue: MTTF 1

Section 1.8: Measuring, Reporting, and Summarizing Performance

A computer user is interested in response time
An operator of a warehouse-scale computer is interested in throughput

Comparing two computers

Examples (Also see ClassQue Questions)
  1. Machine A is 40% faster than B and B is 40% faster than C. How much faster is A than C?
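    Solution:  $\dfrac{E_B}{E_A} = 1.4$ and $\dfrac{E_C}{E_B} = 1.4$, so $\dfrac{E_C}{E_A} = \dfrac{E_C}{E_B} \times \dfrac{E_B}{E_A} = 1.4 \times 1.4 = 1.96$. A is 96% faster than C.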
  2. Machine performance increases by 40% per year for 10 years. By what percentage does performance increase over this period?
    Solution:   $1.4^{10} = 28.93$. This is $1 + n/100$ for $n = 2793$, so performance increases by about 2793%.
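
Both examples multiply execution-time ratios rather than adding percentages. A small sketch of that rule, using a hypothetical helper name:

```python
def combined_percent_faster(percent_steps):
    """Chain several 'n% faster' steps by multiplying the ratios 1 + n/100."""
    ratio = 1.0
    for percent in percent_steps:
        ratio *= 1.0 + percent / 100.0
    return (ratio - 1.0) * 100.0


print(round(combined_percent_faster([40, 40]), 2))   # 96.0
print(round(combined_percent_faster([40] * 10), 1))  # 2792.5
```

The second result is the unrounded form of the 2793% quoted in Example 2, which rounds the ratio to 28.93 first.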

Today's News: September 5
There are no recitations scheduled for the second week of class.
Recitations will start on September 10.

Time

ClassQue: About Time

Benchmarks


Types of benchmarks:
Tricks people play
SPEC: Standard Performance Evaluation Corporation
Summarizing Performance Results
ClassQue: geometric mean

Section 1.9: Quantitative Principles of Computer Design

Take advantage of parallelism

Principle of locality

Focus on the common case

Amdahl's law

A given enhancement will only improve a part of a program.
Example: on a multicore machine, some of the code will only use one core.

Speedup
$\text{Speedup} = \dfrac{\text{Performance for the entire task using the enhancement when possible}}{\text{Performance for the entire task without using the enhancement}}$

or
$\text{Speedup} = \dfrac{\text{Execution time for the entire task without using the enhancement}}{\text{Execution time for the entire task using the enhancement when possible}}$

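Amdahl's law depends on: $\text{Fraction}_{\text{enhanced}}$ and $\text{Speedup}_{\text{enhanced}}$
Note: $\text{Fraction}_{\text{enhanced}}$ is relative to the original design.
$\text{Fraction}_{\text{unenhanced}} = 1 - \text{Fraction}_{\text{enhanced}}$
$\text{Execution time}_{\text{old}} = \text{Execution time}_{\text{old-unenhanced}} + \text{Execution time}_{\text{old-enhanced}}$
$\text{Execution time}_{\text{old}} = \text{Execution time}_{\text{old}} \times \text{Fraction}_{\text{unenhanced}} + \text{Execution time}_{\text{old}} \times \text{Fraction}_{\text{enhanced}}$
$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \text{Fraction}_{\text{unenhanced}} + \dfrac{\text{Execution time}_{\text{old}}}{\text{Speedup}_{\text{enhanced}}} \times \text{Fraction}_{\text{enhanced}}$
$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left( \text{Fraction}_{\text{unenhanced}} + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)$
$\text{Speedup}_{\text{overall}} = \dfrac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}$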


Amdahl's Law:
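$\text{Speedup}_{\text{overall}} = \dfrac{1}{\text{Fraction}_{\text{unenhanced}} + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$
or
$\text{Speedup}_{\text{overall}} = \dfrac{1}{1 - \text{Fraction}_{\text{enhanced}} + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$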

You must be able to use this formula, but you also need to understand when the formula cannot be used directly.
You can only use the formula directly when you know both the enhanced fraction and the enhanced speedup.
Important:
To use the formula as given, you must know the enhanced fraction, which is the fraction of the time spent on the enhanced part when run on the old system.
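
A minimal sketch of the formula as a function, assuming the enhanced fraction is measured on the old system as required above. The function name and the input values are illustrative and do not come from the text.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup from Amdahl's law.

    fraction_enhanced: fraction of the OLD execution time that is enhanced.
    speedup_enhanced:  speedup of the enhanced part alone.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    fraction_unenhanced = 1.0 - fraction_enhanced
    return 1.0 / (fraction_unenhanced + fraction_enhanced / speedup_enhanced)


# Hypothetical case: half of the old run time is sped up by a factor of 2.
print(round(amdahl_speedup(0.5, 2.0), 4))  # 1.3333

# Even a huge speedup of that half cannot push the overall speedup past 2.
print(round(amdahl_speedup(0.5, 1e9), 4))  # 2.0
```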

ClassQue: speedup 1

Examples
  1. A new design makes the floating point processor of the CPU 80% faster than before. What is the overall speedup for a task in which floating point operations took up 30% of the CPU time with the old design?
    Solution:
    Method 1: use the formula
    The enhanced fraction is 30% = .3. The unenhanced fraction is .7. The enhanced speedup is 1.8. The overall speedup is
    $\dfrac{1}{.7 + .3/1.8} = 1.1538$.
    Method 2: use the definition directly
    $\text{Execution time}_{\text{old}} = .7 \times \text{Execution time}_{\text{old}} + .3 \times \text{Execution time}_{\text{old}}$
    $\text{Execution time}_{\text{new}} = .7 \times \text{Execution time}_{\text{old}} + .3 \times \text{Execution time}_{\text{old}}/1.8$
    $\text{Execution time}_{\text{new}} = (.7 + .3/1.8) \times \text{Execution time}_{\text{old}} = .86667 \times \text{Execution time}_{\text{old}}$
    Speedup = 1/.86667 = 1.15385
  2. A new design makes the floating point processor of the CPU 80% faster than before. What is the overall speedup for a task in which floating point operations took up 10% of the CPU time with the new design?
    Solution: We cannot use the formula directly since the enhanced fraction in the formula is based on the old design.
    Method 1: first calculate $\text{Fraction}_{\text{enhanced}}$ (relative to the old design).
    Suppose the total execution time on the new system is E.
    The unenhanced time on the new system is .9E and the enhanced time is .1E.
    The unenhanced time on the old system is still .9E and the enhanced time is .1E × 1.8 = .18E.
    The fraction enhanced (relative to the old system) is
    $\dfrac{.18E}{.9E + .18E} = .1667$.
    The speedup is
    $\dfrac{1}{1 - .1667 + .1667/1.8} = 1.08$.
    Method 2: Use the definition of Speedup directly
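    Relative to the new design, the enhanced fraction is .1 and the unenhanced fraction is .9.
    $\text{Execution time}_{\text{new}} = .9 \times \text{Execution time}_{\text{new}} + .1 \times \text{Execution time}_{\text{new}}$
    $\text{Execution time}_{\text{old}} = .9 \times \text{Execution time}_{\text{new}} + .1 \times \text{Execution time}_{\text{new}} \times 1.8$
    $\text{Speedup}_{\text{overall}} = \dfrac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \dfrac{.9 \times \text{Execution time}_{\text{new}} + .1 \times \text{Execution time}_{\text{new}} \times 1.8}{\text{Execution time}_{\text{new}}} = .9 + .18 = 1.08$.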
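
The two examples differ only in which system the fraction was measured on. The sketch below, with hypothetical function names, checks both results and shows the conversion from a new-system fraction to the old-system fraction that the formula needs.

```python
def speedup_from_old_fraction(f_old: float, s: float) -> float:
    """Amdahl's law: f_old is the enhanced fraction of the OLD execution time."""
    return 1.0 / ((1.0 - f_old) + f_old / s)


def old_fraction_from_new(f_new: float, s: float) -> float:
    """Convert an enhanced fraction measured on the NEW system to the old one.

    The enhanced part takes f_new of the new time, so it took f_new * s of
    that same time unit on the old system; the unenhanced part is unchanged.
    """
    return f_new * s / ((1.0 - f_new) + f_new * s)


def speedup_from_new_fraction(f_new: float, s: float) -> float:
    """Old time divided by new time, with the new time normalized to 1."""
    return (1.0 - f_new) + f_new * s


# Example 1: 30% of the old time is floating point, enhanced speedup 1.8.
print(round(speedup_from_old_fraction(0.3, 1.8), 4))    # 1.1538

# Example 2: 10% of the new time is floating point, enhanced speedup 1.8.
f_old = old_fraction_from_new(0.1, 1.8)
print(round(f_old, 4))                                  # 0.1667
print(round(speedup_from_old_fraction(f_old, 1.8), 4))  # 1.08
print(round(speedup_from_new_fraction(0.1, 1.8), 4))    # 1.08
```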

Processor Performance Equations

Dependencies

More on CPI

Examples
  1. In a particular task, 23% of the instructions are floating point instructions, each of which takes 5 cycles to execute. All other instructions take 1 cycle to execute. What is the average CPI for this task?
    Solution: CPI = .23 × 5 + .77 × 1 = 1.92
  2. A new design can reduce the number of cycles for a floating point operation to 4, without changing the clock speed. What is the new CPI and what is the expected speedup?
    Solution: CPI = .23 × 4 + .77 × 1 = 1.69
    Speedup = 1.92/1.69 = 1.136
  3. What is wrong with the following method of solving the above problem using the Amdahl's Law equation?
    $\text{Fraction}_{\text{enhanced}} = .23$, $\text{Fraction}_{\text{unenhanced}} = .77$, and $\text{Speedup}_{\text{enhanced}} = 5/4 = 1.25$. Therefore
    $\text{Speedup}_{\text{overall}} = \dfrac{1}{.77 + .23/1.25} = 1.0482$
    Answer: .23 is the fraction of instructions that are enhanced, not the fraction of time spent executing the enhanceable part.
    The correct enhanced fraction is (.23 × 5)/(.23 × 5 + .77) = 1.15/1.92 = .5990. This gives
    $\text{Speedup}_{\text{overall}} = \dfrac{1}{.4010 + .5990/1.25} = 1.136$
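
The three CPI examples can be checked with a short sketch. It assumes the instruction count and clock rate are unchanged, so execution time is proportional to CPI. The helper `average_cpi` and the list layout are illustrative choices, not from the text.

```python
def average_cpi(mix):
    """Average CPI for a list of (instruction fraction, cycles) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


old_mix = [(0.23, 5), (0.77, 1)]  # floating point takes 5 cycles
new_mix = [(0.23, 4), (0.77, 1)]  # floating point reduced to 4 cycles

cpi_old = average_cpi(old_mix)
cpi_new = average_cpi(new_mix)
print(round(cpi_old, 2))            # 1.92
print(round(cpi_new, 2))            # 1.69
print(round(cpi_old / cpi_new, 3))  # 1.136

# Amdahl's law needs the fraction of TIME, not the fraction of instructions.
time_fraction = (0.23 * 5) / cpi_old
amdahl = 1.0 / ((1.0 - time_fraction) + time_fraction / 1.25)
print(round(time_fraction, 4))      # 0.599
print(round(amdahl, 3))             # 1.136
```

With the time-based fraction, the Amdahl's law result agrees with the direct CPI ratio.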

Section 1.10: Putting It All Together


Section 1.11: Fallacies and Pitfalls

Fallacy: a commonly held misbelief
Pitfall: an easily made mistake

Fallacies

Pitfalls

