CS 3853 Computer Architecture Notes on Appendix B Section 1

Read Appendix B.1

B.1: Introduction

Example:
Suppose a cache can be accessed in one cycle, the hit ratio is 97%, and the miss penalty is 30 cycles.
What is the average number of cycles needed for an instruction fetch?
Solution:
Note that the miss penalty is the number of extra cycles required on a miss.
.97 × 1 + .03 × 31 = .97 + .93 = 1.90 cycles or:
1 + .03 × 30 = 1 + .9 = 1.9 cycles.
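The two equivalent calculations above can be checked with a short script (a sketch using the example's numbers; the variable names are ours):

```python
# Average number of cycles per instruction fetch, two equivalent ways.
hit_time = 1        # cycles for a cache hit
miss_rate = 0.03    # 1 - 0.97 hit ratio
miss_penalty = 30   # extra cycles required on a miss

# Method 1: weight hit and miss cases separately (a miss costs 1 + 30 cycles).
avg1 = (1 - miss_rate) * hit_time + miss_rate * (hit_time + miss_penalty)

# Method 2: every access pays the hit time; misses add only the penalty.
avg2 = hit_time + miss_rate * miss_penalty

print(avg1, avg2)  # both are about 1.9 cycles
```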

Cache Terminology

Memory Hierarchy

Level                1                        2                    3                 4
Name                 Registers                Cache                Main Memory       Disk
Typical Size         < 1 KB                   32 KB - 8 MB         < 512 GB          > 1 TB
Technology           custom, multiple ports   on-chip CMOS SRAM    CMOS DRAM         Magnetic disk
Access time (ns)     0.15 - 0.30              0.5 - 15             30 - 200          5,000,000
Bandwidth (MB/sec)   100,000 - 1,000,000      10,000 - 40,000      5,000 - 20,000    50 - 500
Managed by           Compiler                 Hardware             OS                OS, user

Typical values from 2006 (Figure B.1)
Example:
Suppose it takes 30 ns to access one byte. What is the bandwidth in MB/sec?
Solution:
30 ns to access a byte means 1 byte every 30 ns = 1 byte every 3 × 10⁻⁸ seconds, for a bandwidth of 1/(3 × 10⁻⁸) bytes per second = 3.3 × 10⁷ bytes/sec = 33 MB/sec.
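The same arithmetic as a quick sketch (names are ours):

```python
# Bandwidth implied by a 30 ns per-byte access time, as in the example above.
access_time_ns = 30.0
bytes_per_access = 1

bytes_per_sec = bytes_per_access / (access_time_ns * 1e-9)  # 1 / (3 × 10⁻⁸ s)
mb_per_sec = bytes_per_sec / 1e6

print(round(mb_per_sec, 1))  # about 33.3 MB/sec
```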


Today's News: February 23, 2015
Exam 1 returned today

If main memory has an access time of 30 ns, how can the bandwidth be 5000 MB/sec?
Memory Accesses per Instruction

Example: (Similar to the example on page B-5 of the text)
Suppose we have the standard MIPS 5-stage pipeline which has a CPI of 1 when all memory accesses are cache hits.
Loads and stores are 25% of all instructions, the miss rate is 4% and the miss penalty is 20 cycles.
How much faster would the computer be with no cache misses?
Incorrect Solution:
The CPI with cache misses is 1 + .25 × .04 × 20 = 1 + .20 = 1.20, so it would be 1.20 times as fast or 20% faster.
This is wrong because it counts only the data accesses: every instruction also makes one memory access to fetch the instruction itself.
Correct Solution:
Memory accesses per instruction = 1 (instruction fetch) + .25 (loads and stores) = 1.25.
CPI with cache misses is 1 + 1.25 × .04 × 20 = 1 + 1.0 = 2.0, so it would be 2 times as fast or 100% faster.
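The correct solution can be restated as a small script (a sketch; the 1.25 counts one instruction fetch plus 0.25 data accesses per instruction):

```python
base_cpi = 1.0             # CPI when all memory accesses are cache hits
accesses_per_instr = 1.25  # 1 instruction fetch + 0.25 loads/stores
miss_rate = 0.04
miss_penalty = 20          # extra cycles on a miss

cpi_with_misses = base_cpi + accesses_per_instr * miss_rate * miss_penalty
speedup = cpi_with_misses / base_cpi  # perfect cache vs. real cache

print(cpi_with_misses, speedup)  # about 2.0 and 2.0
```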


Today's News: February 25, 2015
Recitation this week goes over the exam

A direct mapped cache example


Direct Mapped Cache Location

The above is an example of a direct mapped cache.
Given an address, you can tell exactly where in the cache it would be stored.
Why not just put a cache block anywhere in the cache?
Then every location would have to be checked on each access, which might make the lookup too slow.
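As a sketch of the idea (the sizes here are assumed for illustration), a direct-mapped cache computes the unique location directly from the address:

```python
BLOCK_SIZE = 64     # bytes per block (assumed for illustration)
NUM_BLOCKS = 1024   # number of blocks in the cache (assumed)

def direct_mapped_location(addr):
    """Return (index, tag, offset) for a byte address."""
    offset = addr % BLOCK_SIZE             # byte within the block
    block_number = addr // BLOCK_SIZE
    index = block_number % NUM_BLOCKS      # the one cache block it can occupy
    tag = block_number // NUM_BLOCKS       # identifies which memory block is there
    return index, tag, offset
```

Two addresses whose block numbers differ by a multiple of NUM_BLOCKS map to the same index, so they conflict and cannot be cached at the same time.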

Four Questions on Cache Design

  1. block placement: Where can a block be stored in the cache?
  2. block identification: How is a block located in the cache?
  3. block replacement: What block should be replaced on a miss?
  4. write strategy: What happens on a write?
We will take these one at a time.
Today's News: February 27, 2015
Recitation this week goes over the exam

Block Placement

Cache Block Placement 1

A block can go in exactly one place (direct mapped), anywhere in the cache (fully associative), or anywhere within one set (n-way set associative).

Block identification

There are 2 parts to this:
Find out where it might be
Check all of these possible locations

Cache Block Identification 1

Block replacement

This is not an issue for direct mapped caches, since there is no choice.
The standard methods are: random, LRU (least recently used), and FIFO (first in, first out).

Write strategy

Two basic write policies describe what happens on a write hit:
  • Write-through: the data is written to both the cache block and main memory.
  • Write-back: the data is written only to the cache block; the block is written to memory when it is replaced (a dirty bit records whether this is needed).
Two options handle a write miss:
  • Write-allocate: the block is brought into the cache and then written.
  • No-write-allocate: the write goes directly to memory and the block is not cached.
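A toy sketch (our own single-block model, not the text's) of why write-back can reduce memory traffic when the same block is written repeatedly:

```python
# Hypothetical one-block cache, used only to count writes that reach memory.
class TinyCache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.dirty = False
        self.mem_writes = 0   # writes that actually reach main memory

    def write_hit(self):
        if self.write_back:
            self.dirty = True         # defer the memory write
        else:
            self.mem_writes += 1      # write-through: memory updated now

    def evict(self):
        if self.write_back and self.dirty:
            self.mem_writes += 1      # dirty block written back once
            self.dirty = False

wt, wb = TinyCache(write_back=False), TinyCache(write_back=True)
for _ in range(10):                   # ten write hits to the same block
    wt.write_hit()
    wb.write_hit()
wt.evict(); wb.evict()
print(wt.mem_writes, wb.mem_writes)   # 10 memory writes vs. 1
```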
Today's News: March 2
Assignment 2 is available, due March 27

Example:
Suppose addresses are 40 bits, the cache is 64K and uses 64-byte blocks with 2-way set associative placement.
What is the format of a memory address? That is, which bits are used for the tag, index, block address, and offset?
Also, what is the maximum number of gigabytes this processor can access?
Solution:
The address is 40 bits. 64 = 2⁶, so the block offset is 6 bits.
The index indicates which set to use. There are 64K/64 = 1K blocks with 2 blocks per set, or 512 = 2⁹ sets. The index is 9 bits.
The remaining 40 − 9 − 6 = 25 bits are for the tag.
The block address is the tag and the index.
The maximum number of gigabytes that can be accessed is 2⁴⁰/2³⁰ = 1K gigabytes, (or twice this).
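The bit-field computation above can be sketched in a few lines (the names are ours):

```python
addr_bits = 40
cache_size = 64 * 1024     # 64 KB cache
block_size = 64            # bytes per block
associativity = 2          # 2-way set associative

offset_bits = (block_size - 1).bit_length()            # log2(64)  = 6
num_sets = cache_size // block_size // associativity   # 512 sets
index_bits = (num_sets - 1).bit_length()               # log2(512) = 9
tag_bits = addr_bits - index_bits - offset_bits        # 40 - 9 - 6 = 25

print(tag_bits, index_bits, offset_bits)  # 25 9 6
```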
Appendix B.5

The Opteron data cache

The AMD Opteron uses 40-bit addresses with a 64KB data cache.
The cache has 64-byte blocks with 2-way set associative placement.
It uses LRU replacement, write-back, and write-allocate on a write miss.
It is shown in Figure B.5.
The numbers show the steps that occur on a cache hit for a 64-bit memory access.
  1. The (physical) address is determined. (Details in Section B.4.)
  2. The index selects the two cache entries of the corresponding set.
  3. The tag in each cache block is compared to the tag of the memory address (if the valid bit is set).
  4. The upper 3 bits of the block offset are used to specify an 8-byte portion of the matched block.
    For a hit:
    • If this is a read, the data is transferred from the cache.
    • If this is a write, the data is written to the cache. Since write-back is used, the memory does not need to be accessed during a hit.
    On a miss, one of the two blocks is chosen for replacement.
    Since this uses write-back, if its dirty bit is set, the entire block is written back to memory through the victim buffer.
    • On a read, the entire block is read and stored in the cache while an 8-byte part is returned to the CPU.
    • On a write, the entire block is read and stored in the cache (write-allocate) and then the 8-byte value is written to the cache and the dirty bit is set.
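Steps 1-4 of the hit path can be sketched as follows (a simplified model with our own names; replacement, write-back, and the victim buffer are omitted):

```python
# 2-way set-associative lookup with 64-byte blocks and 512 sets,
# matching the Opteron data cache geometry described above.
NUM_SETS, BLOCK_SIZE, WAYS = 512, 64, 2

def split(addr):
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_SETS
    tag = addr // (BLOCK_SIZE * NUM_SETS)
    return tag, index, offset

# Each set holds WAYS entries: a valid bit, a tag, and a 64-byte block.
cache = [[{"valid": False, "tag": None, "data": bytearray(BLOCK_SIZE)}
          for _ in range(WAYS)] for _ in range(NUM_SETS)]

def read8(addr):
    """Return the 8-byte word on a hit, or None on a miss."""
    tag, index, offset = split(addr)                 # step 1: form the address
    for entry in cache[index]:                       # step 2: select the set
        if entry["valid"] and entry["tag"] == tag:   # step 3: compare tags
            word = offset & ~0x7                     # step 4: upper 3 offset bits
            return bytes(entry["data"][word:word + 8])
    return None                                      # miss: replacement not shown
```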
Figure B.5 Questions 1
