CS 3853 Architecture Notes on Appendix B Section 3

Read Appendix B.3

B.3: Cache Optimization

Summary: 6 optimizations in 3 categories:

reducing the miss rate
- larger block size
- larger cache size
- higher associativity
reducing the miss penalty
- multilevel caches
- giving priority to read misses over writes
reducing the hit time
- avoiding address translation during indexing of the cache
  (this depends on virtual memory, which we need to cover first)

Types of cache misses

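compulsory: the first access to a block always misses, because the block has never been in the cache; also called cold-start misses or first-reference misses.
capacity: the cache cannot hold all of the blocks the program needs, so blocks are discarded and later fetched again.
conflict: too many blocks map to the same set, so a block is discarded and later fetched again even though the cache as a whole has room; also called collision misses.
Conflict misses occur only because the cache is not fully associative; a fully associative cache of the same size would not have them.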
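
One way to see the three categories in action is to simulate them. The Python sketch below uses a hypothetical helper, classify_misses (not from the text), that runs the cache being studied alongside a fully associative cache of the same capacity and labels each miss: compulsory if the block was never referenced before, capacity if the fully associative cache also misses, and conflict otherwise. LRU replacement and byte addresses are assumptions of the sketch.

```python
from collections import OrderedDict


def classify_misses(addresses, num_blocks, block_size, ways):
    """Count hits and compulsory/capacity/conflict misses (3C model) for a
    set-associative LRU cache holding num_blocks blocks (LRU is assumed)."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU order
    full = OrderedDict()   # fully associative LRU cache with the same capacity
    seen = set()           # every block referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for addr in addresses:
        block = addr // block_size

        # Reference model: would a fully associative cache of this size hit?
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) == num_blocks:
                full.popitem(last=False)     # evict least recently used block
            full[block] = None

        # The cache actually being studied.
        lru = sets[block % num_sets]
        if block in lru:
            lru.move_to_end(block)
            counts["hit"] += 1
        else:
            if len(lru) == ways:
                lru.popitem(last=False)
            lru[block] = None
            if block not in seen:
                counts["compulsory"] += 1    # never referenced before
            elif not full_hit:
                counts["capacity"] += 1      # even full associativity would miss
            else:
                counts["conflict"] += 1      # only the set mapping caused this miss
        seen.add(block)
    return counts


# Direct-mapped, 4 blocks of 16 bytes: blocks 0 and 4 both map to set 0.
print(classify_misses([0, 64, 0, 64, 0, 64], 4, 16, 1))
# {'hit': 0, 'compulsory': 2, 'capacity': 0, 'conflict': 4}
```

In the direct-mapped example, the first reference to each block is compulsory and every later miss is a conflict miss, since a fully associative cache with 4 blocks would easily hold both.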

Optimizations

Increasing Block Size

larger block size:
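- increasing the block size decreases the miss rate up to a point, because it exploits spatial locality and reduces compulsory misses
- if the block size is too large, the miss rate can increase again: a cache of fixed size then holds too few blocks, so conflict (and even capacity) misses rise
- increasing the block size also increases the miss penalty, since more bytes must be transferred on each miss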
larger cache size
- reduces capacity misses
- can increase the hit time, since a larger cache takes longer to access
- increases cost and power consumption
- limited by the chip area available
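
The block-size and cache-size trade-offs can be checked with a tiny direct-mapped simulator. This is a minimal sketch under assumed parameters: miss_rate and miss_penalty are hypothetical helpers, and the penalty model (a fixed latency plus block_size / bytes_per_cycle transfer cycles) is a simplification chosen for illustration, as are the traces.

```python
def miss_rate(addresses, cache_bytes, block_size):
    """Miss rate of a direct-mapped cache with the given capacity and block size."""
    num_lines = cache_bytes // block_size
    lines = [None] * num_lines           # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // block_size       # which memory block holds this byte
        index = block % num_lines        # direct-mapped: exactly one candidate line
        if lines[index] != block:
            misses += 1
            lines[index] = block         # fetch the block, replacing the old one
    return misses / len(addresses)


def miss_penalty(block_size, latency_cycles, bytes_per_cycle):
    """Assumed penalty model: fixed access latency plus time to transfer the block."""
    return latency_cycles + block_size / bytes_per_cycle


# Sequential scan of 256 bytes (4-byte words): spatial locality rewards big blocks.
scan = list(range(0, 256, 4))
print(miss_rate(scan, 1024, 16), miss_rate(scan, 1024, 64))          # 0.25 0.0625

# Two hot words 272 bytes apart in a 256-byte cache: big blocks make them collide.
pingpong = [0, 272] * 8
print(miss_rate(pingpong, 256, 16), miss_rate(pingpong, 256, 64))    # 0.125 1.0

# A larger (1 KB) cache gives the 64-byte blocks room again.
print(miss_rate(pingpong, 1024, 64))                                 # 0.125

# Bigger blocks cost more per miss (hypothetical 80-cycle latency, 16 bytes/cycle).
print(miss_penalty(16, 80, 16), miss_penalty(64, 80, 16))            # 81.0 84.0
```

With 64-byte blocks the 256-byte cache has only 4 lines, so addresses 0 and 272 land in the same line and every access misses; quadrupling the cache size removes the collision.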

Increasing Associativity

higher associativity:
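- reduces conflict misses
- requires extra hardware (one tag comparator per way) and can increase the hit time
- 8-way set associative is usually enough: for practical purposes it removes about as many misses as full associativity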
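
The effect of associativity on conflict misses can be seen with a set-associative cache model. In this Python sketch, count_misses is a hypothetical helper and LRU replacement is an assumption; the total capacity stays fixed at 4 blocks while the number of ways changes.

```python
from collections import OrderedDict


def count_misses(addresses, num_blocks, block_size, ways):
    """Misses for a cache of num_blocks blocks organized as num_blocks // ways sets."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # each set keeps blocks in LRU order
    misses = 0
    for addr in addresses:
        block = addr // block_size
        lru = sets[block % num_sets]                 # set index = block number mod sets
        if block in lru:
            lru.move_to_end(block)                   # hit: mark most recently used
        else:
            misses += 1
            if len(lru) == ways:
                lru.popitem(last=False)              # set full: evict least recently used
            lru[block] = None
    return misses


trace = [0, 64, 128] * 4          # blocks 0, 4, 8 all map to set 0 below
for ways in (1, 2, 4):
    print(ways, count_misses(trace, 4, 16, ways))
```

Direct-mapped and 2-way both miss on all 12 references because three blocks compete for one set; with 4 ways (fully associative at this size) only the 3 compulsory misses remain.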
multilevel caches to reduce miss penalty
- Widely used
- Use a small first-level cache (L1) that is fast enough to match the processor clock cycle
- Use a large second-level (and third-level) cache to catch most L1 misses, reducing the effective miss penalty
- Example: Intel Core i7 has:
  - a 32KB L1 instruction cache per core
  - a 32KB L1 data cache per core
  - a 256KB L2 cache per core
  - a shared 8MB L3 cache
  Why not just use a larger (256KB) L1 data cache?
MultiLevel Caches 1
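
The benefit of a second level can be quantified with the average memory access time, AMAT = hit time + miss rate * miss penalty, where the L1 miss penalty is itself the average access time of the L2. The latencies and miss rates below are hypothetical round numbers, not Core i7 measurements, and the function names are illustrative.

```python
def amat_one_level(hit_time, miss_rate, mem_penalty):
    """Average memory access time with only an L1 in front of main memory."""
    return hit_time + miss_rate * mem_penalty


def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """AMAT with an L2: the L1 miss penalty is the L2's own average access time.
    l2_local_miss_rate = L2 misses / accesses that reach the L2."""
    l1_miss_penalty = l2_hit + l2_local_miss_rate * mem_penalty
    return l1_hit + l1_miss_rate * l1_miss_penalty


# Hypothetical numbers: 1-cycle L1, 5% L1 misses, 10-cycle L2,
# 25% of L2 accesses miss, 200 cycles to main memory.
print(amat_one_level(1, 0.05, 200))              # about 11 cycles
print(amat_two_level(1, 0.05, 10, 0.25, 200))    # about 4 cycles
```

Under these assumed numbers the L2 turns most 200-cycle trips to memory into 10-cycle L2 hits; only 0.05 * 0.25 = 1.25% of all references (the global L2 miss rate) still go to main memory.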
Give priority to read misses over writes to reduce miss penalty
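- a write-through cache needs a large write buffer so the processor does not stall on every write
- simplest approach: on a read miss, wait for the write buffer to empty so the read gets the updated value
- better: on a read miss, check the write buffer; if it holds no data the read needs, let the read proceed ahead of the buffered writes
- a write-back cache can do something similar: copy the dirty block being replaced into a buffer, read the new block first, then write the dirty block to memory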
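
A toy model of read priority for a write-through cache follows. The class WriteThroughBuffer is hypothetical and ignores timing: a read miss drains the buffer only when a buffered write targets the same address, and otherwise reads memory right away, ahead of the pending writes. Comparing whole addresses (rather than block addresses, as hardware would) is a simplifying assumption.

```python
from collections import deque


class WriteThroughBuffer:
    """Toy FIFO write buffer between a write-through cache and memory.
    Assumption: whole addresses are compared, not block addresses."""

    def __init__(self, memory):
        self.memory = memory        # dict: address -> value, stands in for DRAM
        self.pending = deque()      # buffered (address, value) writes, oldest first

    def write(self, addr, value):
        self.pending.append((addr, value))   # processor continues without waiting

    def drain(self):
        while self.pending:
            addr, value = self.pending.popleft()
            self.memory[addr] = value

    def read_miss(self, addr):
        """Return (value, waited); waited is True if the buffer had to drain first."""
        conflict = any(waddr == addr for waddr, _ in self.pending)
        if conflict:
            self.drain()            # wait so memory holds the updated value
        return self.memory.get(addr, 0), conflict


mem = {100: 1, 200: 2}
wb = WriteThroughBuffer(mem)
wb.write(100, 42)
print(wb.read_miss(200))   # no conflict: read goes ahead of the buffered write
print(wb.read_miss(100))   # conflict: buffer drains first, read sees 42
```

The first read miss bypasses the pending write because it needs a different address; the second must wait, since the only up-to-date copy of address 100 is still in the buffer.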
Avoid address translation during indexing
- We will come back to this after we discuss virtual memory