CS 3853 Architecture Notes on Appendix B Section 3

Read Appendix B.3

B.3: Cache Optimization

Summary: 6 optimizations in 3 categories:

reducing the miss rate
- larger block size
- larger cache size
- higher associativity
reducing the miss penalty
- multilevel caches
- giving priority to read misses over writes
reducing the hit time
- avoiding address translation during indexing of the cache
  Need to talk about virtual memory first.

Types of cache misses

compulsory: the first access to a block always misses, because the block has never been in the cache; also called cold-start misses or first-reference misses.
capacity: the cache cannot hold all of the blocks the program needs, so blocks are discarded and later fetched again.
conflict: too many blocks map to the same set, so a block is discarded and later fetched again even though the cache has room elsewhere; also called collision misses.
Conflict misses occur only because the cache is not fully associative.
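
One common way to make the three categories concrete is to run the same trace through the real cache and through a fully associative LRU cache of the same total size. The sketch below assumes LRU replacement and a trace of block addresses (not byte addresses); the function name classify_misses and the traces are made up for illustration.

```python
from collections import OrderedDict

def classify_misses(trace, num_blocks, num_sets):
    """Count hits and the three kinds of misses for a set-associative LRU cache.

    trace      -- sequence of block addresses (assumed, not byte addresses)
    num_blocks -- total number of blocks in the cache
    num_sets   -- number of sets (num_sets == num_blocks means direct mapped)
    """
    ways = num_blocks // num_sets
    sets = [OrderedDict() for _ in range(num_sets)]  # the real cache, LRU per set
    full = OrderedDict()   # reference: fully associative LRU cache, same size
    seen = set()           # every block referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:
        # Update the fully associative reference cache.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) >= num_blocks:
                full.popitem(last=False)   # evict least recently used
            full[block] = True

        # Update the real cache and classify the access.
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)
            counts["hit"] += 1
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)
            lines[block] = True
            if block not in seen:
                counts["compulsory"] += 1   # never referenced before
            elif not full_hit:
                counts["capacity"] += 1     # full associativity would miss too
            else:
                counts["conflict"] += 1     # only the mapping caused the miss
        seen.add(block)
    return counts

# Direct-mapped, 4 blocks: blocks 0 and 4 collide in set 0.
print(classify_misses([0, 4, 0, 4], num_blocks=4, num_sets=4))
# Five distinct blocks do not fit in 4 blocks, so the re-read of 0 is a capacity miss.
print(classify_misses([0, 1, 2, 3, 4, 0], num_blocks=4, num_sets=4))
```

In the first trace the repeated accesses to blocks 0 and 4 are conflict misses; in the second, the final access to block 0 is a capacity miss.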

Optimizations

ClassQue: Increasing Block Size

larger block size:
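- increasing the block size can decrease the miss rate up to a point, because larger blocks exploit spatial locality and reduce compulsory misses
- if the block size is too large, the miss rate can increase because the cache then holds too few blocks
- increasing the block size increases the miss penalty, because more data must be transferred on each miss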
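
The trade-off between miss rate and miss penalty can be seen with the usual formula: average memory access time = hit time + miss rate x miss penalty. The sketch below is only an illustration: the miss rates, the 80-cycle memory latency, and the 8 bytes/cycle transfer rate are made-up values, not measurements.

```python
def miss_penalty(block_bytes, latency_cycles=80, bytes_per_cycle=8):
    """Cycles to fetch one block: fixed latency plus transfer time (assumed model)."""
    return latency_cycles + block_bytes / bytes_per_cycle

def amat(hit_time, miss_rate, penalty):
    """Average memory access time in cycles."""
    return hit_time + miss_rate * penalty

# Hypothetical miss rates for one cache size: they fall, then rise again
# once the blocks get so large that too few of them fit in the cache.
assumed_miss_rate = {16: 0.040, 32: 0.029, 64: 0.021, 128: 0.020, 256: 0.022}

results = {size: amat(1, rate, miss_penalty(size))
           for size, rate in assumed_miss_rate.items()}
for size, cycles in results.items():
    print(f"{size:4d}-byte blocks: AMAT = {cycles:.2f} cycles")

best = min(results, key=results.get)
print("best block size under these assumptions:", best)
```

With these assumed numbers the 128-byte block has the lowest miss rate, but the 64-byte block has the lowest average access time, because the larger block pays a higher miss penalty.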
larger cache size:
- can increase the hit time (if associative)
- can be expensive in cost and power
- limited capacity on chip

ClassQue: Increasing Associativity

higher associativity:
- reduces conflict misses
- requires extra hardware and can increase hit time
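- 8-way set associative is usually enough: in practice it reduces misses about as well as a fully associative cache of the same size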
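
A small simulation shows how associativity removes conflict misses while the total cache size stays fixed. This is a minimal sketch assuming LRU replacement and a trace of block addresses; count_misses and the trace are hypothetical names and data chosen to force a conflict.

```python
from collections import OrderedDict

def count_misses(trace, num_blocks, ways):
    """Misses for a cache of num_blocks blocks organized as ways-way set associative."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # LRU order within each set
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict least recently used
            lines[block] = True
    return misses

# Blocks 0 and 8 map to the same set in an 8-block direct-mapped cache.
trace = [0, 8, 0, 8, 0, 8, 0, 8]
for ways in (1, 2, 4, 8):
    print(f"{ways}-way: {count_misses(trace, num_blocks=8, ways=ways)} misses")
```

For this trace the direct-mapped cache misses on all 8 accesses, while 2-way and higher miss only on the first 2 (compulsory) accesses.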
multilevel caches to reduce miss penalty:
- Widely used
- Use small first-level cache (L1) to match the clock cycle
- Use large second (and third) level cache to reduce miss penalty.
- Example: Intel Core i7 has:
  - a 32KB L1 instruction cache per core
  - a 32KB L1 data cache per core
  - a 256KB L2 cache per core
  - a shared 8MB L3 cache
  Why not just use a larger (256KB) L1 data cache?
ClassQue: MultiLevel Caches 1
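
With two levels, the L1 miss penalty becomes the average time to get the block from L2: average memory access time = L1 hit time + L1 miss rate x (L2 hit time + L2 local miss rate x memory penalty). The sketch below plugs in assumed values (1-cycle L1 hit, 4% L1 miss rate, 10-cycle L2 hit, 50% L2 local miss rate, 200-cycle memory penalty); they are illustrative, not figures for the Core i7.

```python
def amat_one_level(l1_hit, l1_miss_rate, mem_penalty):
    """Average memory access time with only an L1 cache (cycles)."""
    return l1_hit + l1_miss_rate * mem_penalty

def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """Average memory access time with L1 backed by L2 (cycles).

    l2_local_miss_rate is misses in L2 divided by accesses that reach L2.
    """
    l1_miss_penalty = l2_hit + l2_local_miss_rate * mem_penalty
    return l1_hit + l1_miss_rate * l1_miss_penalty

# All values below are assumptions chosen for illustration.
without_l2 = amat_one_level(l1_hit=1, l1_miss_rate=0.04, mem_penalty=200)
with_l2 = amat_two_level(l1_hit=1, l1_miss_rate=0.04,
                         l2_hit=10, l2_local_miss_rate=0.5, mem_penalty=200)

print(f"L1 only: {without_l2:.1f} cycles")   # L1 only: 9.0 cycles
print(f"L1 + L2: {with_l2:.1f} cycles")      # L1 + L2: 5.4 cycles
```

The L1 hit time is unchanged, so the small L1 can still match the clock cycle; the L2 only shortens what happens after an L1 miss.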
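Give priority to read misses over writes to reduce miss penalty:
- a write-through cache needs a write buffer large enough that writes do not stall the processor
- the simplest approach on a read miss is to wait for the write buffer to empty, so the read gets the updated value
- alternatively, check the write buffer on a read miss: if the read does not need data that is in the write buffer, it can go ahead of the buffered writes
- can do something similar with write-back: copy the dirty block to a buffer, read the new block first, then write the dirty block to memory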
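
The write-buffer check can be sketched in software as follows. This is only a model of the idea, not how the hardware is built: memory is a dictionary of word addresses, and the names WriteBuffer, drain_one, and read_miss are hypothetical.

```python
from collections import OrderedDict

class WriteBuffer:
    """Pending writes from a write-through cache, oldest first (simplified model)."""

    def __init__(self):
        self.pending = OrderedDict()   # address -> value not yet in memory

    def write(self, addr, value):
        # A second write to the same address replaces the pending value.
        self.pending[addr] = value
        self.pending.move_to_end(addr)

    def drain_one(self, memory):
        """Retire the oldest pending write to memory, if there is one."""
        if self.pending:
            addr, value = self.pending.popitem(last=False)
            memory[addr] = value

def read_miss(addr, buffer, memory):
    """Service a read miss without waiting for the buffer to empty."""
    if addr in buffer.pending:
        return buffer.pending[addr]   # newest value is still in the buffer
    return memory[addr]               # no conflict: read goes ahead of the writes

memory = {0x10: 1, 0x20: 2}
buf = WriteBuffer()
buf.write(0x10, 99)                    # memory still holds the stale value 1

print(read_miss(0x10, buf, memory))    # 99, taken from the write buffer
print(read_miss(0x20, buf, memory))    # 2, read bypasses the pending write
buf.drain_one(memory)
print(memory[0x10])                    # 99, after the write retires
```

Reading memory directly for address 0x10 before the buffer drains would return the stale value, which is why the buffer must be checked (or emptied) first.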
Avoid address translation during indexing
- We will come back to this after we discuss virtual memory