Read Appendix B.3
B.3: Cache Optimization
Summary: 6 optimizations in 3 categories:
- reducing the miss rate
  - larger block size
  - larger cache size
  - higher associativity
- reducing the miss penalty
  - multilevel caches
  - giving priority to read misses over writes
- reducing the hit time
  - avoiding address translation during indexing of the cache
This last optimization depends on virtual memory, so we need to talk about virtual memory first.
Types of cache misses
- compulsory: the first access to a block always misses because the block has never been in the cache; also called cold-start misses or first-reference misses.
- capacity: the cache cannot hold all of the blocks the program needs, so blocks are discarded and later have to be retrieved again.
- conflict: too many blocks map to the same set, so a block is evicted and later retrieved again; also called collision misses.
  These misses occur only because the cache is not fully associative.
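A simple way to see the three kinds of misses is to replay a trace of block numbers through two cache models at once. The sketch below is a minimal Python illustration, assuming LRU replacement and a trace already reduced to block numbers; `SetAssocCache` and `classify_misses` are hypothetical names. Each miss of the real cache is labeled compulsory if the block was never referenced before, capacity if a fully associative LRU cache of the same size also missed, and conflict otherwise. Textbook definitions count the categories in aggregate, so this per-access labeling is one common simulator convention rather than the only one.

```python
from collections import OrderedDict


class SetAssocCache:
    """Hypothetical LRU set-associative cache model that tracks block numbers only."""

    def __init__(self, num_blocks: int, ways: int) -> None:
        if num_blocks % ways != 0:
            raise ValueError("num_blocks must be a multiple of ways")
        self.ways = ways
        self.num_sets = num_blocks // ways
        # One OrderedDict per set; iteration order runs from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, block: int) -> bool:
        """Return True on a hit; on a miss, fill the block and return False."""
        s = self.sets[block % self.num_sets]
        if block in s:
            s.move_to_end(block)      # mark as most recently used
            return True
        if len(s) == self.ways:
            s.popitem(last=False)     # evict the least recently used block
        s[block] = None
        return False


def classify_misses(trace: list[int], num_blocks: int, ways: int) -> dict[str, int]:
    """Label each miss of a set-associative cache as one of the three Cs."""
    real = SetAssocCache(num_blocks, ways)
    # Fully associative LRU cache of the same total size, used as a reference.
    full = SetAssocCache(num_blocks, num_blocks)
    seen: set[int] = set()
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in trace:
        hit_real = real.access(block)
        hit_full = full.access(block)
        first_reference = block not in seen
        seen.add(block)
        if hit_real:
            continue
        if first_reference:
            counts["compulsory"] += 1   # no cache of any size could have hit
        elif not hit_full:
            counts["capacity"] += 1     # even full associativity missed
        else:
            counts["conflict"] += 1     # missed only because of limited associativity
    return counts


ping_pong = [0, 4] * 3   # blocks 0 and 4 alternate
print(classify_misses(ping_pong, num_blocks=4, ways=1))
# {'compulsory': 2, 'capacity': 0, 'conflict': 4}
print(classify_misses(ping_pong, num_blocks=4, ways=2))
# {'compulsory': 2, 'capacity': 0, 'conflict': 0}
cyclic = list(range(5)) * 2   # five blocks cycled through a 4-block cache
print(classify_misses(cyclic, num_blocks=4, ways=4))
# {'compulsory': 5, 'capacity': 5, 'conflict': 0}
```

With one way, blocks 0 and 4 land in the same set and keep evicting each other; with two ways the same trace has no conflict misses, which is the effect of higher associativity. The cyclic trace needs five blocks in a four-block cache, so even full associativity keeps missing.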
Optimizations
ClassQue: Increasing Block Size
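- larger block size:
  - increasing the block size decreases the miss rate up to a point, because it exploits spatial locality and reduces compulsory misses
  - if the block size is too large, the miss rate can increase again because the cache then holds too few blocks, so conflict misses rise
  - increasing the block size increases the miss penalty, since more data must be transferred on each miss
- larger cache size:
  - reduces capacity misses
  - can increase the hit time
  - can be expensive in cost and power
  - limited by the capacity available on chip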
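The block-size trade-off can be explored with a small direct-mapped cache simulation. This is a sketch under invented assumptions: a 256-byte cache, a synthetic trace, 1-cycle hits, and a hypothetical memory that needs 80 cycles of latency and then delivers 16 bytes every 2 cycles; `miss_rate` and `miss_penalty` are illustrative names.

```python
import math


def miss_rate(addresses: list[int], cache_bytes: int, block_bytes: int) -> float:
    """Miss rate of a direct-mapped cache on a trace of byte addresses."""
    num_lines = cache_bytes // block_bytes
    resident = [None] * num_lines      # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // block_bytes    # memory block containing this byte
        line = block % num_lines       # the only line this block can occupy
        if resident[line] != block:
            resident[line] = block     # miss: fetch the whole block
            misses += 1
    return misses / len(addresses)


def miss_penalty(block_bytes: int) -> int:
    """Hypothetical memory: 80 cycles of latency, then 16 bytes every 2 cycles."""
    return 80 + 2 * math.ceil(block_bytes / 16)


# Hypothetical trace: a sequential scan of 4-byte words (spatial locality),
# then two hot words, 192 bytes apart, accessed alternately (temporal locality).
trace = list(range(0, 1024, 4)) + [128, 320] * 64

CACHE_BYTES = 256
HIT_TIME = 1  # cycles, assumed
for block_bytes in (4, 16, 64, 256):
    mr = miss_rate(trace, CACHE_BYTES, block_bytes)
    amat = HIT_TIME + mr * miss_penalty(block_bytes)
    print(f"block={block_bytes:3d}B  miss rate={mr:.3f}  AMAT={amat:.2f} cycles")
```

For this particular trace the miss rate falls from about 0.67 with 4-byte blocks to about 0.05 with 64-byte blocks, then rises to about 0.34 with 256-byte blocks, where the whole cache is a single line and the two hot words keep evicting each other. Because the miss penalty also grows with the block size, the best block size is the one that minimizes the average access time, not the miss rate alone.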
ClassQue: Increasing Associativity
- higher associativity:
  - reduces conflict misses
  - requires extra hardware and can increase the hit time
  - 8-way is usually enough: in practice it removes nearly as many misses as full associativity
- multilevel caches to reduce the miss penalty:
  - widely used
  - use a small first-level cache (L1) that is fast enough to match the processor clock cycle
  - use a large second-level (and third-level) cache to catch many of the L1 misses, reducing the effective miss penalty
  - example: the Intel Core i7 has:
    - a 32KB L1 instruction cache per core
    - a 32KB L1 data cache per core
    - a 256KB L2 cache per core
    - a shared 8MB L3 cache
Why not just use a larger (256KB) L1 data cache?
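One way to think about this question is the average memory access time (AMAT) of a two-level hierarchy: AMAT = hit time(L1) + miss rate(L1) * (hit time(L2) + local miss rate(L2) * miss penalty(L2)). The L1 hit time is paid on every access, while the L2 hit time is paid only on L1 misses, so each extra cycle of L1 hit time costs more than an extra cycle of L2 hit time. The Python sketch below just evaluates the formula; the function names and all parameter values are made up for illustration.

```python
def amat_one_level(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit: float, l1_miss_rate: float,
                   l2_hit: float, l2_local_miss_rate: float,
                   mem_penalty: float) -> float:
    """The L1 miss penalty is itself the average access time of the L2."""
    l1_miss_penalty = amat_one_level(l2_hit, l2_local_miss_rate, mem_penalty)
    return amat_one_level(l1_hit, l1_miss_rate, l1_miss_penalty)


# Made-up parameters (times in cycles), chosen only to illustrate the formula.
two_level = amat_two_level(l1_hit=1, l1_miss_rate=0.05,
                           l2_hit=10, l2_local_miss_rate=0.20,
                           mem_penalty=100)
# Fraction of all processor accesses that miss in both L1 and L2.
global_l2_miss_rate = 0.05 * 0.20
print(two_level, global_l2_miss_rate)   # roughly 2.5 and 0.01
```

The local L2 miss rate (misses per L2 access) is the one that belongs inside the formula; the global rate is the fraction of all processor accesses that go all the way to memory.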
ClassQue: MultiLevel Caches 1
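- give priority to read misses over writes to reduce the miss penalty:
  - a write-through cache needs a (large) write buffer to hold pending writes to memory
  - on a read miss, the simplest approach is to wait for the write buffer to empty, so the read gets the updated value
  - better: check the write buffer on a read miss; if the missing address is not in the buffer, let the read go to memory ahead of the buffered writes
  - a write-back cache can do something similar: when a read miss replaces a dirty block, copy the dirty block to a buffer, read the new block first, and then write the dirty block to memory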
- avoid address translation during indexing:
  - we will come back to this after we discuss virtual memory
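The read-priority idea for a write-through cache can be modeled with a FIFO write buffer. This is a toy sketch, not a hardware design: `WriteBuffer`, the dictionary standing in for main memory, and the default value 0 for unwritten addresses are all hypothetical. With `read_priority=True`, a read miss to an address that is not in the buffer goes straight to memory; on a conflict, or when priority is disabled, the buffer is drained first so the read sees the updated value.

```python
from collections import deque


class WriteBuffer:
    """Hypothetical FIFO write buffer between a write-through cache and memory."""

    def __init__(self, memory: dict[int, int], capacity: int = 4) -> None:
        self.memory = memory      # stand-in for main memory: address -> value
        self.capacity = capacity
        self.pending = deque()    # (address, value) pairs, oldest first

    def write(self, addr: int, value: int) -> None:
        if len(self.pending) == self.capacity:
            self._retire_oldest()  # buffer full: the processor would stall here
        self.pending.append((addr, value))

    def _retire_oldest(self) -> None:
        addr, value = self.pending.popleft()
        self.memory[addr] = value

    def read_miss(self, addr: int, read_priority: bool = True) -> tuple[int, int]:
        """Service a read miss; return (value, writes retired before the read)."""
        conflict = any(a == addr for a, _ in self.pending)
        retired = 0
        if conflict or not read_priority:
            # Wait for the buffer to empty so memory holds the updated value.
            while self.pending:
                self._retire_oldest()
                retired += 1
        # Without a conflict, the read goes to memory ahead of the buffered writes.
        return self.memory.get(addr, 0), retired


memory = {}
wb = WriteBuffer(memory)
wb.write(0x10, 1)
wb.write(0x20, 2)
print(wb.read_miss(0x30))   # (0, 0): no conflict, the read bypasses both writes
print(wb.read_miss(0x20))   # (2, 2): conflict, the buffer is drained first
```

Only the conflicting case pays for the queued writes; the simple policy (`read_priority=False`) pays for them on every read miss.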