2.1 The Mapping Problem
Cache holds a few thousand lines (blocks); main memory holds millions. Mapping answers: where may a given memory block live in the cache, and how do we recognise it there? Every scheme splits the physical address into fields — mastering that split solves 90% of cache numericals.
Running example: 64 KB byte-addressable main memory (16-bit addresses), 2 KB cache, 16-byte blocks. Cache lines = 2048 / 16 = 128 lines. Offset bits = log₂16 = 4.
2.2 Direct Mapping
Each memory block can live in exactly one line: line = (block number) mod 128.
Address (16 bits) = | TAG 5 | INDEX (line) 7 | OFFSET 4 |
index bits = log2(128) = 7, tag = 16 - 7 - 4 = 5
Address 0x2A47 = 0010 1010 0100 0111
TAG = 00101 (0x05)
INDEX = 010 0100 (line 36)
OFFSET = 0111 (byte 7 within the block)
Lookup: go straight to line 36, compare its stored tag with 00101; equal → hit, else → miss and refill. Why it's cheap: one comparator, no search. Why it hurts: two hot blocks with the same index (e.g., addresses 0x2A47 and 0x0A47) evict each other forever — conflict misses even when the cache is mostly empty.
2.3 Fully Associative Mapping
A block may live in any line: address = | TAG 12 | OFFSET 4 |. Lookup compares the tag against all 128 lines simultaneously, needing content-addressable memory (one comparator per line). Zero conflict misses, maximal hardware cost — practical only for tiny structures (e.g., TLBs).
2.4 Set-Associative Mapping (the Compromise)
Group lines into sets; a block maps to exactly one set but may occupy any of the k lines within it (k-way). For 2-way: sets = 128 / 2 = 64 → set bits = 6.
Address = | TAG 6 | SET 6 | OFFSET 4 |
0x2A47 = 0010 1010 0100 0111
TAG = 001010 (10), SET = 100100 (36), OFFSET = 0111
Only k comparators per lookup, and two conflicting blocks can now coexist in one set. Real CPUs use 4–16 way L1/L2 caches.
| Criterion | Direct | k-way Set-Assoc | Fully Assoc |
|---|---|---|---|
| Placement freedom | 1 line | k lines | anywhere |
| Comparators | 1 | k | one per line |
| Conflict misses | worst | low | none |
| Cost/speed | cheapest, fastest | balanced | costliest |
2.5 Replacement and Write Policies
On a miss in a full set, evict which line? LRU (least recently used — best locality match, needs usage bits), FIFO (oldest in — simpler, ignores reuse), Random (surprisingly decent, trivial hardware). Example: 2-way set, accesses A, B, A, C → LRU evicts B (A was touched more recently); FIFO evicts A (loaded first).
| Policy | Write hit action | Write miss (typical pairing) | Trade-off |
|---|---|---|---|
| Write-through | write cache and memory | no-write-allocate | memory always consistent; heavy bus traffic (add a write buffer) |
| Write-back | write cache only, set dirty bit; write memory on eviction | write-allocate | minimal traffic; memory temporarily stale — DMA/multicore must beware |
2.6 Cache Performance: AMAT
AMAT = hit time + miss rate x miss penalty
Example: tc = 2 ns, h = 0.9, penalty = 50 ns
AMAT = 2 + 0.1 x 50 = 7 ns
Two-level: L1 2 ns, L1 miss 10%; L2 10 ns, L2 miss 20%; memory 100 ns
AMAT = 2 + 0.1 x (10 + 0.2 x 100) = 2 + 0.1 x 30 = 5 ns
Misses come in three flavours — compulsory (first touch), capacity (working set exceeds cache), conflict (mapping collisions; the kind associativity removes). Classifying a miss stream is a rising exam favourite.
🎯 Exam Focus
- A 4 KB cache with 32-byte lines serves a 1 MB byte-addressable memory. Give the tag/index/offset widths for direct mapping, and for 4-way set-associative mapping.
- For direct mapping in Section 2.2, which line and tag does address 0x1B6C use? Show the binary split.
- Compare direct, associative and set-associative mapping on placement, comparators and conflict misses.
- Differentiate write-through and write-back caches; what is a dirty bit and why does DMA complicate write-back?
- Cache hit time 4 ns, miss penalty 80 ns: what hit ratio keeps AMAT below 10 ns? (Solve for h.)
- Explain compulsory, capacity and conflict misses with one scenario each.