1.1 The Problem: Fast, Large, Cheap — Pick Two
Any single memory technology forces a trade-off: SRAM is fast but expensive and power-hungry; DRAM is dense but slow; disk is enormous but glacial. The memory hierarchy stacks them so the system appears to have the speed of the top level and the capacity of the bottom:
[ CPU Registers ] ~1 KB, <1 ns, managed by compiler
[ L1 / L2 / L3 Cache ] KB-MB, 1-30 ns, managed by hardware
[ Main Memory (DRAM) ] GBs, ~60-100 ns, managed by OS+HW
[ SSD / Magnetic Disk ] TBs, 0.1-10 ms, managed by OS
[ Backup / Tape / Cloud ] PBs, seconds+, managed by admin
| Level | Typical Size | Access Time | Cost/GB | Managed By |
|---|---|---|---|---|
| Registers | ~1 KB | 0.3 ns | — | compiler |
| L1 cache (SRAM) | 32–64 KB | ~1 ns | very high | hardware |
| L2/L3 cache | 256 KB–64 MB | 3–30 ns | high | hardware |
| Main memory (DRAM) | 8–128 GB | 60–100 ns | moderate | OS + hardware |
| SSD | 0.5–8 TB | ~100 µs | low | OS |
| HDD | 1–20 TB | 5–10 ms | very low | OS |
Two invariants: going down the pyramid, capacity and access time grow while cost per bit falls; and each level typically holds a superset of the level above (inclusion).
1.2 Why It Works: Locality of Reference
Programs do not touch memory randomly — a fact so reliable it is called the principle of locality:
- Temporal locality: a location referenced now will likely be referenced again soon (loop counters, hot functions, stack top).
- Spatial locality: locations near a referenced one will likely be referenced soon (sequential instructions, array traversal).
for (i = 0; i < n; i++)
sum = sum + a[i];
// sum, i : temporal locality (reused every iteration)
// a[i] : spatial locality (consecutive addresses)
// loop code itself: both
Locality is why a cache holding only ~0.01% of memory can satisfy >95% of references — copy the currently hot blocks upward and most accesses hit the fast level. Design consequence: memory is moved between levels in blocks (cache lines, pages), not single words, precisely to exploit spatial locality.
1.3 Quantifying the Win: Hit Ratio and Average Access Time
Let h = hit ratio at the fast level, t1 = fast access time, t2 = slow access time. If the slow level is accessed only on a miss (after the fast lookup fails):
t_avg = h.t1 + (1 - h)(t1 + t2)
Example: h = 0.95, t1 = 5 ns, t2 = 100 ns
t_avg = 5 + 0.05 x 100 = 10 ns
A memory 20× slower than cache ends up only 2× slower on average — the entire justification of the hierarchy in one line. Note the exam nuance: some questions define t_avg = h·t1 + (1−h)·t2 (parallel lookup); state your assumption.
1.4 The Technologies
| Property | SRAM | DRAM |
|---|---|---|
| Cell | 6 transistors (flip-flop) | 1 transistor + 1 capacitor |
| Refresh | not needed | needed every few ms (charge leaks) |
| Speed | very fast | slower (row/column addressing) |
| Density / cost | low density, expensive | high density, cheap |
| Use | caches | main memory |
ROM family (non-volatile): ROM (factory-programmed) → PROM (one-time fuse) → EPROM (UV-erasable, whole chip) → EEPROM (electrically erasable, byte-wise) → Flash (block-erasable EEPROM; SSDs, firmware). The refresh requirement of DRAM ("why must DRAM be refreshed?") and the SRAM/DRAM table are perennial short questions.
🎯 Exam Focus
- Draw the memory hierarchy pyramid and annotate each level with typical size, speed and manager.
- Explain temporal and spatial locality with a code fragment, and connect each to a hardware design decision (cache lines, LRU).
- A system has cache access time 8 ns, memory access time 120 ns, hit ratio 0.92 (miss = cache then memory). Compute the average access time.
- Why must DRAM be refreshed while SRAM need not be? Compare the two technologies.
- Distinguish PROM, EPROM, EEPROM and Flash memory.
- State the inclusion property of the hierarchy and why block (not word) transfer is used between levels.