Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

Memory Organization: Cache, Virtual, Associative and Auxiliary Memory

Lesson 17 of 17 in the free Computer Organization and Architecture notes on Siksha Sarovar, written by Rohit Jangra.

Memory Hierarchy Pyramid

---

SRAM vs DRAM Comparison

FeatureSRAMDRAM
TechnologyBistable flip-flop (6 transistors per cell)Capacitor + transistor (1 transistor per cell)
SpeedVery fast (~1–5 ns)Slower (~50–100 ns)
PowerHigher (always drawing current)Lower (standby is low power)
Cost per bitHighLow
DensityLow (6T per cell = larger)High (1T per cell = denser)
Refresh neededNo (static)Yes — capacitors leak; refresh every ~64 ms
VolatileYesYes
UseCPU cache (L1, L2, L3)Main memory (RAM DIMMs)

---

Auxiliary Memory

TypeTechnologyTypical CapacityAccess TimeCost/GBBest Use
HDDMagnetic disk (spinning)1–20 TB5–10 msVery LowMass storage, backups
SSD (SATA)NAND Flash128 GB–4 TB50–100 µsLowOS, applications
SSD (NVMe)NAND Flash via PCIe256 GB–8 TB20–50 µsMediumHigh-perf storage
Optical (BD)Laser on disc25–100 GB80–120 msVery LowMedia distribution
Magnetic TapeMagnetic coating10–300 TB/cartridgeseconds–minutesExtremely LowLong-term archival

---

Associative Memory (Content-Addressable Memory, CAM)

Unlike regular RAM (indexed by address), CAM is indexed by content:

  • You provide a search key (data) and CAM returns the address of matching entry
  • All cells compared simultaneously in parallel (hardware comparators in every cell)
  • Very fast search O(1) regardless of table size
  • Applications: TLB (Translation Lookaside Buffer), routing tables, cache tag matching
FeatureRAMCAM
Access methodAddress → DataData → Address
Search timeO(1) by addressO(1) by content
Hardware costLow (passive cells)High (comparator per cell)
DensityHighLow (6–10× less dense than RAM)
ApplicationGeneral storageTLB, cache tags, network routers

---

Cache Mapping Techniques

1. Direct Mapping

Block i maps to cache line: line = i mod (number of cache lines)

AddressTagLine (Index)Block Offset
32-bitUpper bitsMiddle bits (log₂ lines)Lower bits (log₂ block_size)

Problem: Two heavily used blocks mapping to same line cause constant evictions (conflict misses).

2. Fully Associative

Any block can go in any cache line. Requires comparing tag against all lines simultaneously.

Advantage: No conflict misses. Disadvantage: Very expensive hardware (n comparators for n lines).

3. k-Way Set-Associative

Cache divided into S sets of k lines each. Block maps to set = (block number) mod S, but within the set, can go in any of k lines.

Compromise: k=2 or k=4 way gives most of the benefit of fully associative with modest hardware.

---

Cache Mapping Comparison

SchemeBlock PlacementTag CompareConflict MissesHardware Cost
Direct Mapped1 fixed line1 comparatorHighVery Low
2-Way Set-Assoc1 of 2 lines in a set2 comparatorsReducedLow
4-Way Set-Assoc1 of 4 lines in a set4 comparatorsLowerMedium
Fully AssociativeAny linen comparatorsMinimalVery High

---

Cache Performance Formula

Average Access Time (AAT) = h × Tc + (1−h) × Tm

Where: h = hit rate, Tc = cache access time, Tm = main memory access time

Worked Example:

  • Hit rate h = 0.90, Cache time Tc = 5 ns, Main memory Tm = 100 ns
  • AAT = 0.90 × 5 + 0.10 × 100 = 4.5 + 10 = 14.5 ns
  • Compare to no cache: 100 ns → 6.9× speedup!

---

Cache Replacement Policies

PolicyDescriptionImplementationPerformanceNotes
LRU (Least Recently Used)Evict block not used for longest timeAge counter per line; complexNear-optimalBest for temporal locality
FIFO (First In First Out)Evict oldest loaded blockSimple queueGoodCan suffer Belady's anomaly
LFU (Least Frequently Used)Evict block with lowest access countFrequency counterGood for repeated accessCan retain stale old data
RandomEvict a randomly selected blockSimple random selectorSurprisingly goodSimple hardware
Optimal (Belady)Evict block not used for longest future timeRequires future knowledgeBest possibleTheoretical benchmark only

---

Virtual Memory Address Translation

---

Page Replacement Algorithms (3 frames, ref string: 1,2,3,4,1,2,5,1,2,3,4,5)

AlgorithmPage FaultsDescriptionProsCons
FIFO9Evict page loaded firstSimple; no hardware neededBelady's anomaly; ignores usage
LRU8Evict page not used for longest timeNear-optimal; exploits localityComplex hardware (or software overhead)
Optimal (OPT)7Evict page not used for longest future timeMinimum possible faultsNot implementable (needs future knowledge)
Clock (2nd Chance)~8FIFO but give pages a second chance via reference bitGood approximation of LRUSlight overhead; large page tables
LFUVariableEvict least frequently used pageGood for stable hot setOld pages with high historical count persist

---

Belady's Anomaly

Normally, more page frames → fewer page faults. But FIFO can exhibit the opposite:

  • Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  • 3 frames (FIFO): 9 page faults
  • 4 frames (FIFO): 10 page faults ← more faults with more frames!

This is Belady's Anomaly — specific to FIFO. LRU and Optimal do NOT exhibit it (they are stack algorithms). Solution: use LRU or Clock algorithm instead of FIFO.

---

Study Deep: TLB Shootdowns in Multicore Systems

When an OS unmaps a virtual page (e.g., during munmap or context switch), it must invalidate TLB entries in ALL CPU cores that might have cached the mapping. This requires:

  1. CPU A sends an IPI (Inter-Processor Interrupt) to all other CPUs
  2. Each CPU receives the IPI and executes INVLPG (invalidate TLB page entry)
  3. CPU A waits for all CPUs to acknowledge the TLB flush
  4. CPU A resumes

This TLB shootdown is a significant performance bottleneck in:

  • High-performance computing (many processes sharing memory)
  • Virtualization (hypervisor managing guest VM page tables)
  • Databases (frequent mmap/munmap for large datasets)
📝 Exam Tips: - SRAM: 6 transistors per cell, fast, no refresh, expensive — used for cache - DRAM: 1 transistor + capacitor per cell, slow, needs refresh — used for main memory - Cache hit rate of 90% with 5 ns cache and 100 ns RAM → AAT = 14.5 ns - Direct mapping: simple but conflict misses; Set-associative: compromise; Fully-associative: best hit rate but most hardware - Belady's anomaly: only FIFO suffers it. LRU is a stack algorithm — immune to Belady's