Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

Unit 3: Pipelining — Speedup, Hazards & Their Cures

Lesson 9 of 17 in the free Logical Organization of Computer-II notes on Siksha Sarovar, written by Rohit Jangra.

2.1 The Idea: An Assembly Line for Instructions

Without pipelining, instruction i+1 waits for instruction i to finish completely. Pipelining splits execution into k stages separated by latch registers and overlaps k instructions at once — like a laundry with separate washer, dryer and folding table. It improves throughput (instructions completed per second), not the latency of any single instruction — the most repeated conceptual trap.

The classic 5-stage RISC pipeline:

StageNameWork
IFInstruction Fetchread instruction, PC += 4
IDInstruction Decodedecode + read registers
EXExecuteALU operation / address calc
MEMMemoryload/store data access
WBWrite Backwrite result to register file
Cycle:      1    2    3    4    5    6    7    8
Instr 1:   IF   ID   EX   MEM  WB
Instr 2:        IF   ID   EX   MEM  WB
Instr 3:             IF   ID   EX   MEM  WB
Instr 4:                  IF   ID   EX   MEM  WB

After the pipe fills, one instruction completes every cycle.

2.2 Speedup Arithmetic (Guaranteed Numerical)

k stages, clock period tp, n tasks: pipelined time = (k + n − 1) · tp (k cycles to fill, then n − 1 more completions). Non-pipelined time = n · k · tp (or n · tn).

S = n.k.tp / (k + n - 1).tp = n.k / (k + n - 1)

Example: k = 4, tp = 20 ns, n = 100
Pipelined     = (4 + 99) x 20 = 103 x 20 = 2060 ns
Non-pipelined = 100 x 4 x 20  = 8000 ns
Speedup S     = 8000 / 2060   = 3.88

As n → ∞, S → k (here 4): maximum speedup = number of stages. Efficiency = S/k = 3.88/4 = 97%; throughput = n / time = 100 / 2060 ns ≈ 48.5 million instructions/s. Why the ideal is never reached: fill/drain overhead, latch delay added to tp, unequal stage lengths (clock is set by the slowest stage), and hazards below.

2.3 The Three Hazards

1. Structural hazard — hardware conflict. Two stages need the same resource, e.g., IF and MEM both accessing a single memory port. Cures: separate instruction/data caches (Harvard style), duplicated units, or a stall.

2. Data hazard — value not ready (RAW).

ADD R1, R2, R3     ; R1 written in WB (cycle 5)
SUB R4, R1, R5     ; R1 needed in ID/EX (cycle 3!)

Cures, in order of elegance:

  • Forwarding (bypassing): route the ALU output of ADD straight into SUB's ALU input — hazard vanishes with zero stalls in most cases.
  • Load-use exception: a value loaded from memory is ready only after MEM, so even with forwarding a dependent next instruction stalls one cycle — beloved trap.
  • Stalls (bubbles): hardware interlock inserts NOPs until the value is ready.
  • Compiler scheduling: reorder independent instructions into the gap.

3. Control hazard — where next? A branch decides its target late, so the instructions fetched behind it may be wrong. Cures:

  • Stall until the branch resolves (simple, slow).
  • Delayed branch: always execute the instruction in the "delay slot" (classic MIPS; compiler fills the slot).
  • Branch prediction: 1-bit predictor mispredicts twice per loop (entry and exit); a 2-bit saturating counter requires two consecutive misses to flip, mispredicting once per loop — explain-the-difference is a standard 5-marker.

Penalty arithmetic: base CPI = 1, branches are 20% of instructions, misprediction penalty 3 cycles, all mispredicted → effective CPI = 1 + 0.2 × 3 = 1.6; with a predictor that is 90% accurate, CPI = 1 + 0.2 × 0.1 × 3 = 1.06.

🎯 Exam Focus

  1. Explain the five-stage instruction pipeline with a space–time diagram for 6 instructions.
  2. A 4-segment pipeline with 25 ns clock processes 200 tasks: compute pipelined time, non-pipelined time, speedup, efficiency and throughput.
  3. Define structural, data and control hazards with one example and one remedy each.
  4. What is operand forwarding? Show how it removes the RAW stall between ADD R1,R2,R3 and SUB R4,R1,R5, and why a load-use case still stalls once.
  5. Compare 1-bit and 2-bit branch prediction for a loop that runs 100 iterations.
  6. Why can pipelining never achieve speedup equal to k in practice? Give three reasons.