5.1 From rewriting to running
The lambda calculus says programs run by rewriting expressions. But real hardware runs instructions on memory. Graph reduction is the bridge: represent the expression as a graph in memory and rewrite the graph in place.
5.2 Why a graph and not a tree?
Consider square (3 + 4) where square x = x * x. β-reduction substitutes the argument twice:
square (3+4) → (3+4) * (3+4)
As a tree, (3+4) would be evaluated twice. As a graph, both uses point at the same node — evaluate it once, and both consumers see the result. This is sharing, and it is what upgrades normal-order reduction (slow) into lazy evaluation (efficient): evaluate at most once, and only if needed.
5.3 The mechanics of graph reduction
- The expression is stored as a graph of application nodes (
@), function nodes and data nodes. - To evaluate, follow the leftmost spine of
@nodes down to the function being applied (this walk is recorded on a spine stack). - If enough arguments are present, perform the reduction and — crucially — overwrite the root node of the redex with the result. Overwriting is what makes sharing work: everyone pointing at that node now sees the answer.
- Repeat until the graph is in weak head normal form (the outermost node is a value).
A node that represents a not-yet-evaluated expression is called a thunk (a suspension). Lazy evaluation = building thunks, forcing them on demand, and updating them with their value after the first forcing.
5.4 From interpretation to compiled graph reduction
Walking a graph node-by-node is slow. Modern implementations compile each function into a fixed sequence of instructions that builds and reduces its part of the graph directly. The classic abstract machines in this family:
| Machine | Idea |
|---|---|
| SK combinator machine | translate to S/K combinators, reduce those |
| G-machine | compile each function to "graph-building" code |
| TIM — Three Instruction Machine | observe that graph reduction needs only ~3 instruction kinds |
| STG machine | the one real GHC uses today (a descendant of these ideas) |
5.5 The Three Instruction Machine (TIM)
The TIM (Fairbairn & Wray, 1987) shows that lazy evaluation can be driven by essentially three instructions:
| Instruction | Meaning |
|---|---|
| Take n | pop n arguments off the stack and pack them into a frame (an environment record in the heap) |
| Push (arg/label/const) | push an argument from the current frame, a code label (for a subexpression), or a constant onto the stack |
| Enter x | jump to the code of x — tail-call into the function or thunk x |
Machine state = ⟨ current code, current frame pointer, argument stack, heap of frames ⟩.
Key ideas to remember for the exam:
- A function of n arguments compiles to
Take nfollowed by Pushes and a finalEnter. - Unevaluated arguments are passed as (code, frame) closure pairs — pushing a label does not evaluate it. That is exactly laziness.
Enterreplaces "call + return" — control only ever moves forward (continuation style), which is why three instructions suffice.
5.6 Tiny compilation example
For twice f x = f (f x):
twice: Take 2 -- frame now holds f (slot 1), x (slot 2)
Push (Label L1) -- push closure for (f x)
Enter (Arg 1) -- enter f — computes f (f x)
L1: Push (Arg 2) -- push x
Enter (Arg 1) -- enter f — computes (f x) when demanded
Note how the inner application (f x) becomes a pushed label — a thunk created in one instruction, evaluated only if f ever demands its argument.
5.7 Why this matters
- It explains why lazy evaluation is affordable: thunks are cheap (a code pointer + frame pointer), and sharing through node update prevents recomputation.
- It explains GHC behaviours you will meet in practice: space leaks (too many unevaluated thunks), the speed of tail calls, and strictness analysis (the compiler proving an argument will definitely be needed, so it can evaluate it early).
5.8 A full graph-reduction trace — worked example
Evaluate double (double 3) where double x = x + x, watching the sharing:
Step 0: @ -- root: apply double to (double 3)
/ \
double @ ← thunk T = (double 3)
/ \
double 3
Step 1: unfold outer double: root becomes T + T
BOTH operands point at the SAME node T -- sharing!
Step 2: + demands its left operand → force T once:
T: double 3 ⇒ 3 + 3 ⇒ 6
T is OVERWRITTEN with 6 -- the update
Step 3: + demands its right operand → it already points at 6. No recomputation.
Step 4: 6 + 6 ⇒ 12
Count the work: double 3 was evaluated once, although it is used twice. Under tree (non-sharing) normal-order reduction it would be evaluated twice; under eager evaluation it would also be once but even when not needed. Lazy = best of both: at most once, only if needed.
5.9 Weak head normal form (WHNF) — the precise stopping point
Definition. An expression is in WHNF when its outermost part is a constructor or a lambda (a value form) — its arguments may still be unevaluated thunks. Full normal form additionally requires every subexpression to be evaluated.
1 : (undefined) -- WHNF (outermost is the constructor :) but NOT normal form
\x -> 1 + 2 -- WHNF (a lambda); body untouched
(\x -> x) 5 -- NOT WHNF: outermost is an application (a redex)
Graph reduction stops at WHNF because that is all a pattern match needs: to choose between [] and (x:xs) you only need the outermost constructor. This is why head [1, undefined] succeeds — the second element's thunk is never forced.
5.10 Tree vs graph reduction — the comparison table
| Tree reduction | Graph reduction | |
|---|---|---|
| Argument occurrences | each copy evaluated separately | one shared node, evaluated once |
| After evaluation | result not remembered | redex root overwritten with result |
| Cost of normal order | can duplicate unbounded work | duplication eliminated by sharing |
| Memory picture | expression tree | directed acyclic graph + update |
| Implements | naive normal-order reduction | lazy evaluation |
5.11 Where laziness costs: thunks and space leaks
Laziness is not free — an unevaluated thunk occupies heap. The classic example:
sumTo :: Int -> Int -> Int
sumTo acc 0 = acc
sumTo acc n = sumTo (acc + n) (n - 1)
Lazily, acc + n is not computed each step; the accumulator grows as the thunk (((0+5)+4)+3)+... — O(n) memory for what should be O(1). The fix is to force the accumulator (seq, or the strict application operator $!, or GHC's strictness analysis doing it automatically). One sentence to remember: lazy in the data you may not need, strict in the accumulators you certainly will.
5.12 The implementation-pipeline picture
Each arrow is one lesson of this unit: source desugars to lambda calculus, lambda terms become graphs, graphs are reduced lazily, and the reduction itself is compiled into three-instruction code.
Exam pointers
- "Explain graph reduction with an example" — use
square (3+4)or §5.8; the two marks that distinguish answers are sharing (one node, two pointers) and updating (overwrite the root). - "Write short notes on the Three Instruction Machine" — name the three instructions with one line each, the machine state tuple, and the closure-pair trick that gives laziness.
- "Define WHNF" — boxed definition plus one example that is WHNF and one that is not.
Self-check
- Why must the root of a reduced redex be overwritten rather than just returning the value?
- Trace
fst (1+1, 2+2)— which addition is performed? - Is
Just (1+1)in WHNF? In normal form? - Which TIM instruction creates a thunk, and what two things does the thunk contain?
- Explain in two sentences why
sumToabove leaks space and how strictness fixes it.