2.1 SAT — the root problem
SAT: given a CNF formula φ — an AND of clauses, each an OR of literals (a variable or its negation) — does an assignment make φ true?
φ = (x ∨ ¬y) ∧ (¬x ∨ y ∨ z) ∧ (¬z)
Try x=T, y=T, z=F: (T) ∧ (T) ∧ (T) → satisfiable ✓
NP-complete by Cook–Levin (previous lesson). Brute force tries 2ⁿ assignments; modern CDCL "SAT solvers" handle industrial instances with millions of variables — without contradicting NP-completeness, because worst cases remain exponential. Know the contrast: 2-SAT ∈ P (implication-graph + SCC, linear time) — the jump from 2 to 3 literals is where hardness ignites.
2.2 3-SAT and the proof that SAT ≤ₚ 3-SAT
3-SAT: SAT restricted to exactly 3 literals per clause. Why care? A reduction source is most useful when it is rigid and uniform — gadgets are far easier to design from fixed-width clauses. 3-SAT is the workhorse source of 90% of NP-hardness proofs.
The clause-splitting reduction. Transform each clause C of φ by width:
| Width of C | Replacement (fresh variables u, uᵢ per clause) |
|---|---|
| 1: (ℓ) | (ℓ ∨ u₁ ∨ u₂)(ℓ ∨ ¬u₁ ∨ u₂)(ℓ ∨ u₁ ∨ ¬u₂)(ℓ ∨ ¬u₁ ∨ ¬u₂) |
| 2: (ℓ₁ ∨ ℓ₂) | (ℓ₁ ∨ ℓ₂ ∨ u)(ℓ₁ ∨ ℓ₂ ∨ ¬u) |
| 3 | keep unchanged |
| k > 3: (ℓ₁ ∨ ... ∨ ℓₖ) | chain: (ℓ₁ ∨ ℓ₂ ∨ u₁)(¬u₁ ∨ ℓ₃ ∨ u₂)(¬u₂ ∨ ℓ₄ ∨ u₃)...(¬uₖ₋₃ ∨ ℓₖ₋₁ ∨ ℓₖ) |
Chain correctness: if some ℓᵢ is true, set u₁...uᵢ₋₂ true and the rest false — every link satisfied. If all ℓᵢ are false, the chain forces u₁ = T, then u₂ = T, ... until the last clause fails — unsatisfiable. So the new formula is satisfiable iff the old one was; size grows linearly. ∎ Hence 3-SAT is NP-complete (membership in NP is the usual assignment-certificate).
2.3 CLIQUE — hardness enters graph theory
CLIQUE: given G and k, do k vertices exist that are pairwise adjacent?
- CLIQUE ∈ NP: certificate = the k vertices; check C(k,2) ≤ n² pairs. ✓
- 3-SAT ≤ₚ CLIQUE — the triple-per-clause construction proved step-by-step in Unit 1's reduction lesson: one vertex per literal occurrence, edges between non-contradictory literals of different clauses, k = #clauses. ∎
So CLIQUE is NP-complete — and through it, its mirror images:
INDEPENDENT-SET (k vertices, no edges between them): G has a k-clique ⟺ the complement graph Ḡ has a k-independent-set. One-line reduction.
VERTEX-COVER (k vertices touching every edge): S is an independent set ⟺ V∖S is a vertex cover, so G has an independent set of size k ⟺ G has a vertex cover of size n−k. Another one-liner. Both NP-complete.
2.4 The wider hall of fame (recognise on sight)
| Problem | Question | Field |
|---|---|---|
| SUBSET-SUM / PARTITION | subset hitting a target sum / equal halves | number problems |
| KNAPSACK (decision) | value ≥ V within weight W? | optimisation |
| HAM-CYCLE / HAM-PATH | visit every vertex exactly once | routing |
| TSP (decision) | tour of cost ≤ B? | logistics |
| 3-COLOURING | proper colouring with 3 colours | scheduling/registers |
| BIN-PACKING | fit items in k bins? | allocation |
| SET-COVER | k sets covering the universe? | facility location |
Each is NP-complete; each is a one-gadget hop from 3-SAT or its descendants. A curiosity with practical bite: SUBSET-SUM has a DP algorithm in O(n·T) time (T = target) — polynomial in the value but exponential in its bit-length. Such problems are weakly NP-complete and admit pseudo-polynomial algorithms; problems like TSP that stay hard even with small numbers are strongly NP-complete. Knowing which kind you face decides whether a DP rescue exists.
2.5 Clause splitting on a concrete formula (watch it work)
Take the single width-5 clause φ = (x₁ ∨ x₂ ∨ x₃ ∨ x₄ ∨ x₅). The chain rule with fresh variables u₁, u₂ produces:
(x₁ ∨ x₂ ∨ u₁) ∧ (¬u₁ ∨ x₃ ∨ u₂) ∧ (¬u₂ ∨ x₄ ∨ x₅)
If x₃ = T (say): set u₁ = T, u₂ = F — the three clauses read (·∨·∨T), (F∨T∨·), (T∨·∨·): all satisfied ✓. If all xᵢ = F: clause 1 forces u₁ = T, clause 2 then forces u₂ = T, and clause 3 reads (F∨F∨F) — fails ✗. The uᵢ behave like a "carry" that must eventually be cashed in by a true literal. Three clauses replaced one; in general k−2 clauses replace a width-k clause — linear growth, exactly as claimed.
2.6 SUBSET-SUM from 3-SAT — the digit-gadget table, worked
For the one-clause formula φ = C₁ = (x₁ ∨ x₂ ∨ ¬x₃), build base-10 numbers with one digit column per variable and one per clause:
| Number | x₁ digit | x₂ digit | x₃ digit | C₁ digit | Role |
|---|---|---|---|---|---|
| v₁ ("x₁ = T") | 1 | 0 | 0 | 1 | x₁ appears in C₁ |
| v₁′ ("x₁ = F") | 1 | 0 | 0 | 0 | |
| v₂ ("x₂ = T") | 0 | 1 | 0 | 1 | x₂ appears in C₁ |
| v₂′ ("x₂ = F") | 0 | 1 | 0 | 0 | |
| v₃ ("x₃ = T") | 0 | 0 | 1 | 0 | |
| v₃′ ("x₃ = F") | 0 | 0 | 1 | 1 | ¬x₃ appears in C₁ |
| s₁ and s₁′ (slack) | 0 | 0 | 0 | 1 each | top the clause column up |
| Target T | 1 | 1 | 1 | 3 |
Each variable column must total exactly 1 → exactly one of vᵢ/vᵢ′ is chosen — a truth assignment. The clause column reaches 3 only if at least one chosen literal contributes (the two slacks supply at most 2). No carries can occur — every column's total stays below 10 — so the digits act independently. Subset hits T ⟺ φ is satisfiable. ∎ This miniature is the standard 8-mark "reduce 3-SAT to SUBSET-SUM" answer; with m clauses and n variables there are 2n + 2m numbers of n + m digits — polynomial.
2.7 HAM-CYCLE ≤ₚ TSP(decision): the cleanest reduction in the catalogue
Given G = (V, E) with n vertices, build the complete graph on V with weights: 1 if (u,v) ∈ E, 2 otherwise. Bound B = n.
- (⟹) G Hamiltonian ⟹ that cycle uses n weight-1 edges ⟹ a tour of cost exactly n ≤ B. ✓
- (⟸) A tour of cost ≤ n has n edges each of weight ≥ 1 ⟹ all weights are exactly 1 ⟹ every tour edge is an edge of G ⟹ a Hamiltonian cycle. ✓
Construction is O(n²); both directions are two lines each. Bonus fact worth one mark: weights {1, 2} satisfy the triangle inequality, so even metric TSP is NP-complete — yet Unit 4 shows metric TSP is 1.5-approximable while general TSP is not approximable at all. The reduction's weights decide the approximability landscape.
2.8 Membership-in-NP one-liners for the hall of fame (bank these)
| Problem | Certificate | Polynomial check |
|---|---|---|
| SUBSET-SUM / PARTITION | the subset | one addition pass |
| KNAPSACK (decision) | the chosen item set | sum weights ≤ W, sum values ≥ V |
| HAM-CYCLE / HAM-PATH | the vertex ordering | n edge lookups |
| TSP (decision) | the tour | add the weights, compare with B |
| 3-COLOURING | the colour map | check every edge bi-chromatic |
| BIN-PACKING | the item-to-bin assignment | per-bin capacity sums |
| SET-COVER | the k chosen sets | union equals the universe |
Self-check
- Apply clause splitting to (a ∨ b) and to (a) — write out the output clauses.
- In the 3-SAT → CLIQUE construction, what exactly is k, and why can a clique never use two vertices from the same triple?
- Why can no carries occur in the SUBSET-SUM digit table, and why does that matter?
- Reproduce the HAM-CYCLE → TSP reduction including both correctness directions.
- Which of SUBSET-SUM and TSP admits a pseudo-polynomial algorithm, and what does that say about weak vs strong NP-completeness?