4.3 White-Box Testing & Cyclomatic Complexity
What is White-Box Testing?
White-box testing (also glass-box, clear-box, structural testing) examines the internal structure of the code. The tester reads the source code and designs test cases that exercise specific paths, branches and statements.
Test cases come from understanding
the code's structure, not the spec.
When to use
- Unit testing (primary)
- Some integration testing
- Security analysis
- Code-coverage measurement
Advantages
- Reveals bugs in implementation (not visible from spec)
- Achieves code coverage targets
- Validates that all logic paths execute
Disadvantages
- Requires programming skill
- Time-consuming
- May miss missing functionality (if code doesn't have it, no path covers it)
- Test cases need rewriting if code changes
---
Code Coverage — what we measure
White-box testing uses coverage criteria as goals:
| Coverage Type | What is covered | Strength |
|---|---|---|
| Statement coverage | Every executable statement | Weakest |
| Branch / Decision coverage | Every branch (T and F) of every decision | Stronger |
| Condition coverage | Every Boolean sub-expression's T and F | Stronger |
| Multiple Condition coverage | Every combination of conditions | Strong |
| Path coverage | Every executable path through the code | Strongest (often impractical) |
| Modified Condition / Decision (MC/DC) | Each condition independently affects decision | Required for DO-178C |
Example function
int classify(int x, int y) {
if (x > 0 && y > 0) // L2
return 1; // L3
else if (x < 0 || y < 0) // L4
return -1; // L5
else
return 0; // L7
}
100% statement coverage needs 3 tests covering lines 3, 5, 7:
- (1, 1) → 1 ✓ covers L3
- (−1, 5) → −1 ✓ covers L5 (via
x < 0) - (0, 0) → 0 ✓ covers L7
100% branch coverage also needs each decision's true and false:
- For
x > 0 && y > 0: covered (T by 1,1; F by −1,5 and 0,0) - For
x < 0 || y < 0: covered (T by −1,5; F by 0,0)
---
Cyclomatic Complexity — McCabe (1976)
The most-tested metric in IPU papers. Cyclomatic complexity V(G) measures the number of linearly independent paths through a program. Three equivalent formulas:
V(G) = E − N + 2 (E = edges, N = nodes in control-flow graph)
V(G) = P + 1 (P = number of predicate / decision nodes)
V(G) = number of regions in the control-flow graph
Why it matters
- V(G) = minimum number of test cases for basis path coverage
- V(G) = number of independent paths through the code
- High V(G) means hard-to-test, hard-to-maintain code
Recommended thresholds (McCabe)
- 1–10: simple, low risk
- 11–20: moderate, more test effort
- 21–50: complex, high risk
- > 50: untestable, must be refactored
---
Step-by-step example
int max(int a, int b, int c) {
int m = a; // S1
if (b > m) // S2 (decision)
m = b; // S3
if (c > m) // S4 (decision)
m = c; // S5
return m; // S6
}
Step 1: Draw control flow graph (CFG)
┌────────┐
│ S1 │
└───┬────┘
│
┌───▼────┐
│ S2 │ ◄── decision
└─┬────┬─┘
T │ │ F
┌─▼──┐ │
│S3 │ │
└─┬──┘ │
└─┬──┘
│
┌───▼────┐
│ S4 │ ◄── decision
└─┬────┬─┘
T │ │ F
┌─▼──┐ │
│S5 │ │
└─┬──┘ │
└─┬──┘
│
┌───▼────┐
│ S6 │
└────────┘
Step 2: Count
- Nodes (N): S1, S2, S3, S4, S5, S6, and merge points after S2 and S4 → 8 nodes
- Wait — usually we collapse merge nodes. Simpler count: 6 statement nodes, plus the structure of the diamond gives the count by edges/regions.
Using V(G) = P + 1 (cleanest):
- Decision nodes: S2 and S4 → P = 2
- V(G) = 2 + 1 = 3
So 3 independent paths, 3 test cases needed:
| Test | a, b, c | Path |
|---|---|---|
| 1 | 5, 3, 2 | S2 false, S4 false (a is max) |
| 2 | 3, 5, 2 | S2 true, S4 false (b is max) |
| 3 | 3, 2, 5 | S2 false, S4 true (c is max) |
Step 3: Verify with V(G) = E − N + 2
If we draw the CFG cleanly with merge nodes, E (edges) and N (nodes) will satisfy E − N + 2 = 3. All three formulas agree.
---
Bigger example
void categorize(int score) {
if (score >= 90) // D1
printf("A");
else if (score >= 75) // D2
printf("B");
else if (score >= 50) // D3
printf("C");
else if (score >= 35) // D4
printf("D");
else
printf("F");
}
Decision nodes: D1, D2, D3, D4 → P = 4 V(G) = 4 + 1 = 5
Test cases for basis paths:
- score = 95 → "A"
- score = 80 → "B"
- score = 60 → "C"
- score = 40 → "D"
- score = 20 → "F"
5 independent paths, 5 test cases.
---
Basis Path Testing — McCabe's procedure
- Draw the CFG from code
- Compute V(G)
- Determine a basis set of V(G) independent paths
- Prepare a test case for each basis path
- Execute and verify
The basis set is a mathematical basis for the path space — any other path can be expressed as a combination of basis paths.
---
Common white-box techniques
| Technique | Idea |
|---|---|
| Statement coverage | Execute every statement at least once |
| Branch coverage | Execute every branch (T and F) of every decision |
| Path coverage | Execute every possible path (often infeasible) |
| Basis path testing | McCabe's method — V(G) paths |
| Data flow testing | Trace variables — definitions and uses |
| Loop testing | Test loops at 0, 1, n, n−1, n+1, and max iterations |
| Mutation testing | Inject small bugs into code; tests should catch them |
---
Mutation Testing — the bug-injector approach
Idea: generate small mutations of the code (change + to -, < to <=, etc.). A good test suite should "kill" every mutant by failing. Mutation score = killed mutants / total mutants.
Mutation testing is the most rigorous coverage criterion but is computationally expensive. Tools: PIT (Java), Stryker (JS), Mutmut (Python).
---
Loop Testing — checklist
For each loop, design tests for:
| Test | Purpose |
|---|---|
| 0 iterations | Loop skipped |
| 1 iteration | Single execution |
| 2 iterations | Multiple, baseline |
| (n−1, n, n+1) iterations | Around expected count |
| Max iterations | Boundary at max |
| Nested loops | Inner full, outer minimal; vice versa |
Loops are a famous source of off-by-one defects. Many production bugs trace back to insufficient loop testing.
---
Black-Box vs White-Box — comparison
| Aspect | Black-Box | White-Box |
|---|---|---|
| View of system | External | Internal |
| Source code needed? | No | Yes |
| Done by | Testers, users | Developers |
| Stage | System, acceptance | Unit, integration |
| Focus | Functionality | Logic / code paths |
| Test basis | SRS | Code structure |
| Programming skill | Not required | Required |
| Catches missing function? | Yes | No |
| Catches dead code? | No | Yes |
---
Grey-Box Testing
A hybrid — tester has partial knowledge of internals:
- Knows the data flow / database schema but not the exact code
- Can write tests that exercise specific paths via inputs
- Common in security testing (knowing the architecture but treating individual components as black boxes)
---
Key Terms — Lesson 4.3
The vocabulary below covers white-box testing, McCabe's complexity metric, coverage criteria, and the modern automation around them.
White-Box Testing — A testing approach where the internal structure of the code is known to the tester and test cases are designed to exercise specific paths, branches, and statements. Also called glass-box testing, clear-box testing, structural testing, or code-based testing.
Code Coverage — The percentage of the source code that is executed by the test suite. Coverage is measured by instrumenting the code (recording which lines/branches run) during test execution. Industry target: ~80% line coverage as a healthy baseline.
Statement Coverage / Line Coverage — The weakest coverage criterion: every executable statement is executed at least once by some test. Easy to achieve and to measure but does not guarantee that all decision outcomes are tested.
Branch Coverage / Decision Coverage — A stronger criterion than statement coverage: every branch of every decision (the true path and the false path of every if, every loop entry/skip) is exercised. Branch coverage is typically the industry's "useful target."
Condition Coverage — A criterion requiring that each Boolean sub-expression in a compound condition takes both true and false values. Example: if (a > 0 && b > 0) — each of a > 0 and b > 0 must independently be tested true and false.
Multiple Condition Coverage — A still-stronger criterion: every combination of Boolean sub-expression values is tested. For n conditions there are up to 2ⁿ combinations. Usually impractical for real code.
MC/DC (Modified Condition / Decision Coverage) — A specific criterion required by DO-178C Level A (aviation) and ISO 26262 ASIL D (automotive): every Boolean operand must be shown to independently affect the outcome of the decision. MC/DC is the gold standard for safety-critical code.
Path Coverage — The strongest criterion: every executable path through the control-flow graph is exercised. Often infeasible because programs with loops have exponentially or infinitely many paths. Approximated by basis path testing.
Control Flow Graph (CFG) — A directed graph where nodes are basic blocks of code (sequences with one entry and one exit) and edges are control transfers between them (branches, jumps, function calls). The CFG is the structural representation that white-box techniques operate on.
Basic Block — A maximal sequence of statements with one entry and one exit — control enters at the top, exits at the bottom, no jumps in between. Basic blocks are the nodes in the CFG.
Cyclomatic Complexity / V(G) (McCabe, 1976) — A metric for the number of linearly independent paths through a function's control-flow graph. Three equivalent formulas: V(G) = E − N + 2 (edges minus nodes plus 2), V(G) = P + 1 (predicates plus 1), V(G) = number of regions in the CFG. V(G) = minimum number of test cases for basis path coverage.
Predicate Node / Decision Node — A CFG node where control branches based on a condition — if, while, for, switch. The count of predicate nodes plus one equals cyclomatic complexity.
McCabe's Thresholds — McCabe's recommended interpretation of V(G): 1–10 simple, low risk; 11–20 moderate, more test effort; 21–50 complex, high risk; > 50 untestable, must be refactored. Modern industry uses V(G) > 10 as a refactoring signal.
Linearly Independent Path — A path through the CFG that introduces at least one new edge not in any previously chosen path. The basis set is a maximal collection of linearly independent paths; its size equals V(G). Any path in the program is a linear combination of basis paths.
Basis Path Testing — McCabe's procedure: draw the CFG, compute V(G), identify a basis set of V(G) paths, and write one test case per path. Guarantees branch coverage and exercises each independent path.
Region — In a planar drawing of the CFG, a bounded area enclosed by edges (plus the unbounded outer region). The number of regions equals V(G), giving the third equivalent formula.
Data Flow Testing — A white-box technique that traces the lifecycle of each variable — where it is defined (assigned) and where it is used (read). Tests are designed to exercise every definition-use (DU) chain. Catches bugs that statement coverage misses — uninitialised variables, dead definitions, dangerous shadowing.
Loop Testing — A specific white-box technique focused on loops, the most common source of off-by-one defects. Test cases exercise the loop at 0 iterations (skip), 1 iteration, 2 iterations, n−1 / n / n+1, and maximum iterations.
Mutation Testing — A technique that deliberately introduces small bugs (mutants) into the code — change + to -, change < to <=, remove a statement — and runs the test suite. A test suite that kills (fails on) most mutants is judged more thorough than one that misses them. Mutation score = killed / total mutants.
Mutant — A version of the program with one small synthetic bug inserted. Mutants are generated automatically by mutation testing tools (PIT for Java, Stryker for JavaScript, Mutmut for Python).
Killed vs Surviving Mutant — A mutant is killed if at least one test in the suite fails on it (the suite detected the bug). A mutant survives if all tests pass — meaning the suite is blind to that kind of change. Surviving mutants indicate weak tests.
Coverage Tool — Software that instruments the code to record which lines/branches/paths execute during tests, then produces a coverage report. JaCoCo for Java, Coverage.py for Python, Istanbul / nyc for JavaScript, gcov / lcov for C/C++, Coverlet for .NET.
Code Coverage Threshold (in CI) — A configured minimum coverage percentage below which the CI build fails. Common configurations: fail if line coverage falls under 80%, or fail if any single file is under 60%. Coverage thresholds prevent silent erosion of test rigour.
Grey-Box Testing — A hybrid testing approach where the tester has partial knowledge of internals — data flow, database schema, architecture — but still treats individual components as black boxes. Common in security testing and integration testing.
Symbolic Execution — An advanced white-box technique where the program is executed with symbolic inputs (e.g., x = α rather than x = 5) and the tool automatically derives the input constraints that lead to each path. Used by tools like KLEE and SAGE for automatic test case generation, especially in security.
Fuzz Testing / Fuzzing — A testing technique that feeds random or semi-random inputs into a program to look for crashes, hangs, or unexpected outputs. Modern coverage-guided fuzzers (AFL, libFuzzer, honggfuzz) use coverage feedback to evolve inputs that exercise new paths. Hugely successful in finding security bugs.
Static Analysis (White-Box Context) — Analysing the source code without executing it to find bugs, security issues, and code smells. Tools: SonarQube, Coverity, Klocwork, CodeQL. Static analysis complements white-box testing because it can find defects in code paths that tests may never reach.
Inspection vs Testing (White-Box Context) — Two complementary defect-finding activities. Inspection is the static white-box activity — reading code by humans (Fagan, peer review). Testing is the dynamic activity — running code with specific inputs. Inspections find defects testing typically misses (intent-related, design-level); testing finds defects inspections miss (behavioural, environment-dependent).
---
Study deep
- Coverage is a means, not an end. 100% statement coverage doesn't mean the code is bug-free — it means every line was executed, not that every behaviour was verified. Coverage targets are a useful baseline (~80% is industry typical) but not a quality guarantee.
- V(G) approximates testability. Functions with V(G) ≤ 10 are usually manageable. Above 20 the function is likely doing too much and should be split. Cyclomatic complexity is a refactoring trigger.
- MC/DC is required for life-critical software. Aviation (DO-178C Level A) and automotive (ISO 26262 ASIL D) mandate Modified Condition / Decision Coverage — each Boolean operand must independently influence the decision outcome. Far stricter than basic branch coverage.
- Mutation testing is the gold standard. Studies show that test suites with high mutation scores catch real bugs better than test suites with merely high coverage. The cost is significant — entire test suite reruns for each mutation — but tools are improving.
- Modern white-box automation. Tools like JaCoCo (Java), Coverage.py (Python), and Istanbul (JS) report coverage automatically. CI pipelines fail if coverage drops below a threshold (e.g. 80%).
PYQ pattern (numerical, very common): "For the given code, draw the control flow graph and compute cyclomatic complexity." — Identify decision nodes (P), compute V(G) = P + 1. Verify with E − N + 2 or regions. Identify a basis set of paths.