2.2 Size Estimation — LOC, Function Points & Halstead
Why size matters
Software size is the primary input to almost every cost-estimation model. If the size estimate is wrong, the cost, schedule and risk estimates derived from it will be wrong too.
Three classic measures dominate the IPU syllabus:
| Measure | Year | Author | Best For |
|---|---|---|---|
| LOC (Lines of Code) | 1960s | (Folk practice) | Quick estimates when language is known |
| Function Points (FP) | 1979 | Allan Albrecht (IBM) | Language-independent, user-perspective |
| Halstead Software Science | 1977 | Maurice Halstead | Academic; rarely used industrially |
---
1. Lines of Code (LOC)
Definition: the number of lines in the program source code, excluding blank lines and comments (by convention — though this varies).
Variants
- SLOC = Source LOC = lines of executable code
- LLOC = Logical LOC = number of statements (1 statement may span multiple physical lines)
- KLOC = thousand LOC
- DSI = Delivered Source Instructions (used in COCOMO)
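As a rough illustration of the counting convention above, the sketch below counts physical lines while skipping blank lines and full-line comments. The name `count_sloc` and the single-prefix comment rule are assumptions made for this example; real counters also handle block comments and comment markers inside strings.

```python
def count_sloc(source: str, comment_prefix: str = "//") -> int:
    """Count non-blank lines that are not full-line comments.

    Simplifying assumption: a comment is any line whose first
    non-whitespace characters are `comment_prefix`. Block comments
    such as /* ... */ are NOT handled.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(comment_prefix):
            continue  # full-line comment
        count += 1  # code line (a trailing comment still counts as code)
    return count


sample = """
// add two numbers
int add(int a, int b) {

    return a + b;  // trailing comment
}
"""

print(count_sloc(sample))  # 3 code lines under this convention
```

Changing the convention (for example, counting statements instead of lines) changes the number, which is why LOC figures are only comparable when the counting rule is stated.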
Estimation steps
- Decompose the system into modules
- Estimate LOC for each module using historical data
- Sum the module estimates and add 10–20% for integration overhead
Example
A library system has 5 modules:
| Module | Estimated LOC |
|---|---|
| Member management | 800 |
| Book catalogue | 1,200 |
| Issue / return | 1,500 |
| Reports | 700 |
| Admin | 600 |
| Sub-total | 4,800 |
| + 15% integration overhead | 720 |
| Total estimate | 5,520 LOC ≈ 5.5 KLOC |
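The arithmetic in the table can be reproduced with a few lines of Python. This is only a sketch: the function name `estimate_total_loc` is hypothetical, and the overhead is passed as a whole-number percentage so the result stays an integer.

```python
# Module estimates taken from the table above.
module_loc = {
    "Member management": 800,
    "Book catalogue": 1200,
    "Issue / return": 1500,
    "Reports": 700,
    "Admin": 600,
}


def estimate_total_loc(modules: dict[str, int], overhead_pct: int = 15) -> int:
    """Sum module LOC and add an integration overhead given in percent."""
    subtotal = sum(modules.values())
    overhead = subtotal * overhead_pct // 100
    return subtotal + overhead


# The 10-20% overhead band gives a range rather than a single number.
for pct in (10, 15, 20):
    total = estimate_total_loc(module_loc, pct)
    print(f"{pct}% overhead: {total} LOC = {total / 1000} KLOC")
```

Reporting the low and high ends of the band alongside the 15% figure makes the uncertainty of the estimate visible.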
Advantages of LOC
- Simple to compute (automated tools count it)
- Universally understood
- Useful for productivity benchmarks within a stable team
Disadvantages of LOC
- Language-dependent — a feature is far shorter in Python than in COBOL
- Penalises good design — a verbose 1,000-line solution looks more productive than a tight 100-line one
- Hard to estimate before code exists
- Cannot compare across languages or teams with different style
---
2. Function Points (Albrecht, 1979)
Function Point Analysis measures size from the user's perspective, independent of programming language. It counts five types of user-visible elements:
| Element Type | Symbol | Description | Example |
|---|---|---|---|
| External Inputs | EI | Data input to the system | Login form, add product |
| External Outputs | EO | Outputs to the user | Invoice, monthly report |
| External Inquiries | EQ | Read-only retrievals with no derived data and no updates | Stock-balance query |
| Internal Logical Files | ILF | Internal data stores | Customer table, order table |
| External Interface Files | EIF | Data shared with other systems | Payment-gateway feed |
Step-by-step FP calculation
Step 1: Count each element type for the proposed system.
Step 2: Classify each as Simple / Average / Complex and apply weights:
| Element | Simple | Average | Complex |
|---|---|---|---|
| External Inputs (EI) | 3 | 4 | 6 |
| External Outputs (EO) | 4 | 5 | 7 |
| External Inquiries (EQ) | 3 | 4 | 6 |
| Internal Logical Files (ILF) | 7 | 10 | 15 |
| External Interface Files (EIF) | 5 | 7 | 10 |
Step 3: Compute Unadjusted Function Points (UFP):
UFP = Σ (count × weight)
Step 4: Compute the Complexity Adjustment Factor (CAF). Rate each of the 14 General System Characteristics (GSCs) on a scale from 0 (no influence) to 5 (essential):
- Data communications
- Distributed data processing
- Performance
- Heavily used configuration
- Transaction rate
- Online data entry
- End-user efficiency
- Online update
- Complex processing
- Reusability
- Installation ease
- Operational ease
- Multiple sites
- Facilitate change
The sum of the 14 ratings is the Total Degree of Influence (TDI), which ranges from 0 to 70.
CAF = 0.65 + 0.01 × TDI
So CAF ranges from 0.65 (low complexity) to 1.35 (high).
Step 5: Adjusted Function Points (FP):
FP = UFP × CAF
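Steps 1 to 5 can be sketched as a small calculator. The weights come from the Step 2 table. The function names, the tuple layout of `counts`, and the individual GSC ratings are assumptions for illustration (the ratings were chosen only so that they sum to a TDI of 30).

```python
# Weights from the Step 2 table.
WEIGHTS = {
    "EI": {"simple": 3, "average": 4, "complex": 6},
    "EO": {"simple": 4, "average": 5, "complex": 7},
    "EQ": {"simple": 3, "average": 4, "complex": 6},
    "ILF": {"simple": 7, "average": 10, "complex": 15},
    "EIF": {"simple": 5, "average": 7, "complex": 10},
}


def unadjusted_fp(counts):
    """counts: iterable of (element_type, complexity, count) tuples."""
    return sum(WEIGHTS[kind][level] * n for kind, level, n in counts)


def adjustment_factor(gsc_ratings):
    """CAF = 0.65 + 0.01 * TDI, where TDI is the sum of 14 ratings in 0..5."""
    if len(gsc_ratings) != 14 or any(not 0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 GSC ratings, each between 0 and 5")
    tdi = sum(gsc_ratings)
    # Algebraically equal to 0.65 + 0.01 * tdi, with one rounding step.
    return (65 + tdi) / 100


def adjusted_fp(counts, gsc_ratings):
    return unadjusted_fp(counts) * adjustment_factor(gsc_ratings)


# Hypothetical system; the ratings are invented and sum to TDI = 30.
counts = [
    ("EI", "average", 5),
    ("EO", "simple", 4),
    ("EQ", "complex", 3),
    ("ILF", "average", 2),
    ("EIF", "simple", 1),
]
ratings = [3, 2, 3, 2, 2, 3, 2, 2, 2, 1, 2, 2, 2, 2]

print("UFP =", unadjusted_fp(counts))
print("CAF =", adjustment_factor(ratings))
print("FP  =", round(adjusted_fp(counts, ratings), 2))
```

Keeping UFP and CAF as separate functions mirrors the hand calculation, so each intermediate value can be checked against a worked answer.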
---
Worked Example — Library System
Counts:
- EI: 4 (Login, Add Book, Issue Book, Return Book) — all average
- EO: 3 (Receipt, Overdue list, Monthly report) — average
- EQ: 2 (Search book, View member) — simple
- ILF: 3 (Member file, Book file, Loan file) — average
- EIF: 1 (Library Federation feed) — simple
UFP calculation:
| Element | Count | Weight | Sub-total |
|---|---|---|---|
| EI (avg) | 4 | 4 | 16 |
| EO (avg) | 3 | 5 | 15 |
| EQ (simple) | 2 | 3 | 6 |
| ILF (avg) | 3 | 10 | 30 |
| EIF (simple) | 1 | 5 | 5 |
| UFP | | | 72 |
Assume TDI = 35 (typical mid-complexity system, average rating ≈ 2.5 per GSC).
CAF = 0.65 + 0.01 × 35 = 0.65 + 0.35 = 1.00
FP = 72 × 1.00 = 72
So the library system is 72 Function Points.
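The result depends on the assumed TDI. As a quick sensitivity check, the sketch below recomputes FP for the same UFP of 72 at the lowest, middle and highest possible TDI. The helper name `fp_for_tdi` is hypothetical.

```python
UFP = 72  # from the worked example above


def fp_for_tdi(ufp: int, tdi: int) -> float:
    """Adjusted FP for a given Total Degree of Influence (0..70)."""
    if not 0 <= tdi <= 70:
        raise ValueError("TDI must be between 0 and 70")
    return ufp * (65 + tdi) / 100  # same as ufp * (0.65 + 0.01 * tdi)


for tdi in (0, 35, 70):
    print(f"TDI={tdi:2d} -> FP={fp_for_tdi(UFP, tdi):.1f}")
```

For this system the adjustment can move the size between roughly 47 and 97 FP, that is ±35% around the unadjusted count, so the GSC ratings deserve as much care as the element counts.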
---
Converting FP → LOC
You can convert FP to LOC using language-specific multipliers (Capers Jones backfiring tables):
| Language | LOC per FP (approx) |
|---|---|
| Assembler | 320 |
| C | 128 |
| C++ | 55 |
| Java | 53 |
| Python | 27 |
| SQL | 13 |
| Visual Basic | 32 |
So 72 FP in Java ≈ 72 × 53 = 3,816 LOC ≈ 3.8 KLOC.
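The same conversion can be applied to every language in the table. The sketch below uses the approximate multipliers listed above; the dictionary and function names are hypothetical.

```python
# Approximate LOC-per-FP multipliers from the table above.
LOC_PER_FP = {
    "Assembler": 320,
    "C": 128,
    "C++": 55,
    "Java": 53,
    "Python": 27,
    "SQL": 13,
    "Visual Basic": 32,
}


def backfire(fp: float, language: str) -> float:
    """Convert a Function Point count to an approximate LOC figure."""
    return fp * LOC_PER_FP[language]


for language in LOC_PER_FP:
    print(f"{language:13s} {backfire(72, language):>8,.0f} LOC")
```

Running it for 72 FP shows how strongly the LOC figure depends on the language: the same functionality is about 9,216 LOC in C but about 1,944 LOC in Python.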
---
Advantages of FP
- Language-independent — can be estimated before language is chosen
- Counted before code exists — usable for early planning
- User-perspective — counts business-relevant things
- Industry-standard — ISO/IEC 20926
Disadvantages of FP
- Subjective weights and complexity ratings
- Requires training (Certified Function Point Specialist exists)
- More effort than LOC counting
- Doesn't capture algorithmic complexity
---
3. Halstead's Software Science Measures (1977)
Maurice Halstead modelled a program as a sequence of tokens, each classified as either an operator or an operand. All of his measures are computed from four counts:
| Symbol | Meaning |
|---|---|
| n₁ | Number of distinct operators (e.g. +, if, while, function names) |
| n₂ | Number of distinct operands (variables, constants) |
| N₁ | Total occurrences of operators |
| N₂ | Total occurrences of operands |
The Halstead measures
Program vocabulary: n = n₁ + n₂
Program length: N = N₁ + N₂
Program volume: V = N × log₂(n)
Program difficulty: D = (n₁ / 2) × (N₂ / n₂)
Programming effort: E = D × V
Estimated time (sec): T = E / 18 (Stroud number ≈ 18)
Estimated bugs: B = V / 3000
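The formulas above translate directly into code. The following is a minimal sketch: the `Halstead` dataclass and the `halstead` function are names chosen for this example, and the input counts are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class Halstead:
    vocabulary: int      # n = n1 + n2
    length: int          # N = N1 + N2
    volume: float        # V = N * log2(n)
    difficulty: float    # D = (n1 / 2) * (N2 / n2)
    effort: float        # E = D * V
    time_seconds: float  # T = E / 18
    bugs: float          # B = V / 3000


def halstead(n1: int, n2: int, N1: int, N2: int) -> Halstead:
    """Compute the Halstead measures from the four basic counts."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("need at least one distinct operator and operand")
    n = n1 + n2
    N = N1 + N2
    volume = N * math.log2(n)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return Halstead(n, N, volume, difficulty, effort, effort / 18, volume / 3000)


# Hypothetical counts for a small function (not taken from real code).
m = halstead(n1=10, n2=6, N1=30, N2=20)
print(f"V={m.volume:.1f}  D={m.difficulty:.2f}  E={m.effort:.1f}")
```

Because every measure derives from the same four counts, any disagreement about what counts as an operator propagates into all of the results.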
Worked example
Consider the C statement:
sum = a + b * c;
Operators: =, +, *, ; → distinct = 4, total = 4 → n₁ = 4, N₁ = 4
Operands: sum, a, b, c → distinct = 4, total = 4 → n₂ = 4, N₂ = 4
n = 4 + 4 = 8
N = 4 + 4 = 8
V = 8 × log₂(8) = 8 × 3 = 24
D = (4/2) × (4/4) = 2 × 1 = 2
E = 2 × 24 = 48
T = 48 / 18 ≈ 2.67 seconds
B = 24 / 3000 ≈ 0.008
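The four counts for this statement can also be obtained mechanically. The sketch below uses a deliberately naive convention, stated here as an assumption: identifiers and numeric literals are operands, and every other non-space character is an operator. It would misclassify keywords such as `if` as operands and split multi-character operators such as `==`, so it only suits simple expressions like this one.

```python
import re

# Identifiers and numeric literals are treated as operands (naive rule).
OPERAND = r"[A-Za-z_]\w*|\d+(?:\.\d+)?"


def count_tokens(statement: str):
    """Return (n1, n2, N1, N2) for a simple expression statement."""
    tokens = re.findall(OPERAND + r"|\S", statement)
    operators, operands = [], []
    for tok in tokens:
        if re.fullmatch(OPERAND, tok):
            operands.append(tok)
        else:
            operators.append(tok)  # any other single non-space character
    return len(set(operators)), len(set(operands)), len(operators), len(operands)


print(count_tokens("sum = a + b * c;"))  # (4, 4, 4, 4)
```

A statement that reuses a variable, such as `x = x + 1;`, gives different distinct and total operand counts, which is what drives the difficulty term N₂ / n₂ above 1.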
Advantages
- Mathematically rigorous
- Computable from source code automatically
- Gives a quick, rough estimate of delivered bugs (B), though its accuracy is debated
Disadvantages
- Defining "operator" and "operand" varies by language
- Less intuitive than LOC or FP
- Rarely used in industry today
- Only useful after code exists
---
Comparison — quick exam table
| Property | LOC | Function Points | Halstead |
|---|---|---|---|
| Pre-code estimate possible? | Hard | Yes | No |
| Language-independent? | No | Yes | No |
| User-perspective? | No | Yes | No |
| Effort to compute | Trivial | Moderate | Easy (after code) |
| Industry use today | High | Moderate | Low |
| ISO standard? | No | Yes (ISO/IEC 20926, IFPUG) | No |
---
Key Terms — Lesson 2.2
The terms below are the vocabulary of size estimation — every numerical PYQ on FP or Halstead requires fluent use of them.
Size Estimation — Predicting how big a piece of software will be before it is built, so that cost, effort, and schedule can be derived. Size estimation is the first input to almost every cost model (COCOMO, Use Case Points, Story Points). Wrong size → wrong cost → wrong project plan.
SLOC / LLOC / DSI / KLOC — Variants of "lines of code." SLOC (Source LOC) counts physical or logical executable lines. LLOC (Logical LOC) counts language statements regardless of how many physical lines they span. DSI (Delivered Source Instructions) is the COCOMO-specific count of delivered executable statements. KLOC is 1,000 LOC.
Function Point (FP) — Allan Albrecht's 1979 size metric that counts the business functionality a system delivers, weighted by complexity, independent of programming language. The IFPUG (International Function Point Users Group) maintains the standard counting practice (ISO/IEC 20926).
External Input (EI) — In Function Point counting, an input transaction that crosses the system boundary and updates internal data — e.g., a "create order" form, a "register user" form. Weighted 3 (simple), 4 (average), or 6 (complex) FPs.
External Output (EO) — An output transaction that crosses the system boundary and contains derived data — a generated invoice, a monthly report, a calculated commission statement. Weighted 4, 5, or 7 FPs.
External Inquiry (EQ) — A read-only output transaction — input parameter in, retrieved data out, no derivation, no state change. A stock-balance query or "view my profile" page. Weighted 3, 4, or 6 FPs.
Internal Logical File (ILF) — A logical group of internally maintained data — a Customer table, an Order table. Weighted 7, 10, or 15 FPs depending on the number of record types and data elements.
External Interface File (EIF) — A logical group of data referenced by the system but maintained by another system — a payment gateway's product catalog feed, an external authentication provider's user list. Weighted 5, 7, or 10 FPs.
Unadjusted Function Points (UFP) — The raw sum of (count × weight) across all five element types, before adjustment for system characteristics. UFP captures the pure functional size.
General System Characteristics (GSCs) / Total Degree of Influence (TDI) — Fourteen factors (data communications, distributed processing, performance, complex processing, reusability, etc.) each rated 0 (no influence) to 5 (essential). The sum is the TDI (range 0–70) that drives the complexity adjustment.
Complexity Adjustment Factor (CAF) / Value Adjustment Factor (VAF) — A multiplier computed as CAF = 0.65 + 0.01 × TDI, ranging from 0.65 (low complexity) to 1.35 (high complexity). Modifies UFP to get the final Adjusted Function Points.
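Adjusted Function Points (FP) — The final FP count after applying CAF: FP = UFP × CAF. This is the figure normally quoted as a system's functional size and used in FP-based productivity and cost estimates. COCOMO II, by contrast, starts from Unadjusted Function Points and converts them to SLOC.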
Backfiring — Converting FP to LOC using language-specific multipliers from Capers Jones backfiring tables (e.g., ~53 LOC/FP for Java, ~27 LOC/FP for Python, ~13 LOC/FP for SQL). Useful for cross-comparison between FP-based estimates and LOC-based historical data; accuracy is approximate (±30%).
Capers Jones — American software-metrics researcher whose backfiring tables and benchmark databases (covering thousands of projects across 700+ programming languages) are the standard cross-reference between FP and LOC sizes.
Halstead's Software Science — Maurice Halstead's 1977 framework computing vocabulary (n), length (N), volume (V = N · log₂ n), difficulty (D = (n₁/2) × (N₂/n₂)), effort (E = D × V), time (T = E/18 seconds), and estimated bugs (B = V/3000) from the count of operators and operands in the source.
Operator (Halstead) — Any symbol or keyword that performs an action — arithmetic operators (+, *), comparison operators (==, <), logical operators (&&, ||), assignment (=), control flow (if, while, for, return), function names, brackets, semicolons. The counting convention varies by language, which is one of Halstead's weaknesses.
Operand (Halstead) — Any variable, constant, literal, or identifier that is acted upon by an operator. sum, x, 5, "hello" are all operands.
Use Case Points (UCP) — A 1993 extension of FP for object-oriented systems, counting use cases and actors weighted by complexity and adjusted by technical and environmental factors. Less standardised than FP but better matched to modern OO design.
Object Points / Application Points — A variant size metric used in COCOMO II's "Application Composition" mode — counts screens, reports, and 3GL components instead of LOC or FP. Suited to 4GL and low-code application development.
Story Points — The Agile abandonment of precise size measurement in favour of relative sizing by team consensus, often on a Fibonacci scale (1, 2, 3, 5, 8, 13, 21). Story points combine effort, complexity, and risk into a single team-specific unit and are calibrated by tracking velocity (story points completed per sprint).
ISO/IEC 20926 (IFPUG-FP) — The ISO standard for the IFPUG Function Point counting method. Two related ISO standards cover competing FP variants: ISO/IEC 24570 (Nesma) and ISO/IEC 19761 (COSMIC FP) — the latter is increasingly used for real-time and embedded systems.
---
Study deep
- Use FP for early planning, LOC for tracking. Function Points are best when estimating before code exists; LOC is best for productivity tracking once development begins.
- Backfiring is a useful shortcut but loses precision. Converting FP → LOC via Capers Jones tables works for ballpark figures but can be off by ±30% for individual projects.
- Halstead matters historically. The Halstead measures heavily influenced the design of modern maintainability indices and complexity metrics. The IPU exam likes the basic formulas — memorise V, D, E.
- Modern alternatives. Use Case Points (UCP) extend FP to OO systems. Object Points (OP) count screens, reports and 3GL components, which suits 4GL and rapid application development. Story Points (Agile) abandon precise measurement in favour of relative sizing.
PYQ pattern (very common, numerical): "Compute Function Points for a system with EI=5(avg), EO=4(simple), EQ=3(complex), ILF=2(avg), EIF=1(simple), TDI=30." — UFP = 5×4 + 4×4 + 3×6 + 2×10 + 1×5 = 79; CAF = 0.65 + 0.30 = 0.95; FP = 79 × 0.95 ≈ 75.
PYQ pattern: "Given a C function, compute Halstead's Volume, Difficulty and Effort." — Identify n₁, n₂, N₁, N₂; compute n and N; then apply the formulas for V, D and E.