Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

2.2 Size Estimation — LOC, Function Points & Halstead

Lesson 11 of 24 in the free Software Engineering notes on Siksha Sarovar, written by Rohit Jangra.

Why size matters

Software size is the first input to every cost-estimation model. Predict size wrong, and your cost, schedule and risk estimates will all be wrong.

Three classic measures dominate the IPU syllabus:

| Measure | Year | Author | Best for |
| --- | --- | --- | --- |
| LOC (Lines of Code) | 1960s | (Folk practice) | Quick estimates when language is known |
| Function Points (FP) | 1979 | Allan Albrecht (IBM) | Language-independent, user-perspective |
| Halstead Software Science | 1977 | Maurice Halstead | Academic; rarely used industrially |

---

1. Lines of Code (LOC)

Definition: the number of lines in the program source code, excluding blank lines and comments (by convention — though this varies).

Variants

  • SLOC = Source LOC = lines of executable code
  • LLOC = Logical LOC = number of statements (1 statement may span multiple physical lines)
  • KLOC = thousand LOC
  • DSI = Delivered Source Instructions (used in COCOMO)
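
To see how physical lines differ from source lines, the sketch below classifies the lines of a small C snippet held in a string. The function name `count_lines`, the sample snippet, and the rule that only `//` comments are recognised are assumptions made for brevity; real counters also handle block comments and language-specific statement rules.

```python
def count_lines(source: str) -> dict:
    """Classify lines of C-like source. Only // line comments are recognised."""
    physical = blank = comment = 0
    for line in source.splitlines():
        physical += 1
        stripped = line.strip()
        if not stripped:
            blank += 1          # empty or whitespace-only line
        elif stripped.startswith("//"):
            comment += 1        # whole-line comment
    sloc = physical - blank - comment
    return {
        "physical": physical,
        "blank": blank,
        "comment": comment,
        "sloc": sloc,
        "kloc": sloc / 1000,
    }


# Hypothetical sample used only for illustration.
SAMPLE = """// add two numbers
int add(int a, int b)
{
    return a + b;
}

int main(void) { return add(1, 2); }
"""

if __name__ == "__main__":
    print(count_lines(SAMPLE))
```

Counting LLOC properly needs a parser for the language in question, because one statement may span several physical lines; that is deliberately left out of this sketch.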

Estimation steps

  1. Decompose the system into modules
  2. Estimate LOC for each module using historical data
  3. Sum the module LOC and add 10–20% for integration overhead

Example

A library system has 5 modules:

| Module | Estimated LOC |
| --- | --- |
| Member management | 800 |
| Book catalogue | 1,200 |
| Issue / return | 1,500 |
| Reports | 700 |
| Admin | 600 |
| Sub-total | 4,800 |
| + 15% integration overhead | 720 |
| Total estimate | 5,520 LOC ≈ 5.5 KLOC |
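
The arithmetic of this estimate can be reproduced with a short script. This is a minimal sketch: the names `MODULE_LOC` and `estimate_total_loc` are illustrative, and the 15% overhead rate is simply the value chosen in the example.

```python
# Per-module estimates taken from the library-system example.
MODULE_LOC = {
    "Member management": 800,
    "Book catalogue": 1200,
    "Issue / return": 1500,
    "Reports": 700,
    "Admin": 600,
}


def estimate_total_loc(module_loc: dict, overhead_rate: float = 0.15) -> tuple:
    """Return (sub-total, integration overhead, total) in LOC."""
    subtotal = sum(module_loc.values())
    overhead = round(subtotal * overhead_rate)  # rounded to whole lines
    return subtotal, overhead, subtotal + overhead


if __name__ == "__main__":
    subtotal, overhead, total = estimate_total_loc(MODULE_LOC)
    print(f"Sub-total: {subtotal} LOC")
    print(f"Overhead:  {overhead} LOC")
    print(f"Total:     {total} LOC = {total / 1000:.1f} KLOC")
```

Changing `overhead_rate` to 0.10 or 0.20 shows how the total moves across the 10–20% band given in the estimation steps.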

Advantages of LOC

  • Simple to compute (automated tools count it)
  • Universally understood
  • Useful for productivity benchmarks within a stable team

Disadvantages of LOC

  • Language-dependent — a feature is far shorter in Python than in COBOL
  • Penalises good design — a verbose 1,000-line solution looks more productive than a tight 100-line one
  • Hard to estimate before code exists
  • Cannot compare across languages or teams with different style

---

2. Function Points (Albrecht, 1979)

Function Point Analysis measures size from the user's perspective, independent of programming language. It counts five types of user-visible elements:

| Element Type | Symbol | Description | Example |
| --- | --- | --- | --- |
| External Inputs | EI | Data input to the system | Login form, add product |
| External Outputs | EO | Outputs to the user | Invoice, monthly report |
| External Inquiries | EQ | Read-only outputs | Stock-balance query |
| Internal Logical Files | ILF | Internal data stores | Customer table, order table |
| External Interface Files | EIF | Data shared with other systems | Payment-gateway feed |

Step-by-step FP calculation

Step 1: Count each element type for the proposed system.

Step 2: Classify each as Simple / Average / Complex and apply weights:

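| Element | Simple | Average | Complex |
| --- | --- | --- | --- |
| External Inputs (EI) | 3 | 4 | 6 |
| External Outputs (EO) | 4 | 5 | 7 |
| External Inquiries (EQ) | 3 | 4 | 6 |
| Internal Logical Files (ILF) | 7 | 10 | 15 |
| External Interface Files (EIF) | 5 | 7 | 10 |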

Step 3: Compute Unadjusted Function Points (UFP):

UFP = Σ (count × weight)

Step 4: Compute the Complexity Adjustment Factor (CAF). Rate each of the 14 General System Characteristics (GSCs) on a scale from 0 (no influence) to 5 (essential):

  1. Data communications
  2. Distributed data processing
  3. Performance
  4. Heavily used configuration
  5. Transaction rate
  6. Online data entry
  7. End-user efficiency
  8. Online update
  9. Complex processing
  10. Reusability
  11. Installation ease
  12. Operational ease
  13. Multiple sites
  14. Facilitate change

The sum of the 14 ratings is the Total Degree of Influence (TDI), which ranges from 0 to 70.

CAF = 0.65 + 0.01 × TDI

So CAF ranges from 0.65 (low complexity) to 1.35 (high).
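
The bounds of CAF follow directly from the rating scale, as this small sketch shows. The function name `complexity_adjustment_factor` and the input validation are assumptions added for illustration; the formula is the one given above.

```python
GSC_COUNT = 14  # number of General System Characteristics


def complexity_adjustment_factor(ratings: list) -> float:
    """Compute CAF = 0.65 + 0.01 * TDI from 14 GSC ratings (each 0 to 5)."""
    if len(ratings) != GSC_COUNT:
        raise ValueError(f"expected {GSC_COUNT} ratings, got {len(ratings)}")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 0 and 5")
    tdi = sum(ratings)  # Total Degree of Influence, 0 to 70
    return 0.65 + 0.01 * tdi


if __name__ == "__main__":
    print(f"{complexity_adjustment_factor([0] * 14):.2f}")  # all ratings at minimum
    print(f"{complexity_adjustment_factor([5] * 14):.2f}")  # all ratings at maximum
```

With every rating at 0 the factor is 0.65, and with every rating at 5 it is 1.35, which matches the stated range.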

Step 5: Adjusted Function Points (FP):

FP = UFP × CAF

---

Worked Example — Library System

Counts:

  • EI: 4 (Login, Add Book, Issue Book, Return Book) — all average
  • EO: 3 (Receipt, Overdue list, Monthly report) — average
  • EQ: 2 (Search book, View member) — simple
  • ILF: 3 (Member file, Book file, Loan file) — average
  • EIF: 1 (Library Federation feed) — simple

UFP calculation:

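| Element | Count | Weight | Sub-total |
| --- | --- | --- | --- |
| EI (avg) | 4 | 4 | 16 |
| EO (avg) | 3 | 5 | 15 |
| EQ (simple) | 2 | 3 | 6 |
| ILF (avg) | 3 | 10 | 30 |
| EIF (simple) | 1 | 5 | 5 |
| UFP | | | 72 |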

Assume TDI = 35 (typical mid-complexity system, average rating ≈ 2.5 per GSC).

CAF = 0.65 + 0.01 × 35 = 0.65 + 0.35 = 1.00
FP  = 72 × 1.00 = 72

So the library system is 72 Function Points.
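
The whole calculation can be checked with a few lines of Python. This is a sketch under stated assumptions: the names `WEIGHTS`, `unadjusted_fp`, `adjusted_fp` and `LIBRARY` are illustrative, while the weights, counts and TDI are the ones used in this lesson.

```python
# Weight table from Step 2: element type -> complexity -> weight.
WEIGHTS = {
    "EI": {"simple": 3, "average": 4, "complex": 6},
    "EO": {"simple": 4, "average": 5, "complex": 7},
    "EQ": {"simple": 3, "average": 4, "complex": 6},
    "ILF": {"simple": 7, "average": 10, "complex": 15},
    "EIF": {"simple": 5, "average": 7, "complex": 10},
}


def unadjusted_fp(counts: list) -> int:
    """UFP = sum of count * weight over (element, complexity, count) tuples."""
    return sum(WEIGHTS[element][level] * n for element, level, n in counts)


def adjusted_fp(ufp: int, tdi: int) -> float:
    """FP = UFP * CAF, where CAF = 0.65 + 0.01 * TDI."""
    return ufp * (0.65 + 0.01 * tdi)


# Counts from the library-system worked example.
LIBRARY = [
    ("EI", "average", 4),
    ("EO", "average", 3),
    ("EQ", "simple", 2),
    ("ILF", "average", 3),
    ("EIF", "simple", 1),
]

if __name__ == "__main__":
    ufp = unadjusted_fp(LIBRARY)
    print(f"UFP = {ufp}")
    print(f"FP  = {adjusted_fp(ufp, tdi=35):.2f}")
```

Keeping the weights in one lookup table means a different mix of simple, average and complex elements only requires changing the `counts` list.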

---

Converting FP → LOC

You can convert FP to LOC using language-specific multipliers (Capers Jones backfiring tables):

| Language | LOC per FP (approx) |
| --- | --- |
| Assembler | 320 |
| C | 128 |
| C++ | 55 |
| Java | 53 |
| Python | 27 |
| SQL | 13 |
| Visual Basic | 32 |

So 72 FP in Java ≈ 72 × 53 = 3,816 LOC ≈ 3.8 KLOC.
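
The same conversion for other languages is a single lookup and multiplication. In this sketch the names `LOC_PER_FP` and `backfire` are illustrative; the multipliers are the approximate values from the table above, not authoritative figures.

```python
# Approximate LOC-per-FP multipliers as listed in this lesson.
LOC_PER_FP = {
    "Assembler": 320,
    "C": 128,
    "C++": 55,
    "Java": 53,
    "Python": 27,
    "SQL": 13,
    "Visual Basic": 32,
}


def backfire(fp: float, language: str) -> float:
    """Convert a Function Point count to an approximate LOC figure."""
    return fp * LOC_PER_FP[language]


if __name__ == "__main__":
    for language in ("C", "Java", "Python"):
        loc = backfire(72, language)
        print(f"72 FP in {language}: about {loc:.0f} LOC ({loc / 1000:.1f} KLOC)")
```

The spread between languages for the same 72 FP illustrates why LOC figures cannot be compared across languages without such a conversion.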

---

Advantages of FP

  • Language-independent — can be estimated before language is chosen
  • Counted before code exists — usable for early planning
  • User-perspective — counts business-relevant things
  • Industry-standard — ISO/IEC 20926

Disadvantages of FP

  • Subjective weights and complexity ratings
  • Requires training (Certified Function Point Specialist exists)
  • More effort than LOC counting
  • Doesn't capture algorithmic complexity

---

3. Halstead's Software Science Measures (1977)

Maurice Halstead modelled software as a function of operators and operands.

| Symbol | Meaning |
| --- | --- |
| n₁ | Number of distinct operators (e.g. +, if, while, function names) |
| n₂ | Number of distinct operands (variables, constants) |
| N₁ | Total occurrences of operators |
| N₂ | Total occurrences of operands |

The Halstead measures

Program vocabulary:    n = n₁ + n₂
Program length:        N = N₁ + N₂
Program volume:        V = N × log₂(n)
Program difficulty:    D = (n₁ / 2) × (N₂ / n₂)
Programming effort:    E = D × V
Estimated time (sec):  T = E / 18      (Stroud number ≈ 18)
Estimated bugs:        B = V / 3000

Worked example

Consider the C statement:

sum = a + b * c;

Operators: =, +, *, ; → distinct = 4, total = 4 → n₁ = 4, N₁ = 4
Operands: sum, a, b, c → distinct = 4, total = 4 → n₂ = 4, N₂ = 4

n  = 4 + 4 = 8
N  = 4 + 4 = 8
V  = 8 × log₂(8) = 8 × 3 = 24
D  = (4/2) × (4/4) = 2 × 1 = 2
E  = 2 × 24 = 48
T  = 48 / 18 ≈ 2.67 seconds
B  = 24 / 3000 ≈ 0.008
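
The same figures can be reproduced programmatically once the operators and operands have been listed by hand. This is a minimal sketch: the function name `halstead` and the dictionary keys are illustrative, the token lists are supplied manually rather than parsed from source, and the code assumes at least one operand so that n₂ is not zero.

```python
import math


def halstead(operators: list, operands: list) -> dict:
    """Compute Halstead measures from lists of operator and operand tokens."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total occurrences
    n = n1 + n2                                       # vocabulary
    N = N1 + N2                                       # length
    V = N * math.log2(n)                              # volume
    D = (n1 / 2) * (N2 / n2)                          # difficulty
    E = D * V                                         # effort
    return {"n": n, "N": N, "V": V, "D": D, "E": E, "T": E / 18, "B": V / 3000}


if __name__ == "__main__":
    # Tokens of the statement: sum = a + b * c;
    measures = halstead(["=", "+", "*", ";"], ["sum", "a", "b", "c"])
    for name, value in measures.items():
        print(f"{name} = {value:.3f}")
```

Separating the token lists from the arithmetic keeps the language-specific decision of what counts as an operator outside the formulas.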

Advantages

  • Mathematically rigorous
  • Computable from source code automatically
  • Gives a rough defect estimate (B) directly from the code

Disadvantages

  • Defining "operator" and "operand" varies by language
  • Less intuitive than LOC or FP
  • Rarely used in industry today
  • Only useful after code exists

---

Comparison — quick exam table

| Property | LOC | Function Points | Halstead |
| --- | --- | --- | --- |
| Pre-code estimate possible? | Hard | Yes | No |
| Language-independent? | No | Yes | No |
| User-perspective? | No | Yes | No |
| Effort to compute | Trivial | Moderate | Easy (after code) |
| Industry use today | High | Moderate | Low |
| ISO standard? | No | Yes (IFPUG) | No |

---

Key Terms — Lesson 2.2

The terms below are the vocabulary of size estimation — every numerical PYQ on FP or Halstead requires fluent use of them.

Size Estimation — Predicting how big a piece of software will be before it is built, so that cost, effort, and schedule can be derived. Size estimation is the first input to almost every cost model (COCOMO, Use Case Points, Story Points). Wrong size → wrong cost → wrong project plan.

SLOC / LLOC / DSI / KLOC — Variants of "lines of code." SLOC (Source LOC) counts physical or logical executable lines. LLOC (Logical LOC) counts language statements regardless of how many physical lines they span. DSI (Delivered Source Instructions) is the COCOMO-specific count of delivered executable statements. KLOC is 1,000 LOC.

Function Point (FP) — Allan Albrecht's 1979 size metric that counts the business functionality a system delivers, weighted by complexity, independent of programming language. The IFPUG (International Function Point Users Group) maintains the standard counting practice (ISO/IEC 20926).

External Input (EI) — In Function Point counting, an input transaction that crosses the system boundary and updates internal data — e.g., a "create order" form, a "register user" form. Weighted 3 (simple), 4 (average), or 6 (complex) FPs.

External Output (EO) — An output transaction that crosses the system boundary and contains derived data — a generated invoice, a monthly report, a calculated commission statement. Weighted 4, 5, or 7 FPs.

External Inquiry (EQ) — A read-only output transaction — input parameter in, retrieved data out, no derivation, no state change. A stock-balance query or "view my profile" page. Weighted 3, 4, or 6 FPs.

Internal Logical File (ILF) — A logical group of internally maintained data — a Customer table, an Order table. Weighted 7, 10, or 15 FPs depending on the number of record types and data elements.

External Interface File (EIF) — A logical group of data referenced by the system but maintained by another system — a payment gateway's product catalog feed, an external authentication provider's user list. Weighted 5, 7, or 10 FPs.

Unadjusted Function Points (UFP) — The raw sum of (count × weight) across all five element types, before adjustment for system characteristics. UFP captures the pure functional size.

General System Characteristics (GSCs) / Total Degree of Influence (TDI) — Fourteen factors (data communications, distributed processing, performance, complex processing, reusability, etc.) each rated 0 (no influence) to 5 (essential). The sum is the TDI (range 0–70) that drives the complexity adjustment.

Complexity Adjustment Factor (CAF) / Value Adjustment Factor (VAF) — A multiplier computed as CAF = 0.65 + 0.01 × TDI, ranging from 0.65 (low complexity) to 1.35 (high complexity). Modifies UFP to get the final Adjusted Function Points.

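Adjusted Function Points (FP) — The final FP count after applying CAF: FP = UFP × CAF. This is the figure normally reported as the system's functional size and used for productivity and cost estimates. Note that COCOMO II takes unadjusted function points as its size input and converts them to SLOC.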

Backfiring — Converting FP to LOC using language-specific multipliers from Capers Jones backfiring tables (e.g., ~53 LOC/FP for Java, ~27 LOC/FP for Python, ~13 LOC/FP for SQL). Useful for cross-comparison between FP-based estimates and LOC-based historical data; accuracy is approximate (±30%).

Capers Jones — American software-metrics researcher whose backfiring tables and benchmark databases (covering thousands of projects across 700+ programming languages) are the standard cross-reference between FP and LOC sizes.

Halstead's Software Science — Maurice Halstead's 1977 framework computing vocabulary (n), length (N), volume (V = N · log₂ n), difficulty (D = (n₁/2) × (N₂/n₂)), effort (E = D × V), time (T = E/18 seconds), and estimated bugs (B = V/3000) from the count of operators and operands in the source.

Operator (Halstead) — Any symbol or keyword that performs an action — arithmetic operators (+, *), comparison operators (==, <), logical operators (&&, ||), assignment (=), control flow (if, while, for, return), function names, brackets, semicolons. The counting convention varies by language, which is one of Halstead's weaknesses.

Operand (Halstead) — Any variable, constant, literal, or identifier that is acted upon by an operator. sum, x, 5, "hello" are all operands.

Use Case Points (UCP) — A 1993 extension of FP for object-oriented systems, counting use cases and actors weighted by complexity and adjusted by technical and environmental factors. Less standardised than FP but better matched to modern OO design.

Object Points / Application Points — A variant size metric used in COCOMO II's "Application Composition" mode — counts screens, reports, and 3GL components instead of LOC or FP. Suited to 4GL and low-code application development.

Story Points — The Agile abandonment of precise size measurement in favour of relative sizing by team consensus, often on a Fibonacci scale (1, 2, 3, 5, 8, 13, 21). Story points combine effort, complexity, and risk into a single team-specific unit and are calibrated by tracking velocity (story points completed per sprint).

ISO/IEC 20926 (IFPUG-FP) — The ISO standard for the IFPUG Function Point counting method. Two related ISO standards cover competing FP variants: ISO/IEC 24570 (Nesma) and ISO/IEC 19761 (COSMIC FP) — the latter is increasingly used for real-time and embedded systems.

---

Study deep

  1. Use FP for early planning, LOC for tracking. Function Points are best when estimating before code exists; LOC is best for productivity tracking once development begins.
  2. Backfiring is a useful shortcut but loses precision. Converting FP → LOC via Capers Jones tables works for ballpark figures but can be off by ±30% for individual projects.
  3. Halstead matters historically. The Halstead measures heavily influenced the design of modern maintainability indices and complexity metrics. The IPU exam likes the basic formulas: memorise V, D, E.
  4. Modern alternatives. Use Case Points (UCP) extends FP to OO systems. Object Points (OP) apply FP-style counting to screens, reports and 3GL components in 4GL application development. Story Points (Agile) abandon precise measurement in favour of relative sizing.
PYQ pattern (very common, numerical): "Compute Function Points for a system with EI=5(avg), EO=4(simple), EQ=3(complex), ILF=2(avg), EIF=1(simple), TDI=30." — UFP = 5×4 + 4×4 + 3×6 + 2×10 + 1×5 = 79; CAF = 0.65 + 0.30 = 0.95; FP = 79 × 0.95 ≈ 75.
PYQ pattern: "Given a C function, compute Halstead's Volume, Difficulty and Effort." Identify n₁, n₂, N₁ and N₂, compute n and N, then apply the formulas for V, D and E.