2.3 Cost Estimation — COCOMO
COCOMO (COnstructive COst MOdel) is Barry Boehm's empirical model for estimating software effort and schedule from size. Published in 1981 in Software Engineering Economics, it is the most widely taught cost-estimation model and a guaranteed IPU exam topic.
COCOMO comes in three flavours of increasing sophistication:
| Model | Inputs | Accuracy |
|---|---|---|
| Basic COCOMO | KLOC + project class | Rough (±50%) |
| Intermediate COCOMO | KLOC + class + 15 cost drivers | Better (±20%) |
| Detailed (Complete) COCOMO | All of the above + phase-wise breakdown | Best (±15%) |
---
Step 0 — Classify the project
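Every COCOMO calculation starts by placing the project in one of three classes. The size ranges below are typical rather than hard limits; constraints and team experience decide the class, which is why the 50-KLOC ATM example later in this section is treated as embedded: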
| Class | Size | Constraints | Team Experience | Example |
|---|---|---|---|---|
| Organic | Small (< 50 KLOC) | Loose, flexible | Highly experienced with similar work | Small business application, library system |
| Semi-detached | Medium (50–300 KLOC) | Mixed | Mixed experience | Compiler, ERP module |
| Embedded | Large (> 300 KLOC) | Tight constraints (hardware/regulatory) | Often inexperienced with this domain | Real-time OS, avionics, ATM software |
---
Basic COCOMO
The basic model uses two equations:
Effort (E) = a × (KLOC)^b [person-months]
Duration (D) = c × (E)^d [months]
The constants a, b, c, d depend on the project class:
| Class | a | b | c | d |
|---|---|---|---|---|
| Organic | 2.4 | 1.05 | 2.5 | 0.38 |
| Semi-detached | 3.0 | 1.12 | 2.5 | 0.35 |
| Embedded | 3.6 | 1.20 | 2.5 | 0.32 |
The constants come from Boehm's analysis of 63 real projects at TRW in the 1970s.
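The two equations and the constants table can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the names `BASIC_CONSTANTS` and `basic_cocomo` are made up for this example.

```python
# Basic COCOMO constants (a, b, c, d) per project class, copied from the table above.
BASIC_CONSTANTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc, project_class):
    """Return (effort in person-months, duration in months, average staff)."""
    a, b, c, d = BASIC_CONSTANTS[project_class]
    effort = a * kloc ** b          # E = a x (KLOC)^b
    duration = c * effort ** d      # D = c x (E)^d
    return effort, duration, effort / duration


# Hypothetical usage: a 5.5-KLOC organic project.
effort, duration, staff = basic_cocomo(5.5, "organic")
print(f"E = {effort:.1f} PM, D = {duration:.1f} months, staff = {staff:.1f}")
```

For a 5.5-KLOC organic project this gives roughly 14 person-months over about 7 months, matching the hand calculation that follows.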
Worked example — Library System (Organic, 5.5 KLOC)
E = 2.4 × (5.5)^1.05
= 2.4 × 5.99
≈ 14.4 person-months
D = 2.5 × (14.4)^0.38
= 2.5 × 2.76
≈ 6.9 months ≈ 7 months
Average staff = E / D = 14.4 / 6.9 ≈ 2 people
So the library system needs ~14 person-months over ~7 months with ~2 people.
Worked example — Embedded ATM software (50 KLOC)
E = 3.6 × (50)^1.20
= 3.6 × 109.34
≈ 393.6 person-months
D = 2.5 × (393.6)^0.32
= 2.5 × 6.77
≈ 16.9 months ≈ 17 months
Average staff = 393.6 / 16.9 ≈ 23 people
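A 50-KLOC embedded project needs almost 400 person-months, far more than the roughly 146 person-months a 50-KLOC organic project would need. The gap comes from the larger coefficient (3.6 vs 2.4) and the higher exponent (1.20 vs 1.05); the exponent reflects the steeper diseconomy of scale in tightly constrained projects.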
Cost in money terms
If the loaded cost per person-month (salary + overhead) is ₹2,00,000:
Library cost = 14 × 2,00,000 = ₹28,00,000 (~₹28 lakh)
ATM cost = 394 × 2,00,000 = ₹7,88,00,000 (~₹7.88 crore)
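The conversion from person-months to rupees is a single multiplication; the sketch below reproduces the two figures above. The helper name `project_cost` and the `LAKH` / `CRORE` constants are illustrative only.

```python
LAKH = 100_000
CRORE = 10_000_000


def project_cost(effort_pm, rate_per_pm):
    """Cost in rupees = effort (person-months) x loaded rate per person-month."""
    return effort_pm * rate_per_pm


rate = 2 * LAKH  # loaded cost per person-month assumed in the text
library_cost = project_cost(14, rate)
atm_cost = project_cost(394, rate)

print(f"Library: Rs {library_cost / LAKH:.0f} lakh")
print(f"ATM:     Rs {atm_cost / CRORE:.2f} crore")
```

Because the rate is only a planning assumption, it is worth re-running the conversion with a low and a high rate to see how sensitive the budget is.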
---
Intermediate COCOMO
The intermediate model multiplies a nominal effort estimate by an Effort Adjustment Factor (EAF), the product of 15 effort multipliers, one per cost driver. Each driver is rated on a scale from "Very Low" to "Extra High" (not every driver uses the full scale).
The 15 cost drivers (Boehm)
Product attributes:
- RELY — Required software reliability
- DATA — Database size
- CPLX — Product complexity
Hardware attributes:
- TIME — Execution time constraint
- STOR — Main storage constraint
- VIRT — Virtual machine volatility
- TURN — Computer turnaround time
Personnel attributes:
- ACAP — Analyst capability
- AEXP — Application experience
- PCAP — Programmer capability
- VEXP — Virtual machine experience
- LEXP — Programming language experience
Project attributes:
- MODP — Modern programming practices
- TOOL — Use of software tools
- SCED — Required development schedule
Effort Adjustment Factor (EAF)
For each driver, look up a multiplier (range typically 0.7 to 1.66). Multiply all 15:
EAF = M₁ × M₂ × ... × M₁₅
Effort then becomes:
E = a × (KLOC)^b × EAF
The intermediate-model constants:
| Class | a | b |
|---|---|---|
| Organic | 3.2 | 1.05 |
| Semi-detached | 3.0 | 1.12 |
| Embedded | 2.8 | 1.20 |
Quick EAF example
A 5.5-KLOC organic project with:
- High reliability (RELY = 1.15)
- Low programmer capability (PCAP = 1.17)
- Modern programming practices in use (MODP = 0.91)
- All other drivers nominal (= 1.00)
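EAF = 1.15 × 1.17 × 0.91 × 1.00 × ... × 1.00
≈ 1.224
E = 3.2 × (5.5)^1.05 × 1.224
= 3.2 × 5.99 × 1.224
≈ 23.5 person-months
The same project that was about 14 PM under basic COCOMO is now about 23.5 PM, roughly 63% more. Only part of that increase comes from the cost drivers: the EAF contributes a factor of about 1.22, and the larger intermediate coefficient (a = 3.2 instead of 2.4) contributes a factor of about 1.33.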
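The EAF calculation can be checked with a short script. This is a sketch under the assumption that any driver not listed is nominal (1.00); the function name `intermediate_effort` is hypothetical.

```python
import math

# Intermediate COCOMO constants (a, b) per project class, from the table above.
INTERMEDIATE_CONSTANTS = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def intermediate_effort(kloc, project_class, multipliers):
    """Return (effort in person-months, EAF).

    `multipliers` maps cost-driver names to their effort multipliers;
    drivers left out are treated as nominal (1.00).
    """
    a, b = INTERMEDIATE_CONSTANTS[project_class]
    eaf = math.prod(multipliers.values())
    return a * kloc ** b * eaf, eaf


# The three non-nominal drivers from the example above.
drivers = {"RELY": 1.15, "PCAP": 1.17, "MODP": 0.91}
effort, eaf = intermediate_effort(5.5, "organic", drivers)
print(f"EAF = {eaf:.3f}, E = {effort:.1f} PM")
```

Passing an empty dictionary gives the nominal intermediate estimate, which makes it easy to separate the effect of the coefficient from the effect of the cost drivers.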
---
Detailed (Complete) COCOMO
Adds two refinements:
- Phase-sensitive effort multipliers — different cost drivers affect different phases. For example, programmer capability matters more during coding than design.
- Three-level system hierarchy — system, sub-system, module. Cost drivers are applied at the most appropriate level.
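This produces effort estimates per phase. For a small organic project, Boehm's distribution is roughly: Product design 16%, Programming (detailed design plus code and unit test) 68%, Integration and test 16%, with a further 6% for plans and requirements counted on top of the development estimate. The split varies with project class and size.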
---
COCOMO II (1995)
Boehm published an updated COCOMO II to handle modern software development:
- Three sub-models: Application Composition, Early Design, Post-Architecture
- Uses scale drivers (precedentedness, development flexibility, architecture/risk resolution, team cohesion, process maturity) that adjust the exponent dynamically
- Better suited to OO, COTS reuse, rapid development
COCOMO II is the modern industrial version. The IPU paper still teaches the original COCOMO (1981) but may mention II in 5-mark notes.
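To make "adjust the exponent dynamically" concrete, here is a minimal sketch of the Post-Architecture effort equation, PM = A × Size^E × ΠEM with E = B + 0.01 × ΣSF. The constants A = 2.94 and B = 0.91 are the COCOMO II.2000 calibration values. The scale-factor ratings in the usage example are commonly quoted nominal values and are included only for illustration; check them against the model definition manual before relying on them. Function names are hypothetical.

```python
import math

# COCOMO II.2000 Post-Architecture calibration constants.
A = 2.94
B = 0.91


def cocomo2_exponent(scale_factors):
    """E = B + 0.01 * (sum of the five scale-factor ratings)."""
    return B + 0.01 * sum(scale_factors.values())


def cocomo2_effort(ksloc, scale_factors, effort_multipliers=()):
    """PM = A * Size^E * product of effort multipliers (1.0 if none given)."""
    exponent = cocomo2_exponent(scale_factors)
    return A * ksloc ** exponent * math.prod(effort_multipliers)


# Illustrative ratings only (assumed nominal values, not taken from this lesson).
nominal = {"PREC": 3.72, "FLEX": 3.04, "RESL": 4.24, "TEAM": 3.29, "PMAT": 4.68}
print(f"exponent = {cocomo2_exponent(nominal):.4f}")
print(f"effort   = {cocomo2_effort(50, nominal):.0f} PM")
```

Lower scale-factor ratings pull the exponent down towards B, which is how COCOMO II models a mature, well-precedented project as having a flatter cost curve.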
---
Putnam (SLIM) Model — quick mention
Larry Putnam's SLIM (Software Life-cycle Management) model uses the Rayleigh staffing curve to relate effort, time and size:
K = (Size / (P × T^(4/3)))^3
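Where K = effort, P = productivity parameter, T = development time. Putnam's central insight: the trade-off between time and effort is strongly non-linear. For a fixed size and productivity, effort varies as 1/T^4, so cutting the schedule by one third multiplies effort by (3/2)^4 ≈ 5, far more than hiring 50% more people could supply.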
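The fourth-power relationship is easy to explore numerically. The sketch below implements the equation exactly as quoted; the size, productivity, and time values in the usage lines are hypothetical and chosen only to show the ratio, which does not depend on them.

```python
def putnam_effort(size, productivity, time):
    """K = (Size / (P * T^(4/3)))^3, the equation quoted above."""
    return (size / (productivity * time ** (4 / 3))) ** 3


def effort_multiplier(compression):
    """Factor by which effort grows when the schedule shrinks by `compression`
    (a fraction between 0 and 1), for fixed size and productivity."""
    return (1 / (1 - compression)) ** 4


# Hypothetical inputs: 50,000 LOC, productivity parameter 5,000, 1.5 time units.
baseline = putnam_effort(50_000, 5_000, 1.5)
compressed = putnam_effort(50_000, 5_000, 1.5 * (1 - 0.25))

print(f"25% compression multiplies effort by {compressed / baseline:.2f}")
print(f"one-third compression multiplies effort by {effort_multiplier(1 / 3):.2f}")
```

The ratio of the two efforts equals `effort_multiplier(0.25)`, which confirms that only the relative change in T matters for the trade-off.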
---
Comparison — quick table
| Model | Year | Input | Accuracy | Used Today |
|---|---|---|---|---|
| Basic COCOMO | 1981 | KLOC + class | Rough | Teaching, quick estimates |
| Intermediate COCOMO | 1981 | + 15 drivers | Better | Industry (legacy) |
| Detailed COCOMO | 1981 | + phase splits | Best | Rare |
| COCOMO II | 1995 | Multiple sub-models | Industry standard | Yes |
| Putnam/SLIM | 1978 | Size + productivity | Good for large projects | Limited |
| Function Point + Productivity | various | FP × historical PM/FP | Simple | Common |
| Story Points (Agile) | 2000s | Relative sizing | Iterative refinement | Most modern teams |
---
Key Terms — Lesson 2.3
Cost estimation has its own dense vocabulary. The terms below appear in every numerical COCOMO PYQ and are also the bridge to Unit II's risk and planning topics.
COCOMO (Constructive Cost Model) — Barry Boehm's empirical cost-estimation model, published in his 1981 book Software Engineering Economics. COCOMO derives effort (person-months), duration (months), and staff from size (KLOC) and a project classification, calibrated on 63 real TRW projects of the 1970s.
Person-Month (PM) / Staff-Month — A unit of effort equal to one person working one month full-time (typically taken as 152 productive hours). Effort estimates in COCOMO are in person-months; 14 PM means the total work equals 14 people working for 1 month, or 1 person for 14 months, or any equivalent combination — bounded by Brooks's Law.
Effort vs Duration vs Staff — Three related but distinct quantities. Effort is total work in PM. Duration is elapsed calendar time in months. Average Staff = Effort ÷ Duration. Adding people lets you reduce duration, but only up to a point — Brooks's Law and Putnam's T⁴ relationship limit compression.
Organic Project (COCOMO) — A small, in-house project (typically < 50 KLOC) developed by a highly experienced team in a familiar environment with loose constraints, for example a library management system or a small business application. It has the smallest exponent (b = 1.05); the coefficient is a = 2.4 in basic COCOMO and a = 3.2 in intermediate COCOMO.
Semi-detached Project (COCOMO) — A medium-sized project (50–300 KLOC) with mixed experience in the team and moderate constraints — example: a compiler, an ERP module, a mid-size web application. Middle constants (a=3.0, b=1.12).
Embedded Project (COCOMO) — A large project (typically > 300 KLOC) with tight hardware, regulatory, or schedule constraints, often unfamiliar territory for the team, for example a real-time OS, avionics, or ATM software. It has the largest exponent (b = 1.20); the coefficient is a = 3.6 in basic COCOMO and a = 2.8 in intermediate COCOMO. The higher exponent reflects diseconomies of scale in highly constrained projects.
Diseconomy of Scale (Software) — The empirical observation that doubling project size more than doubles the required effort in software. Because the COCOMO exponent b > 1.0 for all project classes (1.05 organic, 1.12 semi-detached, 1.20 embedded), effort grows super-linearly with KLOC.
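A quick way to see the super-linear growth: under E = a × (KLOC)ᵇ, doubling the size multiplies effort by 2ᵇ, whatever the value of a. The sketch below tabulates that factor for the three classes; the names are illustrative.

```python
# Exponent b per project class (same in basic and intermediate COCOMO).
EXPONENTS = {"organic": 1.05, "semi-detached": 1.12, "embedded": 1.20}


def effort_growth(size_factor, b):
    """Factor by which effort grows when size is multiplied by `size_factor`."""
    return size_factor ** b


for project_class, b in EXPONENTS.items():
    factor = effort_growth(2, b)
    print(f"{project_class:14s} doubling size -> effort x {factor:.2f}")
```

Doubling size costs about 7% extra effort beyond a straight doubling for organic projects and about 15% extra for embedded ones.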
Effort Equation (Basic COCOMO) — E = a × (KLOC)ᵇ person-months, where (a, b) is the constant pair for the project class. The equation defines a power-law relationship between size and effort that captures the diseconomy of scale.
Duration Equation (Basic COCOMO) — D = c × (E)ᵈ months, with c = 2.5 and d in (0.32, 0.35, 0.38) for embedded, semi-detached, organic respectively. The model's duration is the nominal calendar time if the project is staffed optimally — schedule compression beyond this point increases total effort.
Effort Adjustment Factor (EAF) / Cost Driver — In Intermediate COCOMO, each of 15 cost drivers (reliability requirement, database size, programmer capability, etc.) is rated and mapped to an effort multiplier between roughly 0.7 (most favourable) and 1.66 (least favourable). The product of all 15 multipliers is the EAF, which scales the nominal effort a × (KLOC)ᵇ computed with the intermediate-model constants.
The 15 Cost Drivers (Boehm) — Grouped into four categories. Product attributes: RELY (required reliability), DATA (database size), CPLX (complexity). Hardware attributes: TIME (execution-time constraint), STOR (storage constraint), VIRT (VM volatility), TURN (turnaround time). Personnel attributes: ACAP (analyst capability), AEXP (application experience), PCAP (programmer capability), VEXP (VM experience), LEXP (language experience). Project attributes: MODP (modern practices), TOOL (tooling), SCED (required schedule).
Detailed (Complete) COCOMO — The third level of the original COCOMO, adding phase-sensitive effort multipliers (different cost drivers affect different phases) and a three-level system hierarchy (system, subsystem, module). Outputs effort broken down per phase of the lifecycle.
COCOMO II (1995) — Boehm's modernised COCOMO, with three sub-models for different lifecycle stages: Application Composition (4GL / prototyping), Early Design (architecture stage), and Post-Architecture (once the architecture has been defined, for detailed estimates during development). Replaces the rigid project classes with scale drivers that adjust the exponent dynamically.
Scale Drivers (COCOMO II) — Five factors (called scale factors in the COCOMO II documentation) that adjust the exponent in COCOMO II rather than the multiplier: PREC (precedentedness), FLEX (development flexibility), RESL (architecture / risk resolution), TEAM (team cohesion), and PMAT (process maturity, often tied to CMM level). They encode the modern intuition that how mature the team is affects the shape of the cost curve, not just its level.
Putnam / SLIM Model — Larry Putnam's 1978 cost-estimation model based on the Rayleigh staffing curve observed in real projects. The central equation Size = C × (Effort)^(1/3) × (Time)^(4/3) encodes Putnam's most important insight: the trade-off between time and effort is strongly non-linear. For fixed size, effort varies as 1/Time⁴, so compressing the schedule by about 16% roughly doubles the effort, and compressing it by 25% roughly triples it.
Rayleigh Curve — The skewed, single-peaked staffing curve Putnam observed in many real projects: staffing ramps up, peaks around 40–50% of the schedule, then tails off slowly. The Rayleigh curve underlies Putnam's SLIM and is the empirical evidence against "level staffing across the whole project."
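One common way to write the curve is the Norden-Rayleigh form m(t) = (K / t_p²) × t × exp(−t² / (2 t_p²)), where K is total effort and t_p is the time of peak staffing. The sketch below assumes that form; the function names and the sample numbers are hypothetical.

```python
import math


def rayleigh_staffing(t, total_effort, t_peak):
    """Staff level at time t under the Norden-Rayleigh form."""
    return (total_effort / t_peak ** 2) * t * math.exp(-t ** 2 / (2 * t_peak ** 2))


def cumulative_effort(t, total_effort, t_peak):
    """Effort spent from time 0 to t (the integral of the staffing curve)."""
    return total_effort * (1 - math.exp(-t ** 2 / (2 * t_peak ** 2)))


# Hypothetical project: 400 person-months in total, staffing peaks at month 8.
K, T_PEAK = 400, 8
for month in (2, 4, 8, 12, 16, 24):
    staff = rayleigh_staffing(month, K, T_PEAK)
    spent = cumulative_effort(month, K, T_PEAK) / K
    print(f"month {month:2d}: staff = {staff:5.1f}, effort spent = {spent:5.1%}")
```

In this form about 39% of the total effort has been spent by the time staffing peaks, so most of the work falls in the long tail after the peak.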
Cone of Uncertainty — The term popularised by Steve McConnell (building on Boehm's earlier estimation-accuracy funnel) for the empirical observation that early estimates have very wide error bars that narrow as the project progresses. As a rule of thumb: about ±100% at requirements stage, ±50% after high-level design, ±25% after detailed design, and ±10% only near release. Useful for setting honest expectations.
The 90-90 Rule (Tom Cargill, Bell Labs) — "The first 90% of the code accounts for the first 90% of development time. The remaining 10% of code accounts for the other 90% of development time." The classic cynical observation that schedule estimates are systematically over-optimistic about the tail end.
Local Calibration — Adjusting COCOMO constants (a, b) and cost-driver multipliers to match your own organisation's historical project data, rather than using Boehm's 1970s TRW numbers verbatim. Modern industry use of COCOMO almost always involves local calibration; otherwise the model is just a rough sanity check.
Loaded Cost / Fully Loaded Cost per PM — The all-in cost of one person-month to the organisation — salary, benefits, taxes, equipment, office space, management overhead, training, software licences — typically 1.5–2.5× base salary. The number that converts a person-month estimate into rupees. Indian industry rule of thumb (2024): ₹1.5–4 lakh per PM for staff augmentation, ₹2.5–6 lakh per PM in-house, varying by seniority and city.
Standish CHAOS Report — The Standish Group's long-running survey of IT project outcomes, which has consistently found that only ~30% of projects are "successful" (on time, on budget, full scope), ~50% are "challenged," and ~20% fail outright. The headline data behind every "why software projects fail" lecture.
---
Study deep: estimation realities
- All estimates are wrong; some are useful. A long-standing industry rule of thumb: estimates done at requirements stage are accurate only to about ±100% (the "cone of uncertainty"). Only after detailed design does ±25% become feasible.
- The 90-90 rule (Tom Cargill). "The first 90% of the code accounts for the first 90% of development time. The remaining 10% of code accounts for the other 90% of development time." Translation: project budgets are systematically optimistic about the tail end.
- Local calibration matters. COCOMO constants come from 1970s American projects. Indian software companies often re-calibrate the constants from their own historical data — a, b shift, and the multipliers change with industry norms.
- Estimate ranges, not points. Instead of "14 PM," report "10–20 PM with 80% confidence." This forces honest conversation about risk and contingency.
- The Standish Group's 'Resolution' data. Across its successive CHAOS surveys, only ~30% of projects come in on time, on budget, and with full scope. The remaining ~70% are challenged or fail. COCOMO + experience helps; nothing solves the problem completely.
PYQ pattern (recurring numerical): "Compute effort and development time for a project of 50 KLOC using Basic COCOMO. Assume Organic / Semi-detached / Embedded." — Show the formula, substitute, give E and D for each class.
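A quick way to check a hand-worked answer to this question is to run all three classes at once. The sketch below uses the Basic COCOMO constants from this lesson; the variable names are illustrative.

```python
# Basic COCOMO constants (a, b, c, d) per project class.
CONSTANTS = {
    "Organic": (2.4, 1.05, 2.5, 0.38),
    "Semi-detached": (3.0, 1.12, 2.5, 0.35),
    "Embedded": (3.6, 1.20, 2.5, 0.32),
}

KLOC = 50
results = {}
for project_class, (a, b, c, d) in CONSTANTS.items():
    effort = a * KLOC ** b        # person-months
    duration = c * effort ** d    # months
    results[project_class] = (effort, duration)
    print(f"{project_class:14s} E = {effort:6.1f} PM   D = {duration:4.1f} months   "
          f"staff = {effort / duration:4.1f}")
```

At 50 KLOC the three classes come out at roughly 146, 240, and 394 person-months, while the nominal durations all land close to 17 months. The difference between classes shows up mainly in effort and team size, not in calendar time.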
PYQ pattern: "What is COCOMO? Differentiate Basic, Intermediate and Detailed COCOMO models." — Define COCOMO, name Boehm (1981), then table the three models with their inputs and accuracy.