Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

3.4 Reverse Engineering, Re-engineering & Configuration Management

Lesson 18 of 24 in the free Software Engineering notes on Siksha Sarovar, written by Rohit Jangra.

The legacy code problem

Most software work is not writing new code from scratch; it is understanding, modifying and extending existing code. Sommerville cites estimates that around 60–80% of total software cost goes into maintaining and enhancing legacy systems.

When you inherit code with no documentation, written by someone who left long ago, in a style you would never choose, you are dealing with a legacy system. Three engineering activities help:

Activity | Goal
Reverse engineering | Understand what the legacy code does
Re-engineering | Improve the legacy system without changing its function
Forward engineering | Build new code from a (recovered or new) design

---

Reverse Engineering

Reverse engineering is the process of analysing an existing system to identify its components and their inter-relationships, producing a representation at a higher abstraction level.

   Source Code  ──► Design Documents  ──► Requirements
      (low)              (medium)              (high)

Goals of reverse engineering

  • Recover lost design documentation
  • Understand undocumented systems
  • Identify reusable components
  • Aid in re-engineering decisions
  • Locate defects or vulnerabilities

Reverse engineering activities

  1. Source code analysis — parse the code into an abstract syntax tree (AST)
  2. Restructuring — improve internal structure without behaviour change (strictly a re-engineering step, since it modifies the code)
  3. Extraction — identify modules, data structures, control flow
  4. Abstraction — generate UML, ER, DFD from code
  5. Documentation generation — Doxygen / Javadoc-style automated docs
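Steps 1, 3 and 5 can be illustrated with Python's standard `ast` module. This is a toy sketch, not a substitute for the tools listed below: it parses a small hypothetical legacy snippet, extracts which defined functions call which (a crude call graph), and generates a one-line summary per function from its docstring. `LEGACY_SOURCE`, `extract_call_graph` and `document` are names invented for this example.

```python
import ast

# Hypothetical legacy source, used only for illustration.
LEGACY_SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    """Split raw text into records."""
    return text.split(";")

def report(path):
    records = parse(load(path))
    print(len(records))
'''


def extract_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the defined functions it calls."""
    tree = ast.parse(source)  # Step 1: source code -> abstract syntax tree
    functions = {node.name: node for node in tree.body
                 if isinstance(node, ast.FunctionDef)}
    graph: dict[str, set[str]] = {}
    for name, node in functions.items():
        # Step 3: walk the function body, keeping calls to other known functions.
        graph[name] = {
            sub.func.id
            for sub in ast.walk(node)
            if isinstance(sub, ast.Call)
            and isinstance(sub.func, ast.Name)
            and sub.func.id in functions
        }
    return graph


def document(source: str) -> list[str]:
    """Step 5: generate a one-line summary per function from its docstring."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            doc = ast.get_docstring(node) or "(undocumented)"
            lines.append(f"{node.name}({args}): {doc}")
    return lines


if __name__ == "__main__":
    for caller, callees in extract_call_graph(LEGACY_SOURCE).items():
        print(caller, "->", sorted(callees))
    print("\n".join(document(LEGACY_SOURCE)))
```

Real analysers also resolve method calls, imports and dynamic dispatch; this sketch only sees direct calls by name, which may be enough for a first-pass structure diagram.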

Tools

  • UML reverse engineering: Enterprise Architect, StarUML, Visual Paradigm
  • Decompilers: JD-GUI (Java), dotPeek (.NET), Ghidra (binaries)
  • Source-code analysers: SonarQube, NDepend, Structure101

Reverse engineering is not...

Reverse engineering is NOT | Why
Decompilation alone | Decompilation is a step, not the whole activity
Software piracy | RE is legitimate for maintenance, security, interoperability
Always legal | Some EULAs forbid it; check jurisdiction (DMCA, EU directives)

---

Software Re-engineering

Re-engineering is the examination and alteration of a system to reconstitute it in a new form with new features, while preserving its essential function.

Re-engineering activities

    Existing System
         │
         ▼
    [Reverse Engineering] ──► Higher-level model
         │
         ▼
    [Restructure / Refactor]
         │
         ▼
     [Forward Engineering] ──► Re-engineered System

  1. Inventory analysis — what do we have?
  2. Document restructuring — re-create missing documentation
  3. Reverse engineering — extract design
  4. Code restructuring — clean up the code (formatting, dead code removal, modularisation)
  5. Data restructuring — schema improvements, normalisation
  6. Forward engineering — build new modules using recovered design
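Step 5 (data restructuring) can be pictured with a toy example: a hypothetical legacy file that repeats customer details on every order row is split into a customers table and an orders table keyed by customer id, a basic normalisation step. All field names and data are invented for illustration, and the round-trip check stands in for "preserving its essential function".

```python
from dataclasses import dataclass

# Hypothetical denormalised legacy records: customer details repeated per order.
LEGACY_ROWS = [
    {"order_id": 1, "cust_id": "C1", "cust_name": "Asha", "city": "Delhi", "amount": 250},
    {"order_id": 2, "cust_id": "C1", "cust_name": "Asha", "city": "Delhi", "amount": 120},
    {"order_id": 3, "cust_id": "C2", "cust_name": "Ravi", "city": "Pune", "amount": 90},
]


@dataclass(frozen=True)
class Customer:
    cust_id: str
    name: str
    city: str


@dataclass(frozen=True)
class Order:
    order_id: int
    cust_id: str  # foreign key into the customers table
    amount: int


def normalise(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}
    orders = []
    for row in rows:
        cust = Customer(row["cust_id"], row["cust_name"], row["city"])
        # The same customer must not appear with conflicting details.
        if customers.setdefault(cust.cust_id, cust) != cust:
            raise ValueError(f"inconsistent data for customer {cust.cust_id}")
        orders.append(Order(row["order_id"], row["cust_id"], row["amount"]))
    return customers, orders


def denormalise(customers, orders):
    """Rebuild the legacy view, to check that no information was lost."""
    return [
        {"order_id": o.order_id, "cust_id": o.cust_id,
         "cust_name": customers[o.cust_id].name,
         "city": customers[o.cust_id].city, "amount": o.amount}
        for o in orders
    ]


customers, orders = normalise(LEGACY_ROWS)
assert denormalise(customers, orders) == LEGACY_ROWS  # function preserved
print(len(customers), "customers,", len(orders), "orders")
```

Raising on conflicting customer details is a design choice for the sketch: legacy data often hides such inconsistencies, and restructuring is where they surface.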

When to re-engineer

Re-engineer if... | Throw away if...
Business logic is still relevant | Requirements have fundamentally changed
Hardware/platform is being replaced | A better commercial off-the-shelf (COTS) product exists
Maintenance cost > 50% of replacement cost | Code is an unsalvageable mess
Team has skills in the current language | No one on the team understands it anymore

A well-known re-engineering project: the UK's National Air Traffic Services (NATS) moved from a 1970s-era system to a re-engineered platform in the 2000s, preserving the business logic while rebuilding the technical foundation.

---

Reverse Engineering vs Re-engineering — comparison

Aspect | Reverse Engineering | Re-engineering
Output | Understanding / documentation | New system
Changes code? | No | Yes
Effort | Smaller | Larger
Risk | Low | Medium-high
Goal | Comprehension | Improvement

---

Code Restructuring

A subset of re-engineering that improves internal structure without changing external behaviour. The modern term is refactoring, popularised by Martin Fowler's 1999 book of that name.

Common refactorings

Refactoring | What it does
Rename variable | Make the name reflect intent
Extract method | Move a code block into its own function
Extract class | Split a god class into focused classes
Inline method | The reverse of Extract method: eliminate trivial wrappers
Move method | Relocate a method to the class where it belongs
Replace conditional with polymorphism | Use OO dispatch instead of a switch
Replace magic number with constant | if (status == 3) becomes if (status == APPROVED)
Remove dead code | Delete unreachable code
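Two of the refactorings above, Replace magic number with constant and Replace conditional with polymorphism, are shown below on a hypothetical fee calculation. The account kinds, rates and class names are invented; the point is that the "after" version returns exactly the same results as the "before" version.

```python
from abc import ABC, abstractmethod


# Before: magic numbers and a type-switching conditional.
def fee_before(kind: int, amount: float) -> float:
    if kind == 1:
        return amount * 0.02
    elif kind == 2:
        return amount * 0.02 + 10
    elif kind == 3:
        return 0.0
    raise ValueError("unknown account kind")


# After: named constants replace the magic numbers...
SAVINGS, CURRENT, STAFF = 1, 2, 3
BASE_RATE = 0.02
CURRENT_FLAT_FEE = 10


# ...and polymorphism replaces the conditional.
class Account(ABC):
    @abstractmethod
    def fee(self, amount: float) -> float: ...


class SavingsAccount(Account):
    def fee(self, amount: float) -> float:
        return amount * BASE_RATE


class CurrentAccount(Account):
    def fee(self, amount: float) -> float:
        return amount * BASE_RATE + CURRENT_FLAT_FEE


class StaffAccount(Account):
    def fee(self, amount: float) -> float:
        return 0.0


ACCOUNT_TYPES = {SAVINGS: SavingsAccount, CURRENT: CurrentAccount, STAFF: StaffAccount}


def fee_after(kind: int, amount: float) -> float:
    try:
        account = ACCOUNT_TYPES[kind]()
    except KeyError:
        raise ValueError("unknown account kind") from None
    return account.fee(amount)
```

With this shape, adding a new account type means adding a subclass and one dictionary entry rather than editing every switch that tests the kind.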

Rules of refactoring

  1. Refactor in small steps
  2. Run tests after each step
  3. Never change behaviour and refactor in the same step
  4. Commit refactorings separately from feature changes
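Rule 2 is often backed by a characterisation (golden-master) test: record what the current code returns for many inputs, then check every refactoring step against that record. A minimal sketch, assuming a deterministic single-argument function; `grade_v0` and `grade_v1` are hypothetical names for the code before and after one Extract Method step.

```python
import random


def grade_v0(marks: int) -> str:
    # Original code: validation and classification tangled together.
    if marks < 0 or marks > 100:
        return "invalid"
    if marks >= 75:
        return "distinction"
    if marks >= 40:
        return "pass"
    return "fail"


def _is_valid(marks: int) -> bool:
    # Extracted method: the validation rule now has a name.
    return 0 <= marks <= 100


def grade_v1(marks: int) -> str:
    # After one refactoring step (Extract Method); behaviour must not change.
    if not _is_valid(marks):
        return "invalid"
    if marks >= 75:
        return "distinction"
    if marks >= 40:
        return "pass"
    return "fail"


def record_golden_master(func, inputs):
    """Capture current behaviour before refactoring starts."""
    return {x: func(x) for x in inputs}


def check_against_master(func, master):
    """Return the inputs whose output changed (empty list = behaviour preserved)."""
    return [x for x, expected in master.items() if func(x) != expected]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the input set is reproducible
    inputs = [-5, 0, 39, 40, 74, 75, 100, 101] + [rng.randint(-20, 120) for _ in range(200)]
    master = record_golden_master(grade_v0, inputs)
    print("changed outputs:", check_against_master(grade_v1, master))
```

Boundary values (39/40, 74/75, 100/101) are listed explicitly because random inputs alone may miss the edges where a refactoring is most likely to slip.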

---

Software Configuration Management (SCM)

SCM is the discipline of identifying, organising and controlling modifications to the software being built. Without SCM, large software projects collapse into chaos — multiple developers overwrite each other's work, releases are unreproducible, and bugs in production cannot be tied back to specific code versions.

Configuration items (SCIs)

Any work product that can change during the project, and whose change matters, is a configuration item:

  • Source code files
  • Build scripts and Makefiles
  • Documentation (SRS, design docs, manuals)
  • Test cases and test data
  • Configuration files (.yaml, .ini; .env files only as secret-free templates)
  • Third-party libraries (with versions)
  • Database schema and migrations
  • Issue tracker entries

---

The 5 SCM functions

1. Configuration Identification

  • Give each SCI a unique identifier
  • Establish a baseline — a formally reviewed, frozen version that becomes the reference for further work
  • Common baselines: requirements baseline, design baseline, product baseline
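Identification and baselining can be sketched in a few lines: give each SCI an identifier and freeze a baseline as a read-only record of each item's content hash. The identifiers (such as `SCI-001`), contents and the `make_baseline` helper are hypothetical; in practice the version control system usually does this job.

```python
import hashlib
from dataclasses import dataclass
from types import MappingProxyType


def content_hash(data: bytes) -> str:
    """Fingerprint an item's content; any change gives a different hash."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Baseline:
    name: str                # e.g. "requirements baseline"
    items: MappingProxyType  # read-only view: SCI id -> content hash


def make_baseline(name: str, items: dict[str, bytes]) -> Baseline:
    """Freeze the current content of every identified SCI."""
    hashes = {sci_id: content_hash(data) for sci_id, data in items.items()}
    return Baseline(name, MappingProxyType(hashes))


# Hypothetical SCIs, each with a unique identifier.
project = {
    "SCI-001": b"The system shall record attendance.",  # srs.txt
    "SCI-002": b"print('hello')",                        # main.py
    "SCI-003": b"python -m pytest",                      # build.sh
}

req_baseline = make_baseline("requirements baseline", project)
print(len(req_baseline.items), "items frozen in", req_baseline.name)
```

The frozen dataclass and read-only mapping mirror the idea that a baseline is not edited in place; changing it should go through change control.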

2. Version Control

Track every change to every SCI with a history:

  • Version — a numbered state (1.0, 1.1, 1.2)
  • Revision — a new version that supersedes its predecessor (1.0 → 1.1)
  • Variant — a version that exists in parallel with others, e.g. for a different platform or customer
  • Release — a version delivered to users

Tools: Git (Linus Torvalds, 2005), Mercurial, SVN, ClearCase, Perforce.

3. Change Control

Every modification follows a formal process:

  Change Request (CR)
        │
        ▼
   Change Control Board (CCB) review
        │
   ┌────┼────┐
   │    │    │
 Reject Hold Approve
              │
              ▼
        Implement change
              │
              ▼
        Test & verify
              │
              ▼
        Update baseline
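The flowchart above can be read as a state machine for a change request. The sketch encodes the allowed transitions so that, for example, a CR cannot be implemented before the CCB approves it. State names follow the diagram; the `ChangeRequest` class is hypothetical, and letting a held CR return to review is an assumption the diagram does not show.

```python
from dataclasses import dataclass, field

# Allowed transitions, following the change-control flowchart.
TRANSITIONS = {
    "submitted": {"rejected", "on_hold", "approved"},  # CCB review outcomes
    "on_hold": {"submitted"},                          # assumption: re-review later
    "approved": {"implemented"},
    "implemented": {"verified"},
    "verified": {"baselined"},
    "rejected": set(),
    "baselined": set(),
}


@dataclass
class ChangeRequest:
    cr_id: str
    summary: str
    state: str = "submitted"
    history: list[str] = field(default_factory=lambda: ["submitted"])

    def move_to(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.cr_id}: cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)  # trail for later configuration audits


cr = ChangeRequest("CR-042", "Add export to CSV")
for step in ("approved", "implemented", "verified", "baselined"):
    cr.move_to(step)
print(" -> ".join(cr.history))
```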

4. Configuration Auditing

Periodic verification that:

  • All SCIs are properly identified
  • All changes are documented and approved
  • Baselines are consistent with reality
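One automatable part of an audit is checking that what was delivered matches the baseline. The sketch compares baseline content hashes with the current items and reports missing, unrecorded and modified SCIs. Item names and contents are hypothetical, and a real audit would also trace each modification to an approved change request.

```python
import hashlib


def fingerprint(items: dict[str, bytes]) -> dict[str, str]:
    """SCI id -> SHA-256 of its content."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in items.items()}


def audit(baseline: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    now = fingerprint(current)
    return {
        "missing": sorted(set(baseline) - set(now)),     # baselined but not delivered
        "unrecorded": sorted(set(now) - set(baseline)),  # delivered but never baselined
        "modified": sorted(n for n in set(baseline) & set(now)
                           if baseline[n] != now[n]),    # content differs from baseline
    }


baseline = fingerprint({"SCI-001": b"spec v1", "SCI-002": b"code v1", "SCI-003": b"tests v1"})
delivered = {"SCI-001": b"spec v1", "SCI-002": b"code v2", "SCI-004": b"hotfix"}
print(audit(baseline, delivered))
```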

5. Status Reporting

Communicate to stakeholders:

  • Current state of each SCI
  • Pending change requests
  • Recent approved changes
  • Baseline history
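Status reports are usually generated from the change-request log rather than written by hand. A minimal sketch, assuming a hypothetical log in which each CR has an id, a state and a decision date; the 14-day "recent" window is an arbitrary choice.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical change-request log.
CR_LOG = [
    {"id": "CR-101", "state": "approved", "decided": date(2024, 3, 1)},
    {"id": "CR-102", "state": "submitted", "decided": None},
    {"id": "CR-103", "state": "rejected", "decided": date(2024, 3, 4)},
    {"id": "CR-104", "state": "on_hold", "decided": date(2024, 2, 10)},
    {"id": "CR-105", "state": "approved", "decided": date(2024, 2, 1)},
]


def status_report(log, today: date, window_days: int = 14) -> str:
    counts = Counter(cr["state"] for cr in log)
    pending = [cr["id"] for cr in log if cr["state"] in ("submitted", "on_hold")]
    since = today - timedelta(days=window_days)
    # Only approved CRs are date-checked, so a missing date is never compared.
    recent = [cr["id"] for cr in log
              if cr["state"] == "approved" and cr["decided"] >= since]
    lines = [f"Status report for {today.isoformat()}"]
    lines += [f"  {state}: {n}" for state, n in sorted(counts.items())]
    lines.append(f"  pending CRs: {', '.join(pending) or 'none'}")
    lines.append(f"  approved in last {window_days} days: {', '.join(recent) or 'none'}")
    return "\n".join(lines)


print(status_report(CR_LOG, today=date(2024, 3, 10)))
```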

---

Version Control with Git — modern essentials

Concept | Definition
Repository | Storage of all versions
Commit | A snapshot with a message
Branch | An independent line of development
Merge | Combine two branches
Pull request / Merge request | Proposed change for review
Tag | Named version (often a release)
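The "commit is a snapshot" idea rests on content addressing: Git names every stored object by a hash of its content. For a file (a blob), Git hashes the header `blob <size>`, a NUL byte and the raw bytes, using SHA-1 in the default object format (newer Git can also use SHA-256). The sketch reproduces the id that `git hash-object` reports for a file's content; the `git_blob_id` name is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id Git assigns to a file's content (SHA-1 object format)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Identical content always gets the identical id, so an unchanged file
    # is stored once no matter how many commits include it.
    print(git_blob_id(b"hello world\n"))
```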

Branching strategies:

  • Git Flow — develop, feature, release, hotfix branches (heavy)
  • GitHub Flow — main + feature branches (light)
  • Trunk-Based Development — one main branch, short-lived features

---

SCM Plan (SCMP) — typical contents (IEEE 828)

  1. Introduction
  2. SCM Management — roles and responsibilities
  3. SCM Activities — identification, change control, audits
  4. SCM Schedule — timing of audits and baselines
  5. SCM Resources — tools, hardware, training
  6. SCM Plan Maintenance — how the SCMP itself is updated

---

Key Terms — Lesson 3.4

The terms below define the vocabulary of legacy-system engineering and configuration management — every PYQ on RE/re-engineering or SCM expects them.

Legacy System — A software system that continues to deliver business value but uses outdated technology, lacks current documentation, or relies on people who have left the organisation. Most maintenance work in industry happens on legacy systems. Indian outsourcing firms built their early business on legacy mainframe migration.

Forward Engineering — The traditional development direction: requirements → design → code → executable. Forward engineering is what every SDLC model in Units I–III describes.

Reverse Engineering — Working in the opposite direction: from existing code or binaries back to a higher-level representation — design diagrams, requirements, or even alternative implementations. The goal is understanding, not modification. Output: documentation, UML diagrams, recovered specifications.

Re-engineering — Reverse engineering followed by forward engineering: understand the legacy system, then rebuild it in a new form (new technology, new architecture, sometimes a new language) while preserving its essential business function. When the business logic is still relevant, re-engineering is usually cheaper and less risky than redesigning from scratch.

Decompiler / Disassembler — Tools that convert compiled code back to a more readable form. A disassembler converts machine code to assembly. A decompiler goes further, reconstructing high-level source: Java source from bytecode, C# from .NET assemblies, or C-like pseudocode from native machine code. JD-GUI for Java, dotPeek for .NET, Ghidra (NSA-released) and IDA Pro for native binaries.

Restructuring — A subset of re-engineering that changes the internal organisation of code without changing its external behaviour — modularising, removing dead code, normalising formatting, splitting god-classes. The modern term is refactoring.

Refactoring — The discipline of restructuring code in small, behaviour-preserving steps. The term dates from William Opdyke's research in the early 1990s and was popularised by Martin Fowler's 1999 book Refactoring. Common refactorings: Rename, Extract Method, Extract Class, Inline Method, Move Method, Replace Conditional with Polymorphism, Remove Dead Code. Refactoring relies on a comprehensive test suite that gives confidence behaviour is preserved.

Code Smell — Kent Beck's term, popularised in Martin Fowler's Refactoring book, for a surface symptom that suggests a deeper design problem: long methods, duplicated code, large classes, long parameter lists, feature envy, switch statements, divergent change, shotgun surgery. Smells aren't bugs; they're indicators that refactoring would help.

Technical Debt — Ward Cunningham's metaphor for the future cost of suboptimal design decisions taken to meet a short-term need — shortcuts, hard-coded values, missing tests, outdated dependencies. Like financial debt, technical debt accrues interest in the form of increased maintenance cost and is repaid through refactoring.

God Class / God Object — An anti-pattern where one class accumulates too many responsibilities, typically a class with hundreds of methods and thousands of lines. The classic example is a "Util" class that grew over years. Refactoring usually involves extracting cohesive, focused classes (Extract Class).

Joel Spolsky's "Things You Should Never Do" — Spolsky's classic essay (2000) arguing that rewriting working software from scratch is almost always a disastrous strategic mistake. The Netscape 6 rewrite took 3 years and lost the browser market; the same lesson applies to many ambitious rewrites since.

Software Configuration Management (SCM) — The discipline of identifying, organising, and controlling modifications to the software being built. Without SCM, multi-developer projects collapse into chaos. SCM has five canonical functions: identification, version control, change control, configuration auditing, status reporting.

Configuration Item (SCI) — Any artefact under SCM control: source code files, build scripts, documentation (SRS, design, manuals), test cases, test data, config files, third-party library versions, database schema, migrations, even issue-tracker entries. The principle: anything that can change, and whose change matters, should be an SCI.

Baseline — A formally reviewed and approved version of a configuration item that becomes the reference for subsequent work. Common baselines: requirements baseline (after SRS sign-off), design baseline (after SDD sign-off), product baseline (after first release).

Version Control System (VCS) — A tool that tracks every change to every file in a project, recording who, what, when, and why. Git (Linus Torvalds, 2005) is overwhelmingly dominant; Mercurial, SVN (Subversion), Perforce, and ClearCase still survive in specific niches.

Git — Linus Torvalds's 2005 distributed version control system, now the universal standard. Every developer's working copy is a complete repository, not just a checkout. Core concepts: commit (atomic change), branch (independent line of development), merge (combine branches), remote (a copy on another machine), pull request (proposed change for review).

Branch — An independent line of development in version control. Modern teams use feature branches (one per feature in development), a main/master branch (always working state), and sometimes release branches (stable cut for shipping).

Merge / Pull Request / Merge Request — Combining one branch's changes into another. In GitHub, this is a Pull Request (PR); in GitLab, a Merge Request (MR). PRs are the unit of code review in modern development.

Git Flow — A specific branching strategy (Vincent Driessen, 2010) with develop, feature, release, and hotfix branches in addition to main. Heavy but disciplined; common in older enterprise teams.

GitHub Flow / Trunk-Based Development — Lightweight alternatives to Git Flow. GitHub Flow has just main + short-lived feature branches. Trunk-Based Development goes further — every developer integrates to main multiple times per day, behind feature flags if needed. Both are preferred for high-velocity CI/CD environments.

Change Request (CR) — A formal request to modify a baselined configuration item. Each CR is logged, reviewed by the Change Control Board, and either approved, deferred, or rejected. The CR is the paper trail that prevents uncontrolled change.

Change Control Board (CCB) — The committee that reviews and decides on change requests affecting baselined items. CCB membership typically includes the project manager, technical lead, customer representative, and QA lead. The CCB exists to prevent scope creep and to maintain traceability.

Configuration Audit — A periodic independent verification that the actual state of the project matches the documented configuration: every SCI accounted for, every change traceable to an approved CR, every baseline consistent with reality. CMMI's Configuration Management process area and ISO 10007 (ISO's guidance on configuration management) both include configuration audits.

Status Accounting / Status Reporting — The SCM activity of communicating the current configuration state to stakeholders — what is in the baseline, what change requests are pending, what changes were approved recently. Reports are produced at agreed intervals (weekly, monthly) and at milestones.

IEEE 828 — The IEEE standard for Software Configuration Management Plans. Defines the recommended SCMP structure — introduction, SCM management, SCM activities, SCM schedule, SCM resources, plan maintenance.

Build Script / Build System — The script that compiles source code, runs tests, packages deliverables, and produces artefacts — Maven (Java), Gradle (Java/Android/Kotlin), npm/yarn (Node), pip/poetry (Python), Make (C/C++), Bazel (multi-language at scale). Build scripts are themselves SCIs.

Continuous Integration / Continuous Deployment (CI/CD) — The modern automation of build, test, and deploy. CI automatically builds and tests every commit. CD automatically deploys passing builds to staging (or production). CI/CD pipelines are themselves SCIs, defined in YAML files under version control.

Infrastructure-as-Code (IaC) — The modern DevOps practice of defining infrastructure (servers, networks, databases) in version-controlled text files — Terraform, AWS CloudFormation, Pulumi, Ansible, Kubernetes manifests. IaC brings SCM's discipline (review, baseline, audit, rollback) to infrastructure that was historically managed manually.

Tag / Release — A named pointer to a specific commit in Git, typically used to mark a released version (v1.0, v2.3.1). By convention a published release tag is never moved, which makes tags part of the SCM audit trail: "what was in production on date X" can be answered by checking out the tag.

Semantic Versioning (SemVer) — A convention for version numbers, MAJOR.MINOR.PATCH, where MAJOR bumps for breaking changes, MINOR for backward-compatible features, and PATCH for backward-compatible bug fixes. It is the norm for npm packages and widely followed by libraries on PyPI and Maven Central.
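The MAJOR.MINOR.PATCH rules translate directly into code. A minimal sketch that handles only plain `X.Y.Z` versions (pre-release and build metadata such as `1.0.0-rc.1` are deliberately out of scope); `Version` and `bump` are hypothetical names.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        parts = text.split(".")
        if len(parts) != 3 or not all(p.isdecimal() for p in parts):
            raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {text!r}")
        return cls(*(int(p) for p in parts))

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def bump(v: Version, change: str) -> Version:
    """Next version for a 'breaking', 'feature' or 'fix' change."""
    if change == "breaking":
        return Version(v.major + 1, 0, 0)  # lower fields reset to zero
    if change == "feature":
        return Version(v.major, v.minor + 1, 0)
    if change == "fix":
        return Version(v.major, v.minor, v.patch + 1)
    raise ValueError(f"unknown change type: {change}")


# Tuples compare field by field, so 1.10.0 > 1.9.3 numerically,
# unlike a plain string comparison.
assert Version.parse("1.10.0") > Version.parse("1.9.3")
print(bump(Version.parse("2.3.1"), "feature"))  # 2.4.0
```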

---

Study deep

  1. Configuration management is invisible until it fails. When CM works, no one notices. When it fails (lost code, broken build, can't reproduce a release) the project grinds to a halt. Invest in CM early.
  2. Git is the universal tool. Despite its complexity, Git has won industry-wide. Modern developers must be fluent in branching, merging, rebasing and conflict resolution. Indian outsourcing companies often use Git + Bitbucket + JIRA as the standard stack.
  3. DevOps blurs the boundary. Modern DevOps treats everything as code (infrastructure with Terraform, pipelines with GitHub Actions, configuration with Ansible), all under SCM. The set of SCIs has grown dramatically.
  4. Reverse engineering for security is huge. Malware analysis, vulnerability research, and forensics are all forms of reverse engineering. Tools like Ghidra (NSA, open-sourced 2019) and IDA Pro are industry standards.
  5. Re-engineering usually beats rewriting. Rewriting from scratch is tempting but historically disastrous: in Joel Spolsky's essay "Things You Should Never Do", Netscape took about three years to rebuild its browser for Netscape 6 and lost the market. Re-engineering preserves the working business logic that took years to perfect.

PYQ pattern: "Differentiate reverse engineering and re-engineering." — Define both; tabulate the comparison (output, changes code?, effort, risk, goal); end with an example (NATS air-traffic system).
PYQ pattern: "What is Software Configuration Management? Explain its activities." — Define SCM, name configuration items, list the 5 functions (identification, version control, change control, audit, reporting); mention Git as the modern tool.