4.4 Manual vs Automation Testing, V&V & Testing vs Debugging
Manual vs Automation Testing
| Aspect | Manual Testing | Automation Testing |
|---|---|---|
| Performed by | Human tester | Scripted code via tools |
| Initial cost | Low | High (writing scripts) |
| Per-run cost | High (human time) | Near zero |
| Speed | Slow | Fast |
| Repeatability | Inconsistent | Consistent |
| Best for | UI, usability, exploratory, one-off | Regression, load, smoke |
| Skill required | Domain + testing | Programming + tool |
| Returns over time | Linear | Compounding |
| Reliability | Depends on tester | Depends on script |
When to automate
✓ Tests that run repeatedly (regression) ✓ Repetitive, mechanical checks ✓ Tests that require precision (timing, large data) ✓ Tests that need consistency across runs
When NOT to automate
✗ Tests that run once or twice ✗ UI that changes frequently ✗ Usability and exploratory testing ✗ Tests that need human judgment
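A commonly quoted rule of thumb (not a formal standard): automate roughly 70–80% of regression tests and keep the remaining 20–30% manual, typically checks on frequently changing UI and complex business scenarios. Exploratory and usability testing stay manual regardless.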
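The "Initial cost" and "Per-run cost" rows imply a break-even point for automation. The sketch below is a minimal model under stated assumptions: scripting is a one-off cost, each automated run has a flat cost (triage, upkeep), and the figures in the example are hypothetical person-hours.

```python
import math


def break_even_runs(script_cost: float, manual_cost_per_run: float,
                    auto_cost_per_run: float = 0.0) -> int:
    """Smallest number of runs at which automation costs no more than manual testing.

    Costs can be in any consistent unit (e.g. person-hours). Assumes the
    scripting cost is paid once and maintenance is folded into auto_cost_per_run.
    """
    saving_per_run = manual_cost_per_run - auto_cost_per_run
    if saving_per_run <= 0:
        raise ValueError("automation never pays off if a run costs as much as a manual run")
    # Need script_cost + n * auto <= n * manual, i.e. n >= script_cost / saving_per_run.
    return math.ceil(script_cost / saving_per_run)


# Hypothetical suite: 40 h to script, 4 h to run by hand, 0.5 h per automated run.
print(break_even_runs(40, 4, 0.5))  # 12 runs
```

This is why the table pairs automation with regression suites: a test that runs once or twice never reaches its break-even run count.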
---
Popular automation tools (by domain)
| Domain | Tools |
|---|---|
| Unit testing | JUnit, NUnit, pytest, Jest, RSpec |
| Integration / API | Postman, REST Assured, SoapUI |
| Web UI | Selenium, Playwright, Cypress, TestCafe |
| Mobile | Appium, Espresso (Android), XCUITest (iOS) |
| Performance / Load | JMeter, Gatling, LoadRunner, k6 |
| Security | OWASP ZAP, Burp Suite, Nikto |
| Behaviour-Driven (BDD) | Cucumber, SpecFlow, Behave |
| Continuous Integration | Jenkins, GitHub Actions, GitLab CI, CircleCI |
---
Validation vs Verification — recap with examples
| Aspect | Verification | Validation |
|---|---|---|
| Question | "Are we building it right?" | "Are we building the right thing?" |
| Focus | Process, spec compliance | Product, user need |
| Activities | Reviews, walkthroughs, inspections, static analysis | Dynamic testing, demos, UAT |
| Performed by | Team, internal SQA | Customer / end user |
| Timing | Throughout SDLC | Mostly after development |
| Example | Checking that the login flow listed in the SRS appears in the design | Asking a real user to log in and confirm it works for them |
---
Alpha and Beta Testing
| Type | Where | Who | When |
|---|---|---|---|
| Alpha | At developer site | Internal team + select customers | Before beta release |
| Beta | At customer site, real env | Volunteer external users | Before general release |
Famous beta programs: Gmail (5 years in "beta" 2004–2009), Windows Insider, iOS Public Beta.
---
Acceptance Testing
| Type | Purpose |
|---|---|
| User Acceptance Testing (UAT) | Customer formally accepts/rejects the system |
| Operational Acceptance (OAT) | IT operations confirms deployability |
| Contract Acceptance | Verifies contractual obligations met |
| Regulation Acceptance | Meets regulatory requirements (FDA, RBI) |
UAT typically uses real production-like data and real user scenarios.
---
Functional vs Structural Testing
| Aspect | Functional | Structural |
|---|---|---|
| Synonyms | Black-box, behavioural | White-box, glass-box |
| Test basis | Specification | Code structure |
| Goal | "Does it do what it should?" | "Does every path execute correctly?" |
| Performed by | Anyone | Programmer |
| Examples | Equivalence partitioning (EP), boundary value analysis (BVA), decision tables | Statement, branch, path coverage |
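A small Python sketch of the difference, using a hypothetical shipping-fee rule as the specification. The functional tests are derived from the spec alone (boundary values around 1000); the structural tests are chosen by reading the code so that every branch executes at least once. The function and its thresholds are illustrative assumptions; the `test_` naming lets pytest collect them, but they also run as plain functions.

```python
def shipping_fee(order_total: float) -> float:
    """Hypothetical spec: orders of 1000 or more ship free, smaller orders pay
    a flat 50, and negative totals are invalid."""
    if order_total < 0:
        raise ValueError("order total cannot be negative")
    if order_total >= 1000:
        return 0.0
    return 50.0


# Functional (black-box): boundary value analysis taken from the spec only.
def test_boundaries():
    assert shipping_fee(0) == 50.0        # lowest valid value
    assert shipping_fee(999.99) == 50.0   # just below the threshold
    assert shipping_fee(1000) == 0.0      # on the threshold
    assert shipping_fee(1000.01) == 0.0   # just above the threshold


# Structural (white-box): one input per branch, giving 100% branch coverage.
def test_every_branch():
    try:
        shipping_fee(-5)                  # branch 1: invalid input
    except ValueError:
        pass
    else:
        raise AssertionError("negative total must be rejected")
    assert shipping_fee(1500) == 0.0      # branch 2: free shipping
    assert shipping_fee(10) == 50.0       # branch 3: flat fee


test_boundaries()
test_every_branch()
```

Note that the boundary tests as written never reach the `ValueError` branch; a coverage report would flag it, which is exactly the gap structural testing is meant to expose.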
---
Testing vs Debugging — the IPU favourite distinction
| Aspect | Testing | Debugging |
|---|---|---|
| Purpose | Reveal that defects exist | Locate and fix defects |
| Trigger | Planned activity | Failed test result |
| Mindset | Systematic, broad | Focused, deep |
| Output | Defect reports | Code changes |
| Done by | Tester (often) | Developer |
| Stage | After coding | After testing finds a failure |
| Approach | Plan-driven | Investigative |
| Skill | Test design, domain | Code understanding, deduction |
In one sentence: Testing reveals defects; debugging removes them.
Debugging techniques
| Technique | Description |
|---|---|
| Brute force | Print statements, dump memory; tedious but works |
| Backtracking | Start from failure, trace back through code paths |
| Cause elimination | Hypothesise cause; test the hypothesis |
| Binary search / Bisection | Disable half the code; narrow down |
| Rubber-duck debugging | Explain code line-by-line to a rubber duck (or a colleague); the defect often reveals itself |
| Logging / Tracing | Persistent log statements with severity levels (DEBUG, INFO, WARN, ERROR) |
| Interactive debugger | Breakpoints, step-through, watch variables |
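The Binary search / Bisection row is most often applied to commit history, which is what `git bisect` automates. Below is a minimal Python sketch of the same search, assuming (as `git bisect` does) that commits are in chronological order, the first is known good, the last is known bad, and the regression persists once introduced. The commit names and the `is_bad` check are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of tests run) via binary search.

    Invariant: commits[good] is good and commits[bad] is bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    tests_run = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests_run += 1              # one build-and-test per step
        if is_bad(commits[mid]):
            bad = mid               # regression is at mid or earlier
        else:
            good = mid              # regression is after mid
    return commits[bad], tests_run


# Hypothetical history of 1,024 commits; the regression landed in c700.
history = [f"c{i}" for i in range(1024)]
print(first_bad_commit(history, lambda c: int(c[1:]) >= 700))
```

Each test halves the suspect range, so 1,024 commits need at most 10 tests instead of hundreds. With real Git, `git bisect run <script>` drives the same loop using the script's exit code.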
The debugging process
- Reproduce the failure reliably
- Locate the defective code (with techniques above)
- Fix the defect
- Test that the fix works AND that nothing else broke
- Document in the issue tracker
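The reproduce, fix and re-test steps above can be sketched as a pair of tests kept in the suite after the fix. The `average` function, its bug report, and the agreed behaviour for an empty list are all hypothetical assumptions for illustration.

```python
def average(values: list[float]) -> float:
    """Arithmetic mean.

    Hypothetical bug report: average([]) raised ZeroDivisionError.
    Agreed fix (an assumption for this sketch): an empty list averages to 0.0.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_average_empty_list_regression():
    # Reproduce: this exact call failed before the fix. Keeping the test in
    # the suite means the defect cannot silently return.
    assert average([]) == 0.0


def test_average_unchanged_for_normal_input():
    # "Nothing else broke": re-check behaviour that worked before the fix.
    assert average([2.0, 4.0, 6.0]) == 4.0
    assert average([5.0]) == 5.0


test_average_empty_list_regression()
test_average_unchanged_for_normal_input()
```

Writing the failing test first (step 1) also proves the failure is reproducible before any code is changed.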
---
Common debugging pitfalls
| Pitfall | Antidote |
|---|---|
| Treating symptom not cause | Trace back to root cause |
| Fixing in wrong place | Reproduce before changing code |
| Introducing new bugs | Add regression test for the fix |
| Not understanding the fix | Don't ship code you don't understand |
| Skipping the documentation | Future you will not remember |
---
Test-Driven Development (TDD)
A practice that flips the test-after-code order:
1. Write a failing test (RED)
2. Write minimum code to pass (GREEN)
3. Refactor (REFACTOR)
4. Repeat
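Benefits: every piece of behaviour has a test written before it; the design is testable by construction; refactoring is safe; bugs are caught within minutes of being introduced. TDD was popularised by Kent Beck as part of Extreme Programming (XP) and is now widely practised.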
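A minimal red-green-refactor sketch in Python, using a hypothetical leap-year rule as the feature. In a real TDD session the assertions would be added one at a time, each failing (RED) before code is written to pass it (GREEN); the block shows where the cycle ends up after refactoring.

```python
# RED: the tests are written first. Run before is_leap_year exists, they fail
# with a NameError, which is the expected "red" state.
def test_is_leap_year():
    assert is_leap_year(2024)          # divisible by 4
    assert not is_leap_year(2023)      # not divisible by 4
    assert not is_leap_year(1900)      # century year not divisible by 400
    assert is_leap_year(2000)          # divisible by 400


# GREEN, then REFACTOR: after several iterations (one new failing assertion
# at a time), the accumulated if/else branches are tidied into one expression.
# The tests above make this refactoring safe.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


test_is_leap_year()
```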
---
Behaviour-Driven Development (BDD)
Extends TDD with natural-language scenarios:
```gherkin
Feature: Login
  Scenario: Successful login
    Given a registered user with email "a@example.com"
    When she submits the login form with correct password
    Then she should be redirected to her dashboard
```
Tools (Cucumber, SpecFlow) translate these scenarios into runnable tests. The benefit: the test is readable by non-programmers (BAs, product owners).
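To show what "translate these scenarios into runnable tests" means mechanically, here is a toy step binder in plain Python. It is a hypothetical sketch, not the API of Cucumber, SpecFlow or Behave (which use their own decorators, annotations and richer matching): each Given/When/Then phrase is matched against registered regular expressions and dispatched to a Python function, and the login logic is a stand-in for the real application.

```python
import re

STEPS = []  # (compiled regex, step function) pairs; hypothetical registry


def step(pattern):
    """Decorator that registers a function for a Given/When/Then phrase."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r'a registered user with email "(.+)"')
def given_user(ctx, email):
    ctx["user"] = {"email": email}                             # hypothetical fixture


@step(r"she submits the login form with correct password")
def when_login(ctx):
    ctx["page"] = "dashboard" if "user" in ctx else "login"    # stand-in for the app


@step(r"she should be redirected to her dashboard")
def then_dashboard(ctx):
    assert ctx["page"] == "dashboard"


def run_scenario(text):
    """Run each step line in order; a phrase with no binding is an error."""
    ctx = {}
    for line in text.strip().splitlines():
        keyword, _, phrase = line.strip().partition(" ")
        if keyword not in ("Given", "When", "Then", "And", "But"):
            continue  # skip Feature:/Scenario: header lines
        for pattern, func in STEPS:
            match = pattern.fullmatch(phrase)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step definition for: {phrase!r}")
    return ctx


SCENARIO = """
Feature: Login
  Scenario: Successful login
    Given a registered user with email "a@example.com"
    When she submits the login form with correct password
    Then she should be redirected to her dashboard
"""
print(run_scenario(SCENARIO)["page"])
```

The scenario text stays readable by BAs and product owners; only the step functions require programming.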
---
Performance, Load and Stress Testing
| Type | Definition | Tool |
|---|---|---|
| Performance | Response time, throughput at normal load | JMeter |
| Load | System under expected peak load | JMeter, Gatling |
| Stress | System beyond expected limits — find breakpoint | k6, Gatling |
| Endurance / Soak | Long duration to detect memory leaks | JMeter |
| Spike | Sudden burst of load | JMeter |
| Scalability | Behaviour as load grows | Mix of above |
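To make the measurements in this table concrete, here is a toy load generator using only Python's standard library. It is a sketch under loud assumptions, not a substitute for JMeter, Gatling or k6: `fake_request` is a hypothetical stand-in that sleeps for about 10 ms, and a single Python process cannot generate realistic production-scale load.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request() -> None:
    """Hypothetical system under test: roughly 10 ms of work per call."""
    time.sleep(0.01)


def timed_call(_: int) -> float:
    """Latency of one request, in seconds."""
    start = time.perf_counter()
    fake_request()
    return time.perf_counter() - start


def load_test(users: int, requests: int) -> dict:
    """Send `requests` calls through `users` concurrent workers (requests >= 2)."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(timed_call, range(requests)))
    wall = time.perf_counter() - wall_start
    # 99 cut points: index 49 is the median (p50), index 94 is p95.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "requests": requests,
        "throughput_rps": requests / wall,
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
    }


if __name__ == "__main__":
    # Hypothetical levels: normal load, expected peak, beyond peak.
    for users in (1, 10, 50):
        print(users, load_test(users, requests=100))
```

Stepping the number of concurrent users upward mirrors the table: normal load is a performance test, expected peak is a load test, and pushing until latency or errors blow up is a stress test. Percentiles such as p95 are reported rather than averages because a few slow requests can hide behind a good mean.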
---
Security Testing
| Activity | Purpose |
|---|---|
| Vulnerability scanning | Find known issues (CVE database) |
| Penetration testing | Simulate attacker; manual exploration |
| Static Application Security Testing (SAST) | Source-code scanning |
| Dynamic Application Security Testing (DAST) | Runtime testing |
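| Fuzzing | Random or malformed inputs to find crashes |
Widely used reference: the OWASP Top 10. The 2017 edition lists Injection, Broken Authentication, Sensitive Data Exposure, XML External Entities (XXE), Broken Access Control, Security Misconfiguration, Cross-Site Scripting (XSS), Insecure Deserialisation, Using Components with Known Vulnerabilities, and Insufficient Logging & Monitoring. The 2021 edition reorganises the list: Broken Access Control moves to first place, XSS is folded into Injection, and Insecure Design and Server-Side Request Forgery (SSRF) are new entries.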
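The Fuzzing row can be illustrated with a tiny random fuzzer. This is a minimal sketch: `parse_header` is a hypothetical parser with a planted bug, and the rule "anything other than the documented `ValueError` is a finding" is an assumption for this example. Real fuzzers (for example coverage-guided tools such as AFL or libFuzzer) choose inputs far more cleverly.

```python
import random
import string


def parse_header(line: str) -> tuple[str, str]:
    """Hypothetical parser. Documented contract: raise ValueError on bad input.
    Planted bug: a line without ':' raises IndexError instead."""
    parts = line.split(":", 1)
    name, value = parts[0], parts[1]   # BUG: no check that ':' was present
    if not name.strip():
        raise ValueError("empty header name")
    return name.strip(), value.strip()


def fuzz(target, runs: int = 1000, seed: int = 0) -> list[tuple[str, str]]:
    """Feed random printable strings to target; collect unexpected exceptions."""
    rng = random.Random(seed)          # fixed seed so every finding is reproducible
    crashes = []
    for _ in range(runs):
        data = "".join(rng.choice(string.printable) for _ in range(rng.randint(0, 20)))
        try:
            target(data)
        except ValueError:
            pass                       # documented, expected rejection
        except Exception as exc:       # anything else is a finding
            crashes.append((data, type(exc).__name__))
    return crashes


findings = fuzz(parse_header)
print(len(findings), findings[:3])
```

Fixing the seed matters: a fuzzing finding is only useful to the debugger if the failing input can be reproduced.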
---
Key Terms — Lesson 4.4
The vocabulary below covers manual vs automation testing, V&V, and the testing-versus-debugging distinction that PYQs love.
Manual Testing — Testing performed by a human tester following test cases or exploring the system intuitively. Low setup cost, high per-run cost, slow but flexible. Best for usability, exploratory, ad-hoc, and one-off testing where human judgement and creativity matter.
Automation Testing — Testing performed by scripts that execute test cases mechanically using a tool. High initial cost (write scripts) but near-zero per-run cost. Best for regression, load, smoke, and high-repetition scenarios where consistency and speed matter.
Selenium — The most widely used open-source browser-automation framework for web UI testing. Selenium WebDriver lets scripts drive Chrome, Firefox, Edge and Safari the way a real user would (clicking, typing, waiting), while the script asserts on what it sees.
Playwright / Cypress / TestCafe — Modern alternatives to Selenium for web UI testing. Playwright (Microsoft) and Cypress typically offer faster, less flaky execution, with built-in auto-waiting, network mocking, and rich debugging tooling. Cypress is browser-internal (runs inside the browser); Playwright spans Chromium, WebKit, and Firefox.
Appium / Espresso / XCUITest — Mobile UI automation tools. Appium is cross-platform (iOS + Android) using WebDriver protocol. Espresso is Android-native (Google). XCUITest is iOS-native (Apple).
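JMeter / Gatling / k6 / LoadRunner — Performance and load testing tools. JMeter is the classic open-source, Java-based tool. Gatling started with a Scala DSL and now also offers Java and Kotlin DSLs. k6 (Grafana Labs) is scripted in JavaScript and designed for CI and cloud use. LoadRunner (originally Micro Focus, now OpenText) is the long-established enterprise commercial tool.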
Postman / REST Assured / SoapUI — API testing tools. Postman is the dominant GUI tool for designing and running API tests. REST Assured is a Java DSL for assertion-rich API tests in code. SoapUI specialises in SOAP and complex web-service workflows.
OWASP ZAP / Burp Suite — Security testing tools. OWASP ZAP is the open-source web-application security scanner. Burp Suite is the commercial industry standard for penetration testing.
Cucumber / SpecFlow / Behave — Behaviour-Driven Development (BDD) frameworks. They let test cases be written in Gherkin (Given/When/Then) syntax that non-programmers can read; the framework binds each phrase to executable code.
Test-Driven Development (TDD) — Kent Beck's discipline of writing a failing test first, then writing the minimum code to make it pass, then refactoring — the red-green-refactor cycle. TDD produces tightly-tested code and pressures designs to be simple.
Behaviour-Driven Development (BDD) — A practice that extends TDD with natural-language scenarios in Given/When/Then form, readable by non-programmers (BAs, product owners). The Gherkin syntax used by Cucumber is the canonical BDD example.
Gherkin — The plain-text language used in BDD scenarios. A typical scenario has a Feature line, a Scenario name, and Given/When/Then steps that the BDD framework binds to executable code.
Continuous Testing — The practice of running automated tests at every stage of the CI/CD pipeline (commit, build, deploy, and even in production), so that tests gate every deployment. Large SaaS organisations can run suites of many thousands of tests against each change.
Shift-Left Testing — The industry movement to test earlier in the SDLC — testable requirements (review the SRS), testable design (architectural reviews), TDD during coding. Each shift left reduces defect cost, per Boehm's curve.
Validation (recap) — "Are we building the right product?" — confirming the product meets the user's actual needs. Validated by demos, acceptance testing, beta testing, UAT.
Verification (recap) — "Are we building the product right?" — confirming each artefact conforms to its specification. Verified by reviews, inspections, walkthroughs, static analysis.
Alpha Testing — User-style testing performed at the developer's site by select internal users or invited customers, in a controlled environment. Catches usability and integration issues before the product faces the wild.
Beta Testing — User-style testing performed at user sites, in real environments, by volunteer external users. Validates compatibility, scalability, and real-world usage patterns. Modern equivalents: Apple Public Beta, Windows Insider, Android Beta Program.
UAT (User Acceptance Testing) — The formal customer-led test that determines whether the system meets agreed requirements and is ready for production. UAT typically uses production-like data and real user scenarios.
OAT (Operational Acceptance Testing) — The IT-operations counterpart of UAT — verifying the system can be deployed, monitored, backed up, restored, scaled, and operated in production. Often required by enterprise customers' IT functions.
Smoke Test — A quick build-acceptance test that confirms the most critical paths work — login, search, checkout, key API endpoints. Smoke tests are run first; if they fail, no further testing is worth doing on that build.
Sanity Test — A quick, narrow regression check after a small change: does the affected area still work? Sanity testing sits between smoke testing (very broad and shallow) and full regression (everything).
Regression Test — A test that re-runs after a change to confirm previously-working behaviour has not broken. Automated regression suites are the safety net that enables continuous delivery. The regression suite grows over time and is the single largest body of automated tests in most products.
Testing vs Debugging — Two distinct activities. Testing reveals that defects exist — it is planned, systematic, broad. Debugging locates and removes defects — it is reactive, focused, deep. "Testing reveals defects; debugging removes them."
Debugging Techniques — Brute force (print statements, memory dumps). Backtracking (start from the failure, trace backward). Cause elimination (hypothesise → test → confirm or rule out). Binary search / Bisection (disable half the code, narrow down). Rubber-duck debugging (explain code aloud — defects often reveal themselves). Interactive debugger (breakpoints, step-through, watches). Logging / tracing (permanent log statements at levels DEBUG/INFO/WARN/ERROR).
Rubber-Duck Debugging — A debugging technique where the developer explains the code line-by-line to an inanimate object (a rubber duck). The act of putting the logic into words frequently reveals the defect — without the duck answering anything.
Bisection / Git Bisect — A debugging technique for finding which commit introduced a regression. git bisect performs a binary search through commit history: at each step the developer (or a script, via git bisect run) tests the checked-out commit and marks it good or bad, and Git narrows down to the offending commit in about log₂ N steps for N commits.
Hot Fix / Patch — A small, targeted code change that fixes a critical production bug urgently. Hot fixes bypass the normal release process because the cost of waiting is too high. Best practice: every hot fix gets a regression test added so the bug cannot return silently.
Defect Density / DRE (recap) — Defect density is the number of defects per KLOC (thousand lines of code); Defect Removal Efficiency (DRE) = defects found before release / (defects found before release + defects found after release). Best-in-class teams target DRE > 95%.
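A short worked example of both formulas in Python. The project figures are hypothetical, and counting all defects (before and after release) in the density is an assumption of this sketch.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Defects per KLOC (thousand lines of code)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


def defect_removal_efficiency(found_before_release: int, found_after_release: int) -> float:
    """DRE = defects found before release / all defects found (before + after)."""
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded; DRE is undefined")
    return found_before_release / total


# Hypothetical project: 45,000 LOC, 180 defects caught before release, 12 reported after.
density = defect_density(180 + 12, 45_000)
dre = defect_removal_efficiency(180, 12)
print(f"defect density = {density:.2f} per KLOC")  # 192 / 45, about 4.27
print(f"DRE = {dre:.2%}")                          # 180 / 192 = 93.75%, below a 95% target
```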
Continuous Integration (CI) — Automated build and test of every commit. Mandatory infrastructure for any modern test-automation strategy. Tools: GitHub Actions, GitLab CI, Jenkins, CircleCI, Azure Pipelines.
Test Pyramid (recap) — Mike Cohn's model for distributing automated tests: many unit tests at the base, fewer integration tests in the middle, few UI/E2E tests at the top. Inverting the pyramid (many slow E2E tests, few unit tests) is the classic anti-pattern.
AI-Assisted Testing — Emerging tools (Mabl, Functionize, Diffblue Cover, Testim, TestRigor) that use machine learning to generate test cases, prioritise tests likely to find new bugs, and self-heal scripts when UI changes. Useful as augmentation; not a replacement for thoughtful test design.
---
Study deep
- Test automation has a maintenance cost. Automated suites need ongoing upkeep as UIs change, APIs change, and dependencies are deprecated. A common rule of thumb is to budget roughly 20–30% of automation effort for maintenance.
- Testing and debugging require different mindsets. A good tester is adversarial — trying to break the software. A good debugger is analytical — patiently tracing cause and effect. Many developers excel at one and struggle at the other.
- Continuous Testing is the modern norm. In a CI/CD pipeline, every commit triggers automated tests, and failed tests block deployment. Large organisations can run suites of many thousands of tests against each change.
- AI-assisted testing is emerging. Tools generate test cases, prioritise tests likely to find new bugs, and even self-heal scripts when UI changes. Examples: Mabl, Functionize, Diffblue Cover. Useful but not a replacement for thoughtful test design.
- Shift-left testing. The industry movement to test earlier — even at the requirements stage (testable requirements), during design (testable design), and during coding (TDD). Each shift left reduces defect cost (Boehm's curve again).
PYQ pattern: "Differentiate testing and debugging." — Define both with a one-sentence framing; tabulate the eight aspects (purpose, trigger, mindset, output, done by, stage, approach, skill); mention common debugging techniques.
PYQ pattern: "Differentiate manual and automation testing. List the popular automation tools." — Table comparison; mention Selenium, JUnit, JMeter, Postman, Cucumber.