Testing, Mocks, Property-Based Tests, Mutation Testing
Session 41 of the 48-session learning series.
Date: Sat, 2026-07-11 · Time: 14:30–16:30 IST · Track: 🧱 OOP & Languages (OOP) · Parent 28-day topic: Day 20 · Est. read: 2 h
Why this session matters
This is Session 41 of 48 in the OOP track. Code without tests is a draft. Most engineers stop at "I wrote some unit tests and they pass" — and miss property-based tests, mutation testing, and the discipline of fast/slow tiers. The compounding returns on investment in test infrastructure are massive over a multi-year codebase.
Agenda
- The test pyramid — unit, integration, e2e; what each catches
- Mocking — stubs, mocks, fakes, spies; when each is right
- Property-based testing — Hypothesis, fast-check; the bug-finding superpower
- Mutation testing — measuring the quality of the tests themselves
- Test infrastructure — fixtures, parallelism, flakes, coverage thresholds
Pre-read (skim before the session)
- Martin Fowler — Test Pyramid
- Martin Fowler — Mocks Aren't Stubs
- Hypothesis docs — Quick start
- PIT mutation testing — Mutators
Deep dive
1. The pyramid (and the diamond, and the trophy)
              ▲
             e2e            slow, brittle; gives confidence at the top
            ─────
         integration        real DB / queue / network
        ────────────
         unit tests         fast, deterministic, isolate one thing
Classic pyramid: unit-heavy, fewer integration, few e2e.
Modern variants:
- Trophy (Kent C. Dodds) — heavier on integration than pure pyramid.
- Diamond — light on unit, heavy on integration.
The right shape depends on your architecture. Microservices = lean toward integration; tight monolith = lean toward unit.
2. What "unit" actually means
Two schools:
- Solitary unit — mock every collaborator. Test the function in isolation.
- Sociable unit — use real collaborators when cheap (in-process, no I/O). Mock only external boundaries.
The sociable school catches integration bugs earlier; solitary is faster and easier to debug.
Pragmatic default: sociable units with mocked external boundaries (HTTP, DB if remote, time, randomness).
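As a rough illustration of that default, the sketch below keeps a cheap in-process collaborator real and replaces only the clock, which is the one external boundary here. The names Token, TokenValidator and SessionService are hypothetical, and the fixed timestamp is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class Token:
    value: str
    expires_at: datetime


class TokenValidator:
    """Real, in-process collaborator: cheap, so the test uses it directly."""

    def is_valid(self, token: Token, now: datetime) -> bool:
        return now < token.expires_at


class SessionService:
    """Unit under test. The clock is the only external boundary, so it is injected."""

    def __init__(self, validator: TokenValidator, clock: Callable[[], datetime]):
        self._validator = validator
        self._clock = clock

    def can_access(self, token: Token) -> bool:
        return self._validator.is_valid(token, self._clock())


def test_expired_token_is_rejected():
    # Time is frozen by passing a clock that always returns the same instant.
    fixed_now = datetime(2026, 7, 11, 14, 30, tzinfo=timezone.utc)
    service = SessionService(TokenValidator(), clock=lambda: fixed_now)

    fresh = Token("a", expires_at=fixed_now + timedelta(minutes=5))
    stale = Token("b", expires_at=fixed_now - timedelta(seconds=1))

    assert service.can_access(fresh) is True
    assert service.can_access(stale) is False
```

Because TokenValidator is real, a bug in how the two classes fit together would surface here; a solitary test that mocked the validator would not see it.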
3. Stubs vs mocks vs fakes vs spies
| Type | Behaviour | Verifies? |
|---|---|---|
| Stub | Returns hardcoded values | No |
| Mock | Expects specific calls; fails if unmet | Yes |
| Fake | Working implementation, simplified (in-memory DB) | No |
| Spy | Wraps real object; records calls | Yes |
Rule: prefer fakes > stubs > mocks > spies. Mocks couple tests to implementation details, so a refactor can break tests even when behaviour is unchanged.
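The four doubles in the table can be sketched with the standard library's unittest.mock. FakeUserRepo and register are hypothetical names invented for this illustration; only Mock, wraps and assert_called_once_with come from unittest.mock.

```python
from unittest.mock import Mock


class FakeUserRepo:
    """Fake: a working but simplified implementation backed by a dict."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, name):
        self._rows[user_id] = name

    def get(self, user_id):
        return self._rows.get(user_id)


def register(repo, mailer, user_id, name):
    """Hypothetical unit under test: store the user, then send a welcome mail."""
    repo.save(user_id, name)
    mailer.send(to=name, subject="Welcome")
    return repo.get(user_id)


def test_with_fake_and_stub():
    repo = FakeUserRepo()
    mailer = Mock()  # used as a stub: it absorbs the call, nothing is verified
    assert register(repo, mailer, 1, "ada") == "ada"  # asserts on resulting state


def test_with_mock():
    repo = FakeUserRepo()
    mailer = Mock()
    register(repo, mailer, 1, "ada")
    # Interaction check: would break if register() were refactored to call
    # a different mailer method, even with identical observable behaviour.
    mailer.send.assert_called_once_with(to="ada", subject="Welcome")


def test_with_spy():
    real = FakeUserRepo()
    spy = Mock(wraps=real)  # delegates to the real object and records calls
    register(spy, Mock(), 1, "ada")
    spy.save.assert_called_once_with(1, "ada")
    assert real.get(1) == "ada"
```

The first test survives most refactors of register because it only checks state; the second and third pin down how the collaborators are called, which is the coupling the rule warns about.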
4. Test doubles per language
- Python: unittest.mock, pytest-mock. For HTTP: responses, pytest-httpx (its httpx_mock fixture), pytest-httpserver. For time: freezegun.
- Node/TS: jest.mock(), sinon, nock for HTTP.
- Java: Mockito, WireMock for HTTP.
- Go: handcrafted interfaces + table-driven tests (idiomatic; few libs).
- Rust: mockall, wiremock.
5. Property-based testing
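Instead of asserting a single example, add(2, 3) == 5, assert a property that must hold for any a and b: add(a, b) == add(b, a). The framework generates many random cases (Hypothesis runs 100 per test by default, and the number is configurable) and shrinks any failure to a minimal example.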
    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sort_idempotent(xs):
        # Sorting an already-sorted list must not change it.
        assert sorted(sorted(xs)) == sorted(xs)
Where it wins:
- Stateless pure functions (parsers, serialisers, hashers).
- Round-trip properties (decode(encode(x)) == x).
- Invariants of data structures (BST always sorted, heap property).
- Concurrency models (stateful machine testing).
A few hours of property-based testing can find bugs that years of example-based testing miss.
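To make the generate-and-shrink loop concrete, here is a hand-rolled sketch that uses only the standard library. It is not how Hypothesis is implemented and is far cruder than a real framework; it only illustrates the idea on a round-trip property. All function names (rle_encode, rle_decode, buggy_decode, shrink, find_counterexample) are made up for this example.

```python
import random


def rle_encode(s: str) -> list:
    """Run-length encode: 'aab' becomes [('a', 2), ('b', 1)]."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list) -> str:
    return "".join(ch * n for ch, n in runs)


def buggy_decode(runs: list) -> str:
    """Deliberately broken: ignores the counts, so repeated characters are lost."""
    return "".join(ch for ch, _ in runs)


def shrink(s: str, fails) -> str:
    """Greedy shrinking: delete single characters while the input still fails."""
    changed = True
    while changed:
        changed = False
        for i in range(len(s)):
            candidate = s[:i] + s[i + 1:]
            if fails(candidate):
                s, changed = candidate, True
                break
    return s


def find_counterexample(decode, trials=500, seed=0):
    """Return a shrunk input that breaks decode(encode(s)) == s, or None."""
    rng = random.Random(seed)  # seeded so a failure is reproducible

    def fails(s: str) -> bool:
        return decode(rle_encode(s)) != s

    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 12)))
        if fails(s):
            return shrink(s, fails)
    return None
```

With the correct rle_decode no counterexample is found. With buggy_decode the search reports a two-character string of one repeated letter, which is the smallest input that exposes the bug; that reduction from a long random string to a tiny one is what shrinking buys you.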
6. Mutation testing
A mutation testing framework changes your code slightly (+ to -, > to <) and re-runs your tests. If the tests still pass, the suite missed the mutation, which means it is weaker than its coverage suggests.
Tools: PIT (Java), mutmut (Python), Stryker (JS/TS/.NET).
Original: if x > 0:
Mutant: if x >= 0:
Result: kill rate (% of mutants caught). 80%+ = good; under 60% = your tests check that code runs, not that it's correct.
Run weekly, not per-commit (it's slow).
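The x > 0 versus x >= 0 example can be worked by hand. The sketch below writes the mutant manually instead of using a tool, and models each test suite as a function that returns True when all its checks pass; weak_suite, strong_suite and kill_rate are illustrative names, not part of PIT, mutmut or Stryker.

```python
def is_positive(x: int) -> bool:
    """Original code under test."""
    return x > 0


def is_positive_mutant(x: int) -> bool:
    """What a mutation tool might generate: '>' replaced by '>='."""
    return x >= 0


def weak_suite(fn) -> bool:
    """Executes every line of fn but never probes the boundary."""
    return fn(5) is True and fn(-5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case x == 0, which distinguishes '>' from '>='."""
    return weak_suite(fn) and fn(0) is False


def kill_rate(suite, mutants) -> float:
    """Share of mutants for which the suite fails, i.e. mutants it kills."""
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)
```

Both suites pass on the original and both give full line coverage of is_positive, yet only strong_suite fails on the mutant. The weak suite's kill rate for this single mutant is 0.0 and the strong suite's is 1.0, which is the gap that coverage numbers alone hide.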
7. Fixtures and setup
Reuse expensive setup; avoid global state:
- pytest @fixture(scope='module') for one instance per test file.
- pytest @fixture(scope='session') for one instance per test run.
- Use Testcontainers for ephemeral DB / Kafka / Redis in CI.
Anti-pattern: shared mutable state across tests → flakes that appear only when test order or parallelism changes.
8. Parallelism and isolation
Run tests in parallel for speed:
- pytest -n auto (with pytest-xdist).
- jest --maxWorkers=<n> (Jest already runs test files in parallel by default; the flag sets the worker count).
Each parallel worker needs isolation:
- Random schema/DB per worker.
- Random ports.
- No shared temp files.
- No global mocks.
The discipline costs upfront; pays back every CI run.
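The isolation rules above can be sketched with a few standard-library helpers. The helper names are hypothetical; the one external fact relied on is that pytest-xdist sets the PYTEST_XDIST_WORKER environment variable (values such as gw0) inside each worker process.

```python
import os
import socket
import tempfile
import uuid


def worker_id() -> str:
    """Name of the current xdist worker, or 'main' when not running in parallel."""
    return os.environ.get("PYTEST_XDIST_WORKER", "main")


def unique_schema_name(prefix: str = "test") -> str:
    """Schema/DB name that should not collide across workers or reruns."""
    return f"{prefix}_{worker_id()}_{uuid.uuid4().hex[:8]}"


def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def private_tmpdir() -> str:
    """Fresh directory per call; the caller is responsible for cleanup."""
    return tempfile.mkdtemp(prefix=f"{worker_id()}_")
```

One caveat: free_port releases the port before the caller uses it, so another process could grab it in between. Where possible, let the server under test bind to port 0 itself and read back the port it was given.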
9. Flaky tests
A flaky test is worse than no test — it teaches the team to ignore failures.
Common causes:
- Timing assumptions (sleep(1) instead of wait_until).
- Ordering assumptions on un-ordered output.
- Network calls leaking to real services.
- Shared state leaking between tests.
Triage:
- Mark flaky → quarantine (@pytest.mark.flaky with auto-rerun, e.g. from the pytest-rerunfailures plugin).
- Fix within a sprint or delete the test.
- Track flake rate per test in CI history.
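For the timing cause in particular, a polling helper usually replaces a fixed sleep. The wait_until below is one possible shape for such a helper, not a library function; the timeout and interval defaults are arbitrary, and the Timer stands in for whatever asynchronous work the real test waits on.

```python
import threading
import time


def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll `condition` until it returns truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout  # monotonic: immune to clock changes
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def test_background_job_finishes():
    done = threading.Event()
    threading.Timer(0.05, done.set).start()  # hypothetical async job
    # Returns as soon as the job is done, instead of always sleeping a full second.
    assert wait_until(done.is_set, timeout=2.0)
```

Compared with sleep(1), this is faster on a quick machine and more tolerant on a slow CI runner, because the generous timeout is only spent when something is actually wrong.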
10. Coverage
A measurement, not a goal. 80% line coverage earned by executing trivial guards such as "if x is None", without asserting on the outcome, says little about correctness.
Useful coverage practices:
- Track per-PR delta — coverage shouldn't drop.
- Branch coverage > line coverage.
- Mutation testing > both for actual quality signal.
Coverage above 90% is rarely cost-effective except for libraries / safety-critical code.
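A small example of why branch coverage says more than line coverage: the hypothetical label function below has a bug on the path where the if is not taken, yet a single test executes every line.

```python
def label(user):
    """Buggy on purpose: `name` is never assigned when user is None."""
    if user is not None:
        name = user["name"]
    return f"Hello, {name}"


def test_label_happy_path():
    # This one test executes every line of label(), so line coverage is 100%.
    # The path where the `if` is skipped is never run.
    assert label({"name": "Ada"}) == "Hello, Ada"
```

Calling label(None) raises UnboundLocalError. A line-coverage report would show nothing missing, while branch coverage (for example coverage.py run with its --branch option) would flag the untested jump past the if body.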
11. Contract / consumer-driven testing
For microservices: each consumer publishes a contract (the responses it expects). Producer's CI runs against all consumer contracts. Catches breaking API changes before deploy.
Tools: Pact, Spring Cloud Contract.
Heavy investment; high payoff at 5+ services.
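The idea can be sketched in a few lines, with the caveat that this is a toy and not how Pact or Spring Cloud Contract represent contracts. The contract format (field name mapped to expected type), the two consumer contracts and producer_response are all assumptions made for illustration.

```python
# Hypothetical contract format: field name -> expected Python type.
ORDERS_UI_CONTRACT = {"id": int, "status": str, "total_cents": int}
BILLING_CONTRACT = {"id": int, "total_cents": int, "currency": str}


def violations(contract: dict, response: dict) -> list:
    """List every way `response` breaks one consumer's expectations."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(f"wrong type for {field}: {actual}")
    return problems


def producer_response() -> dict:
    """Stand-in for calling the real producer endpoint in its CI."""
    return {"id": 7, "status": "paid", "total_cents": 1999, "extra": "ignored"}
```

Here the producer satisfies the orders UI contract, and extra fields are allowed, but it fails the billing contract with "missing field: currency". In a real setup that failure would show up in the producer's CI before the change reaches the billing service.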
12. Fast / slow tiers
Split tests by speed:
- Fast — < 1 ms each; run on every save in IDE; < 5 s total.
- Slow — DB, network; run on push; minutes ok.
- Nightly — e2e, perf, mutation; hours ok.
Don't make engineers wait 10 min for a unit test rerun.
13. Reality check
A modern test setup checklist:
- pytest / jest / your stack's standard runner.
- 80%+ branch coverage (with caveats above).
- Hypothesis / fast-check for pure-function modules.
- Testcontainers for integration with real services.
- Mutation testing weekly.
- Coverage as a PR comment, not a hard gate.
- Flaky test budget — < 0.5% flake rate.
- Slow tests in a separate CI lane.
The teams that take testing seriously ship faster, not slower. The teams that "don't have time for tests" pay back the debt at 3 AM during outages.
Reading material
- Martin Fowler — Mocks Aren't Stubs
- Working Effectively with Legacy Code (Michael Feathers)
- xUnit Test Patterns (Gerard Meszaros)
- Hypothesis — Articles
In-depth research material
- Hypothesis Python — Docs
- Pact — Consumer-driven contract testing
- PIT — Mutation testing for Java
- Testcontainers docs
Video reference
▶︎ Property-Based Testing Explained (John Hughes)
Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.
LeetCode — Validate Binary Search Tree
- Link: https://leetcode.com/problems/validate-binary-search-tree/
- Difficulty: Medium
- Why this problem: Verifying a structural invariant is exactly what tests do. Naive solutions miss edge cases; property-based thinking catches them.
- Time-box: 30 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Pick test-pyramid shape for a given codebase (microservices vs monolith).
- Choose between mocks, fakes, stubs, spies for a scenario.
- Write a property-based test for a pure function.
- Explain mutation kill-rate and why it beats line coverage.
- Triage and fix a flaky test class.
- Solve validate-binary-search-tree: classic invariant check, both naive and property-aware.