engineering · intermediate · 12m · 2026-06-09

Testing, Mocks, Property-Based Tests, Mutation Testing

Session 41 of the 48-session learning series.

Date: Sat, 2026-07-11 · Time: 14:30–16:30 IST · Track: 🧱 OOP & Languages (OOP) · Parent 28-day topic: Day 20 · Est. read: 2 h

Why this session matters

This is Session 41 of 48, in the OOP & Languages track. Code without tests is a draft. Most engineers stop at "I wrote some unit tests and they pass" and miss property-based tests, mutation testing, and the discipline of fast/slow tiers. Investment in test infrastructure compounds: over a multi-year codebase, fast and trustworthy tests pay for themselves many times over.

Agenda

The test pyramid — unit, integration, e2e; what each catches
Mocking — stubs, mocks, fakes, spies; when each is right
Property-based testing — Hypothesis, fast-check; the bug-finding superpower
Mutation testing — measuring the quality of the tests themselves
Test infrastructure — fixtures, parallelism, flakes, coverage thresholds


Deep dive

1. The pyramid (and the diamond, and the trophy)

        ▲
        e2e          slow, brittle, gives confidence at top
       ─────
     integration     real DB / queue / network
    ────────────
   unit tests       fast, deterministic, isolate one thing

Classic pyramid: unit-heavy, fewer integration, few e2e.

Modern variants:

Trophy (Kent C. Dodds) — heavier on integration than pure pyramid.
Diamond — light on unit, heavy on integration.

The right shape depends on your architecture. Microservices: lean toward integration tests, since many bugs live at service boundaries. Tight monolith: lean toward unit tests.

2. What "unit" actually means

Two schools:

Solitary unit — mock every collaborator. Test the function in isolation.
Sociable unit — use real collaborators when cheap (in-process, no I/O). Mock only external boundaries.

The sociable school catches integration bugs earlier; solitary is faster and easier to debug.

Pragmatic default: sociable units with mocked external boundaries (HTTP, DB if remote, time, randomness).
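A minimal sketch of a sociable unit test, assuming a hypothetical InvoiceService that uses a real in-process PriceCalculator and receives the current time through an injected clock callable. Only the time boundary is replaced; the cheap collaborator runs for real.

```python
from datetime import datetime, timezone


class PriceCalculator:
    """Cheap, in-process collaborator: used for real, not mocked."""

    def total(self, items):
        return sum(qty * unit_price for qty, unit_price in items)


class InvoiceService:
    def __init__(self, calculator, clock):
        self.calculator = calculator
        self.clock = clock  # external boundary (time) is injected, not hard-wired

    def create(self, items):
        return {
            "total": self.calculator.total(items),
            "issued_at": self.clock().isoformat(),
        }


def test_invoice_is_sociable():
    fixed_now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    # Real PriceCalculator; only the clock is replaced with a fixed value.
    service = InvoiceService(PriceCalculator(), clock=lambda: fixed_now)
    invoice = service.create([(2, 5.0), (1, 2.5)])
    assert invoice["total"] == 12.5
    assert invoice["issued_at"] == "2026-01-01T00:00:00+00:00"
```

If PriceCalculator gains a bug, this test fails too; a solitary version with PriceCalculator mocked would keep passing.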

3. Stubs vs mocks vs fakes vs spies

Type	Behaviour	Verifies?
Stub	Returns hardcoded values	No
Mock	Expects specific calls; fails if unmet	Yes
Fake	Working implementation, simplified (in-memory DB)	No
Spy	Wraps real object; records calls	Yes

Rule of thumb: prefer fakes > stubs > mocks > spies. Mocks couple tests to implementation details, so a refactor that keeps behaviour identical can still break them.
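A minimal sketch of all four doubles against one dependency, using only unittest.mock from the standard library. InMemoryUserRepo and greet are illustrative names. Note that the same Mock class can act as stub, mock, or spy: what differs is whether the test asserts on the calls, and whether real behaviour runs underneath (Mock(wraps=...)).

```python
from unittest.mock import Mock


class InMemoryUserRepo:
    """Fake: a working, simplified implementation (a dict instead of a DB)."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)


def greet(repo, user_id):
    # Code under test: depends only on repo.get().
    name = repo.get(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"


# Fake: exercise behaviour through real (in-memory) state.
fake = InMemoryUserRepo()
fake.save(1, "Ada")
assert greet(fake, 1) == "Hello, Ada!"

# Stub: canned answer; the test never checks how it was called.
stub = Mock()
stub.get.return_value = None
assert greet(stub, 99) == "Hello, stranger!"

# Mock: the test asserts on the interaction itself.
mock = Mock()
mock.get.return_value = "Grace"
greet(mock, 7)
mock.get.assert_called_once_with(7)

# Spy: wraps a real object, so calls go through AND are recorded.
spy = Mock(wraps=InMemoryUserRepo())
spy.save(2, "Linus")
assert greet(spy, 2) == "Hello, Linus!"
assert spy.get.call_count == 1
```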

4. Test doubles per language

Python: unittest.mock, pytest-mock. For HTTP: responses, pytest-httpx (httpx_mock fixture), pytest-httpserver. For time: freezegun.
Node/TS: jest.mock(), sinon, nock for HTTP.
Java: Mockito, WireMock for HTTP.
Go: handcrafted interfaces + table-driven tests (idiomatic; few libs).
Rust: mockall, wiremock.
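One common Python pattern, sketched with a hypothetical WeatherClient (the URL is a placeholder and is never contacted): unittest.mock.patch.object replaces the method that does HTTP only for the duration of the with block, so the logic around it is tested without the network.

```python
import json
import urllib.request
from unittest.mock import patch


class WeatherClient:
    # Hypothetical client; the placeholder URL is never hit in tests.
    BASE_URL = "https://weather.example.invalid/api"

    def fetch_json(self, city):
        with urllib.request.urlopen(f"{self.BASE_URL}?city={city}") as resp:
            return json.load(resp)

    def is_freezing(self, city):
        return self.fetch_json(city)["temp_c"] <= 0


def test_is_freezing_without_network():
    # Inside the with-block, WeatherClient.fetch_json is a MagicMock that
    # returns canned JSON; the original method is restored on exit.
    with patch.object(WeatherClient, "fetch_json", return_value={"temp_c": -3}) as fake_fetch:
        assert WeatherClient().is_freezing("Oslo") is True
    # Patched on the class without autospec, so `self` is not recorded.
    fake_fetch.assert_called_once_with("Oslo")
```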

5. Property-based testing

Instead of assert add(2, 3) == 5, write: for any a, b, assert add(a, b) == add(b, a). The framework generates many random inputs (100 per test by default in both Hypothesis and fast-check) and, when one fails, shrinks it to a minimal failing example.

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)
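To make "generate cases and shrink failures" concrete, here is a deliberately tiny, hand-rolled sketch of that loop. It is not how Hypothesis or fast-check are implemented (their generators and shrinkers are far more thorough); check_property, shrink, int_lists, and the planted bug in buggy_sort are all hypothetical.

```python
import random


def buggy_sort(xs):
    # Planted bug: set() silently drops duplicates.
    return sorted(set(xs))


def int_lists(rng):
    # Generator: random lists of 0..20 ints in [-100, 100].
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]


def shrink(prop, xs):
    """Greedily simplify a failing input for as long as it keeps failing."""
    changed = True
    while changed:
        changed = False
        # 1) Try deleting one element at a time.
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if not prop(candidate):
                xs, changed = candidate, True
                break
        if changed:
            continue
        # 2) Try moving each element halfway toward zero.
        for i, x in enumerate(xs):
            if x != 0:
                half = x // 2 if x > 0 else -((-x) // 2)
                candidate = xs[:i] + [half] + xs[i + 1:]
                if not prop(candidate):
                    xs, changed = candidate, True
                    break
    return xs


def check_property(prop, gen, runs=100, seed=0):
    """Return a shrunk counterexample, or None if every run passes."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = gen(rng)
        if not prop(xs):
            return shrink(prop, xs)
    return None
```

Run against buggy_sort, this sketch reports a two-element list of equal values such as [v, v] instead of the original 20-element input; a smarter shrinker would also minimise v.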

Where it wins:

Stateless pure functions (parsers, serialisers, hashers).
Round-trip properties (decode(encode(x)) == x).
Invariants of data structures (BST always sorted, heap property).
Stateful systems, including concurrency models (state-machine testing: random sequences of operations checked against a simple model).
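The round-trip and invariant bullets, sketched with plain random inputs and no shrinking so it runs without extra dependencies; with Hypothesis you would express the same checks with @given. The run-length encoder is a hypothetical example; heapq is the standard-library min-heap.

```python
import heapq
import random


def rle_encode(s):
    """Run-length encode a string into [(char, count), ...]."""
    pairs = []
    for ch in s:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def rle_decode(pairs):
    return "".join(ch * count for ch, count in pairs)


def is_min_heap(h):
    # Heap property: every parent is <= each of its children.
    return all(h[(i - 1) // 2] <= h[i] for i in range(1, len(h)))


def run_properties(runs=200, seed=1):
    rng = random.Random(seed)
    for _ in range(runs):
        # Round-trip property: decode(encode(s)) == s for any string.
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 30)))
        pairs = rle_encode(s)
        assert rle_decode(pairs) == s
        # Encoding invariant: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(pairs, pairs[1:]))

        # Data-structure invariant: heap property holds after every push.
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        heap = []
        for x in xs:
            heapq.heappush(heap, x)
            assert is_min_heap(heap)
        # Popping everything must yield the input in sorted order.
        assert [heapq.heappop(heap) for _ in range(len(heap))] == sorted(xs)
    return True
```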

A few hours of property-based testing often finds bugs that years of example-based tests missed.

6. Mutation testing

A mutation testing framework changes your code slightly (+ to -, > to <), producing "mutants", and re-runs your tests against each one. If the tests still pass, the mutant survived: the suite never checked that behaviour, so it is weaker than coverage suggests.

Tools: PIT (Java), mutmut (Python), Stryker (JS/TS/.NET).

Original:  if x > 0:
Mutant:    if x >= 0:

Result: the kill rate, or mutation score (% of mutants caught). Rough guide: 80%+ is good; under 60% suggests your tests check that code runs, not that it is correct.

Run weekly, not per commit: the suite re-runs once per mutant, so it is slow.
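A toy illustration of the mechanism, assuming a single hypothetical function is_positive and only one mutation operator (flipping > / >= / < / <=). Real tools such as mutmut, PIT, and Stryker apply many operators and drive your actual test runner; this sketch only shows why a suite that never probes the boundary lets the x >= 0 mutant survive.

```python
import ast

SOURCE = '''
def is_positive(x):
    return x > 0
'''


class FlipComparison(ast.NodeTransformer):
    """Flip the target_index-th flippable comparison; count all of them."""

    SWAPS = {ast.Gt: ast.GtE, ast.GtE: ast.Gt, ast.Lt: ast.LtE, ast.LtE: ast.Lt}

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            swap = self.SWAPS.get(type(op))
            if swap is not None:
                if self.seen == self.target_index:
                    node.ops[i] = swap()
                self.seen += 1
        return node


def load(tree):
    namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace["is_positive"]


def weak_suite(fn):
    return fn(5) is True  # never probes the boundary at 0


def strong_suite(fn):
    return fn(5) is True and fn(0) is False and fn(-1) is False


def mutation_score(suite):
    """Return (killed, total) over all single-comparison mutants."""
    if not suite(load(ast.parse(SOURCE))):
        raise ValueError("suite must pass on the unmutated code first")
    counter = FlipComparison(target_index=-1)  # -1: mutate nothing, just count
    counter.visit(ast.parse(SOURCE))
    killed = 0
    for k in range(counter.seen):
        mutant = FlipComparison(target_index=k).visit(ast.parse(SOURCE))
        if not suite(load(mutant)):  # suite fails on mutant => mutant killed
            killed += 1
    return killed, counter.seen
```

Both suites give 100% line coverage of is_positive, yet only strong_suite kills the x >= 0 mutant.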

7. Fixtures and setup

Reuse expensive setup; avoid global state:

@pytest.fixture(scope='module'): created once per test module (file).
@pytest.fixture(scope='session'): created once per test run.
Use Testcontainers for ephemeral DB / Kafka / Redis in CI.

Anti-pattern: shared mutable state across tests → order-dependent flakes that pass in isolation but fail when test order or parallelism changes.
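A standard-library sketch of the same split, using unittest so it runs without pytest: setUpClass builds the expensive shared resource once (roughly the role a wider-scoped pytest fixture plays), and tearDown rolls back whatever each test wrote so no mutable state leaks between tests. The in-memory SQLite orders table is a hypothetical stand-in for a real database.

```python
import sqlite3
import unittest


class OrderRepoTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Expensive setup, done once for every test in this class.
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        cls.conn.execute("INSERT INTO orders (total) VALUES (10.0)")
        cls.conn.commit()  # committed reference data every test may read

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def tearDown(self):
        # Discard this test's uncommitted writes before the next test runs.
        self.conn.rollback()

    def row_count(self):
        return self.conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

    def test_insert_adds_row(self):
        self.conn.execute("INSERT INTO orders (total) VALUES (5.0)")
        self.assertEqual(self.row_count(), 2)

    def test_sees_only_reference_data(self):
        self.assertEqual(self.row_count(), 1)
```

Without the rollback, test_sees_only_reference_data fails whenever it runs after test_insert_adds_row (unittest runs methods alphabetically): exactly the order-dependent flake the anti-pattern describes.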

8. Parallelism and isolation

Run tests in parallel for speed:

pytest -n auto (with pytest-xdist).
jest --maxWorkers (Jest already runs test files in parallel; this caps the worker count).

Each parallel worker needs isolation:

A unique schema/DB per worker (e.g. named after the pytest-xdist worker id).
Random free ports (bind to port 0 and let the OS pick).
No shared temp files.
No global mocks.
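A sketch of per-worker isolation helpers; the function names are hypothetical. It relies on PYTEST_XDIST_WORKER, the environment variable pytest-xdist sets in each worker process (values like gw0), and falls back to "main" when tests run serially.

```python
import os
import socket
import tempfile


def worker_id() -> str:
    # Set by pytest-xdist in each worker; absent in a serial run.
    return os.environ.get("PYTEST_XDIST_WORKER", "main")


def db_name_for_worker(base: str = "app_test") -> str:
    # One database/schema per worker, so parallel workers never share rows.
    return f"{base}_{worker_id()}"


def free_port() -> int:
    # Binding to port 0 asks the OS for any unused port (no hard-coded 8080).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def private_tmp_dir() -> str:
    # A fresh directory per call instead of a shared /tmp/test.db.
    return tempfile.mkdtemp(prefix=f"tests_{worker_id()}_")
```

free_port has a small race: another process could take the port between this call and your server binding it. Letting the server itself bind to port 0 and report its port avoids that.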

The discipline costs effort upfront and pays back on every CI run.

9. Flaky tests

A flaky test is worse than no test — it teaches the team to ignore failures.

Common causes:

Timing assumptions (sleep(1) instead of wait_until).
Ordering assumptions on un-ordered output.
Network calls leaking to real services.
Shared state leaking between tests.
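A sketch of the wait_until idea from the first cause; the helper and its defaults are assumptions, not a specific library's API. It polls a condition against a deadline instead of sleeping a fixed amount that is too long locally and too short on a loaded CI machine.

```python
import threading
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until truthy; raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
    return True


def test_background_job_finishes():
    done = threading.Event()

    def job():
        time.sleep(0.05)  # simulated work of unpredictable length
        done.set()

    threading.Thread(target=job).start()
    # Flaky: time.sleep(0.06); assert done.is_set()  (races the job on slow CI)
    # Robust: return as soon as the condition holds, fail only at the deadline.
    assert wait_until(done.is_set, timeout=2.0)
```

For a threading.Event specifically, done.wait(timeout) already does this; a polling helper generalises to any condition, such as a row appearing in a database or a file landing on disk.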

Triage:

Mark flaky → quarantine (e.g. @pytest.mark.flaky(reruns=3) from the pytest-rerunfailures plugin for auto-rerun).
Fix within a sprint or delete the test.
Track flake rate per test in CI history.
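A sketch of per-test flake-rate tracking, assuming CI history can be exported as hypothetical (test_id, first_attempt_passed, passed_after_retry) records for the same commit. A run counts as a flake when the first attempt failed and a retry passed; a failure that persists on retry is a real failure, not a flake. The sample history is made up.

```python
from collections import defaultdict


def flake_rates(runs):
    """Return {test_id: flaky_runs / total_runs}."""
    totals = defaultdict(int)
    flakes = defaultdict(int)
    for test_id, first_ok, retry_ok in runs:
        totals[test_id] += 1
        if not first_ok and retry_ok:
            flakes[test_id] += 1
    return {t: flakes[t] / totals[t] for t in totals}


# Made-up sample history.
history = [
    ("test_checkout", True, True),
    ("test_checkout", False, True),  # failed, then passed on retry: flake
    ("test_checkout", True, True),
    ("test_checkout", True, True),
    ("test_login", False, False),    # failed twice: real failure, not a flake
    ("test_login", True, True),
]
rates = flake_rates(history)
# Quarantine candidates: anything above a 0.5% flake budget.
over_budget = sorted(t for t, r in rates.items() if r > 0.005)  # → ["test_checkout"]
```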

10. Coverage

A measurement, not a goal. 80% line coverage earned by tests that merely execute code (say, trivial "if x is None" guards) without asserting on results is meaningless.

Useful coverage practices:

Track per-PR delta — coverage shouldn't drop.
Branch coverage > line coverage.
Mutation testing > both for actual quality signal.
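A sketch of why branch coverage is the stronger signal, using a hypothetical apply_discount with a planted bug and a tiny sys.settrace-based line recorder. coverage.py does this properly; with it, coverage run --branch enables branch measurement.

```python
import sys


def apply_discount(price, is_member):
    discount = None
    if is_member:
        discount = 0.1
    return price * (1 - discount)  # bug: TypeError when is_member is False


def lines_hit(func, *args):
    """Record which lines of func run, as offsets from its `def` line."""
    code, hit = func.__code__, set()

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep tracing inside the called frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hit
```

A single test calling apply_discount(100, True) executes all four body lines, so line coverage reports 100%. Branch coverage flags if is_member: as partial because the False path never ran, and that path raises TypeError (1 - None).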

Coverage above 90% is rarely cost-effective except for libraries / safety-critical code.

11. Contract / consumer-driven testing

For microservices: each consumer publishes a contract (the requests it sends and the responses it expects). The provider's CI verifies against every consumer contract, catching breaking API changes before deploy.
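A minimal sketch of the consumer-driven idea, with hypothetical service names and a deliberately simple contract format (expected field names and JSON types). Dedicated contract-testing tools record and verify full request/response interactions; this only shows the core check the provider runs in CI.

```python
# Hypothetical contracts the "orders" provider must satisfy:
# consumer name -> {field: expected Python type of the JSON value}.
CONSUMER_CONTRACTS = {
    "billing-ui": {"id": int, "total_cents": int, "currency": str},
    "email-worker": {"id": int, "customer_email": str},
}


def get_order(order_id):
    # Stand-in for the provider's real handler (hypothetical payload).
    return {
        "id": order_id,
        "total_cents": 1250,
        "currency": "EUR",
        "customer_email": "customer@example.com",
    }


def verify_contracts(response, contracts):
    """Provider-side CI check: list every (consumer, problem) found."""
    problems = []
    for consumer, fields in contracts.items():
        for name, expected_type in fields.items():
            if name not in response:
                problems.append((consumer, f"missing field {name!r}"))
            elif not isinstance(response[name], expected_type):
                problems.append((consumer, f"{name!r} is not {expected_type.__name__}"))
    return problems
```

Renaming currency to ccy in get_order would make verify_contracts report ("billing-ui", "missing field 'currency'") before deploy, while email-worker stays unaffected.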

Tools: Pact, Spring Cloud Contract.

Heavy investment; high payoff at 5+ services.

12. Fast / slow tiers

Split tests by speed:

Fast — < 1 ms each; run on every save in IDE; < 5 s total.
Slow — DB, network; run on push; minutes ok.
Nightly — e2e, perf, mutation; hours ok.
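A sketch of tier selection with standard-library unittest, assuming a hypothetical TEST_TIER environment variable set by each CI lane and cumulative tiers (the push lane runs fast and slow tests; nightly runs everything). With pytest, the usual route is custom markers such as @pytest.mark.slow, deselected locally with -m "not slow".

```python
import os
import unittest

# Assumed convention: each CI lane exports TEST_TIER; tiers are cumulative.
TIERS = ("fast", "slow", "nightly")
CURRENT_TIER = os.environ.get("TEST_TIER", "fast")


def tier_enabled(test_tier, current=CURRENT_TIER):
    """True if a test of test_tier should run in the `current` lane."""
    return TIERS.index(current) >= TIERS.index(test_tier)


def tier(test_tier):
    """Decorator: skip the test unless the current lane includes test_tier."""
    return unittest.skipUnless(
        tier_enabled(test_tier), f"{test_tier} tier (TEST_TIER={CURRENT_TIER})"
    )


class CheckoutTests(unittest.TestCase):
    @tier("fast")
    def test_total_is_sum_of_lines(self):
        self.assertEqual(sum([3, 4]), 7)

    @tier("slow")
    def test_against_real_database(self):
        pass  # hypothetical: would start a Testcontainers database here
```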

Don't make engineers wait 10 min for a unit test rerun.

13. Reality check

A modern test setup checklist:

pytest / jest / your stack's standard runner.
80%+ branch coverage (with caveats above).
Hypothesis / fast-check for pure-function modules.
Testcontainers for integration with real services.
Mutation testing weekly.
Coverage as a PR comment, not a hard gate.
Flaky test budget — < 0.5% flake rate.
Slow tests in a separate CI lane.

The teams that take testing seriously ship faster, not slower. The teams that "don't have time for tests" pay back the debt at 3 AM during outages.

Link: https://leetcode.com/problems/validate-binary-search-tree/
Difficulty: Medium
Why this problem: Verifying a structural invariant is exactly what tests do. Naive solutions that only compare each node with its immediate children miss violations deeper in the tree; property-based thinking catches them.
Time-box: 30 minutes. Look up the editorial only after.

Post-session checklist

By the end of this session you should be able to:

Pick test-pyramid shape for a given codebase (microservices vs monolith).
Choose between mocks, fakes, stubs, spies for a scenario.
Write a property-based test for a pure function.
Explain mutation kill-rate and why it beats line coverage.
Triage and fix a class of flaky test (timing, ordering, network, shared state).
Solve validate-binary-search-tree — classic invariant check, both naive and property-aware.

