engineering · intermediate · 12m · 2026-06-09

Testing, Mocks, Property-Based Tests, Mutation Testing

Session 41 of the 48-session learning series.

Date: Sat, 2026-07-11 · Time: 14:30–16:30 IST · Track: 🧱 OOP & Languages (OOP) · Parent 28-day topic: Day 20 · Est. read: 2 h

Why this session matters

This is Session 41 of 48 in the OOP track. Code without tests is a draft. Many engineers stop at "I wrote some unit tests and they pass" and miss property-based tests, mutation testing, and the discipline of fast/slow test tiers. Investment in test infrastructure compounds: over a multi-year codebase, every fast, trustworthy test pays for itself again on each change.

Agenda

  • The test pyramid — unit, integration, e2e; what each catches
  • Mocking — stubs, mocks, fakes, spies; when each is right
  • Property-based testing — Hypothesis, fast-check; the bug-finding superpower
  • Mutation testing — measuring the quality of the tests themselves
  • Test infrastructure — fixtures, parallelism, flakes, coverage thresholds

Pre-read (skim before the session)

Deep dive

1. The pyramid (and the diamond, and the trophy)

        ▲
        e2e          slow, brittle, gives confidence at top
       ─────
     integration     real DB / queue / network
    ────────────
   unit tests       fast, deterministic, isolate one thing

Classic pyramid: many unit tests, fewer integration tests, very few e2e tests.

Modern variants:

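  • Trophy (Kent C. Dodds): static analysis (types, linters) at the base, then unit, a large integration layer, and a thin e2e top; heavier on integration than the pure pyramid.
  • Diamond: light on unit, heavy on integration.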

The right shape depends on your architecture. Microservices: lean toward integration, because many bugs live in the seams between services. Tight monolith: lean toward unit.

2. What "unit" actually means

Two schools:

  • Solitary unit — mock every collaborator. Test the function in isolation.
  • Sociable unit — use real collaborators when cheap (in-process, no I/O). Mock only external boundaries.

The sociable school catches integration bugs earlier; solitary is faster and easier to debug.

Pragmatic default: sociable units with mocked external boundaries (HTTP, DB if remote, time, randomness).
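A minimal sketch of that default, assuming a hypothetical `PriceCalculator` with one in-process collaborator (`TaxTable`, with made-up rates) and one external boundary (`ExchangeRateClient`, imagined as an HTTP client). The cheap collaborator is used for real; only the boundary is replaced, via `unittest.mock.create_autospec` so the double keeps the real method signature.

```python
from dataclasses import dataclass
from unittest import mock


class TaxTable:
    """In-process collaborator: cheap and deterministic, so the test uses the real one."""

    RATES = {"IN": 0.18, "DE": 0.19}  # hypothetical rates for illustration

    def rate_for(self, country: str) -> float:
        return self.RATES.get(country, 0.0)


class ExchangeRateClient:
    """External boundary: imagine an HTTP call here."""

    def usd_to(self, currency: str) -> float:
        raise RuntimeError("no network access in unit tests")


@dataclass
class PriceCalculator:
    taxes: TaxTable
    fx: ExchangeRateClient

    def gross_price(self, net_usd: float, country: str, currency: str) -> float:
        taxed = net_usd * (1 + self.taxes.rate_for(country))
        return round(taxed * self.fx.usd_to(currency), 2)


def test_gross_price_sociable() -> None:
    # Only the HTTP boundary is doubled; autospec keeps usd_to's real signature.
    fx = mock.create_autospec(ExchangeRateClient, instance=True)
    fx.usd_to.return_value = 80.0
    calc = PriceCalculator(taxes=TaxTable(), fx=fx)  # real TaxTable: sociable
    assert calc.gross_price(10.0, "IN", "INR") == 944.0
    fx.usd_to.assert_called_once_with("INR")
```

If `TaxTable` later gains a bug, this test can catch it, which a solitary test with `TaxTable` mocked out would not.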

3. Stubs vs mocks vs fakes vs spies

Type   Behaviour                                              Verifies interactions?
Stub   Returns hardcoded values                               No
Mock   Expects specific calls; fails if unmet                 Yes
Fake   Working but simplified implementation (in-memory DB)   No
Spy    Wraps real object; records calls                       Yes

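Rule of thumb: prefer fakes > stubs > mocks > spies. Mocks assert on how the code talks to its collaborators, which couples tests to the implementation: a behaviour-preserving refactor can break them for no good reason.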
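The four doubles side by side, as a minimal sketch against a hypothetical `UserRepo` interface and `normalise_email` function, using only `unittest.mock` and a hand-written fake. Note where each assertion points: at the result (stub), at the calls (mock), at the resulting state (fake), and at calls recorded on a real object (spy).

```python
from unittest import mock


class UserRepo:
    """The real collaborator (imagine it talks to a remote database)."""

    def get_email(self, user_id: int) -> str:
        raise NotImplementedError("talks to the real DB")

    def save_email(self, user_id: int, email: str) -> None:
        raise NotImplementedError("talks to the real DB")


class InMemoryUserRepo(UserRepo):
    """Fake: a working, simplified implementation backed by a dict."""

    def __init__(self) -> None:
        self._emails: dict[int, str] = {}

    def get_email(self, user_id: int) -> str:
        return self._emails[user_id]

    def save_email(self, user_id: int, email: str) -> None:
        self._emails[user_id] = email


def normalise_email(repo: UserRepo, user_id: int) -> str:
    """Code under test: strips and lower-cases a stored email, then saves it back."""
    clean = repo.get_email(user_id).strip().lower()
    repo.save_email(user_id, clean)
    return clean


# Stub: canned return value; only the result is checked.
stub = mock.create_autospec(UserRepo, instance=True)
stub.get_email.return_value = "  Ada@Example.COM "
assert normalise_email(stub, 1) == "ada@example.com"

# Mock: same kind of double, but the test asserts on the interaction itself.
mock_repo = mock.create_autospec(UserRepo, instance=True)
mock_repo.get_email.return_value = "Bob@Example.com"
normalise_email(mock_repo, 2)
mock_repo.save_email.assert_called_once_with(2, "bob@example.com")

# Fake: real behaviour, checked through state rather than calls.
fake = InMemoryUserRepo()
fake.save_email(3, " Cy@Example.com")
normalise_email(fake, 3)
assert fake.get_email(3) == "cy@example.com"

# Spy: wrap a real object so calls go through to it and are also recorded.
real = InMemoryUserRepo()
real.save_email(4, "DE@x.org")
spy = mock.Mock(wraps=real)
normalise_email(spy, 4)
assert spy.get_email.call_count == 1 and real.get_email(4) == "de@x.org"
```

Rename `save_email` to `put_email` and only the mock assertion breaks, even though the behaviour is unchanged; the fake-based test keeps passing once the fake is updated.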

4. Test doubles per language

  • Python: unittest.mock, pytest-mock. For HTTP: responses, httpx_mock, pytest-httpserver. For time: freezegun.
  • Node/TS: jest.mock(), sinon, nock for HTTP.
  • Java: Mockito, WireMock for HTTP.
  • Go: handcrafted interfaces + table-driven tests (idiomatic; few libs).
  • Rust: mockall, wiremock.

5. Property-based testing

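Instead of assert add(2, 3) == 5, state a property that holds for all inputs: for any a, b: assert add(a, b) == add(b, a). The framework generates many random cases (Hypothesis and fast-check both default to 100 per test, configurable) and shrinks any failure to a minimal counterexample.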

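from hypothesis import given, strategies as st

# Hypothesis draws random lists of ints and shrinks any failing one.
@given(st.lists(st.integers()))
def test_sort_idempotent(xs):
    # Idempotence: sorting an already-sorted list changes nothing.
    assert sorted(sorted(xs)) == sorted(xs)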
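To make "generates cases and shrinks failures" concrete, here is a toy, stdlib-only sketch of the loop a framework like Hypothesis runs; it is not Hypothesis's actual algorithm. The names (`check`, `shrink_candidates`, `buggy_sort`) are hypothetical, and `buggy_sort` carries a deliberate bug (it drops duplicates). Greedy shrinking reduces whatever random failing list is found to a minimal one.

```python
import random
from typing import Callable, List, Optional

IntList = List[int]


def buggy_sort(xs: IntList) -> IntList:
    # Deliberate bug: set() silently drops duplicate values.
    return sorted(set(xs))


def prop_sort_keeps_elements(xs: IntList) -> bool:
    # Property: sorting only reorders, so the result must match sorted(xs).
    return buggy_sort(xs) == sorted(xs)


def gen_int_list(rng: random.Random) -> IntList:
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]


def toward_zero(x: int) -> int:
    return x // 2 if x >= 0 else -((-x) // 2)


def shrink_candidates(xs: IntList) -> List[IntList]:
    drop_one = [xs[:i] + xs[i + 1:] for i in range(len(xs))]
    halve_all = [[toward_zero(x) for x in xs]]
    halve_one = [xs[:i] + [toward_zero(xs[i])] + xs[i + 1:] for i in range(len(xs))]
    # Keep only strictly "smaller" inputs, so shrinking always terminates.
    return [c for c in drop_one + halve_all + halve_one if c != xs]


def check(prop: Callable[[IntList], bool], gen: Callable[[random.Random], IntList],
          runs: int = 200, seed: int = 0) -> Optional[IntList]:
    """Return a shrunk counterexample, or None if every run passes."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = gen(rng)
        if prop(xs):
            continue
        shrinking = True
        while shrinking:  # greedy: take the first smaller input that still fails
            shrinking = False
            for cand in shrink_candidates(xs):
                if not prop(cand):
                    xs, shrinking = cand, True
                    break
        return xs
    return None


print(check(prop_sort_keeps_elements, gen_int_list))  # → [0, 0]
```

Real frameworks shrink far more cleverly and replay stored failures; the sketch only shows the generate, check, shrink cycle that turns a noisy 15-element failure into `[0, 0]`.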

Where it wins:

  • Stateless pure functions (parsers, serialisers, hashers).
  • Round-trip properties (decode(encode(x)) == x).
  • Invariants of data structures (BST always sorted, heap property).
  • Stateful systems and concurrency (stateful / model-based testing: random sequences of operations checked against a simple model).
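A stateful, model-based sketch of the last two bullets: random sequences of pushes and pops against Python's `heapq`, compared after every step with a plain sorted list as the model, plus a check of the heap invariant. It is stdlib-only with a fixed seed; Hypothesis offers the same idea, with generation and shrinking, through `hypothesis.stateful.RuleBasedStateMachine`.

```python
import heapq
import random


def run_heap_model_test(steps: int = 500, seed: int = 1) -> None:
    """Random push/pop sequences against heapq, checked after every step."""
    rng = random.Random(seed)
    heap: list = []   # system under test: a list managed by heapq
    model: list = []  # obviously-correct model: a plain sorted list
    for _ in range(steps):
        if model and rng.random() < 0.4:
            # Operation "pop": both must return the same minimum.
            assert heapq.heappop(heap) == model.pop(0)
        else:
            # Operation "push".
            x = rng.randint(-50, 50)
            heapq.heappush(heap, x)
            model.append(x)
            model.sort()
        # Invariants after every operation: heap property and matching size.
        assert all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))
        assert len(heap) == len(model)


run_heap_model_test()
```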

A few hours of property-based testing often surfaces edge cases that years of example-based tests never exercise.

6. Mutation testing

A mutation testing framework makes small changes to your code (+ to -, > to <, > to >=), producing "mutants", and re-runs your tests against each one. If the tests still pass, the mutant survived: the suite never checked the behaviour that changed, so it is weaker than its coverage suggests.

Tools: PIT (Java), mutmut (Python), Stryker (JS/TS/.NET).

Original:  if x > 0:
Mutant:    if x >= 0:

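Result: the mutation score, or kill rate (% of mutants caught). As a rough guide, 80%+ is good; under 60% means your tests check that the code runs, not that it is correct. Some survivors are equivalent mutants that no test can kill, so 100% is rarely achievable.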

Every mutant means another run of the suite, so it is slow: run it weekly (or nightly), not per commit.
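The mechanics fit in a few dozen lines of stdlib Python using the `ast` module. This is a toy illustration of what tools like mutmut or PIT automate, not their implementation: it applies one operator swap at a time to a hypothetical two-function module, runs a suite against each mutant, and reports the kill rate. Both suites execute every line; only the one that checks the `x == 0` boundary and a non-zero tax rate kills the mutants.

```python
import ast

SOURCE = """
def price_with_tax(net, rate):
    return net + net * rate

def is_positive(x):
    return x > 0
"""

# Mutation operators mirroring the text's examples: + to -, > to >=, and so on.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Gt: ast.GtE,
         ast.GtE: ast.Gt, ast.Lt: ast.LtE, ast.LtE: ast.Lt}


class MutateNth(ast.NodeTransformer):
    """Swap the operator at the n-th mutable site, counting in visit order."""

    def __init__(self, target: int) -> None:
        self.target, self.seen = target, 0

    def _swap(self, op: ast.AST) -> ast.AST:
        if type(op) not in SWAPS:
            return op
        hit, self.seen = self.seen == self.target, self.seen + 1
        return SWAPS[type(op)]() if hit else op

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)
        node.op = self._swap(node.op)
        return node

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        node.ops = [self._swap(op) for op in node.ops]
        return node


def mutants(source: str):
    """Yield one loaded namespace per single-operator mutant of `source`."""
    target = 0
    while True:
        mutator = MutateNth(target)
        tree = ast.fix_missing_locations(mutator.visit(ast.parse(source)))
        if target >= mutator.seen:  # every site has been mutated once
            return
        ns: dict = {}
        exec(compile(tree, "<mutant>", "exec"), ns)
        yield ns
        target += 1


def weak_tests(ns: dict) -> bool:
    # Runs every line, but rate=0.0 hides + vs - and x=0 is never tried.
    return ns["price_with_tax"](100, 0.0) == 100 and ns["is_positive"](5)


def strong_tests(ns: dict) -> bool:
    return (ns["price_with_tax"](100, 0.5) == 150
            and ns["is_positive"](5) and not ns["is_positive"](0))


def kill_rate(source: str, suite) -> float:
    baseline: dict = {}
    exec(compile(source, "<original>", "exec"), baseline)
    assert suite(baseline), "the suite must pass on the unmutated code first"
    results = [suite(ns) for ns in mutants(source)]
    return sum(not passed for passed in results) / len(results)


print(kill_rate(SOURCE, weak_tests))    # → 0.0
print(kill_rate(SOURCE, strong_tests))  # → 1.0
```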

7. Fixtures and setup

Reuse expensive setup; avoid global state:

  • pytest @pytest.fixture(scope="module"): created once per test module (file).
  • pytest @pytest.fixture(scope="session"): created once per test run (once per worker under pytest-xdist).
  • Use Testcontainers for ephemeral DB / Kafka / Redis in CI.

Anti-pattern: shared mutable state across tests → order-dependent flakes that pass alone and fail when run alongside (or in a different order from) other tests.
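The pattern doesn't depend on pytest. A minimal stdlib `unittest` analogue, assuming an in-memory SQLite database stands in for the "expensive" resource: `setUpClass` builds it once per class (roughly pytest's `scope="class"`; `setUpModule` is the per-module equivalent), while a per-test rollback keeps each test's writes private. Drop the rollback and `test_starts_empty` fails only because of what ran before it, which is exactly the shared-state flake.

```python
import io
import sqlite3
import unittest


class OrderRepoTest(unittest.TestCase):
    conn: sqlite3.Connection

    @classmethod
    def setUpClass(cls) -> None:
        # Expensive setup, done once and shared by every test in the class.
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        cls.conn.commit()

    @classmethod
    def tearDownClass(cls) -> None:
        cls.conn.close()

    def setUp(self) -> None:
        # sqlite3 opens a transaction implicitly before the first INSERT;
        # rolling back after each test keeps that test's writes private.
        self.addCleanup(self.conn.rollback)

    def _count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

    def test_insert_one(self) -> None:
        self.conn.execute("INSERT INTO orders (total) VALUES (9.5)")
        self.assertEqual(self._count(), 1)

    def test_starts_empty(self) -> None:
        # Runs after test_insert_one (alphabetical order) and still sees no rows.
        self.assertEqual(self._count(), 0)


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderRepoTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```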

8. Parallelism and isolation

Run tests in parallel for speed:

  • pytest -n auto (with pytest-xdist).
  • jest --maxWorkers.

Each parallel worker needs isolation:

  • Random schema/DB per worker.
  • Random ports.
  • No shared temp files.
  • No global mocks.

The discipline costs effort up front and pays back on every CI run.
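A stdlib sketch of those isolation rules. `PYTEST_XDIST_WORKER` is the environment variable pytest-xdist sets in each worker; the helper names and the `"main"` fallback are hypothetical.

```python
import os
import socket
import tempfile


def worker_id() -> str:
    # pytest-xdist sets PYTEST_XDIST_WORKER ("gw0", "gw1", ...) in each worker;
    # "main" is our own fallback name for a non-parallel run.
    return os.environ.get("PYTEST_XDIST_WORKER", "main")


def schema_name(base: str = "app_test") -> str:
    """One database schema per worker, so parallel workers never share tables."""
    return f"{base}_{worker_id()}"


def free_port() -> int:
    """Ask the OS for an unused TCP port instead of hardcoding one."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def private_tmpdir() -> str:
    """A fresh temp directory per call; nothing is shared between workers or tests."""
    return tempfile.mkdtemp(prefix=f"tests-{worker_id()}-")
```

Reading back a port bound to 0 has a small race (another process can grab it after the socket closes); where possible, let the server under test bind port 0 itself and report its port. In pytest, the built-in `tmp_path` fixture already gives each test its own directory.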

9. Flaky tests

A flaky test is worse than no test — it teaches the team to ignore failures.

Common causes:

  • Timing assumptions (sleep(1) instead of wait_until).
  • Ordering assumptions on un-ordered output.
  • Network calls leaking to real services.
  • Shared state leaking between tests.
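A minimal `wait_until`, the polling alternative to a fixed `sleep(1)`: it returns as soon as the condition holds and fails with a clear error after a timeout. The helper, its defaults, and the background job are a hypothetical sketch, not a library API.

```python
import threading
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.01) -> None:
    """Poll `condition` until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


# Usage: a background job finishes "some time later"; the test waits only as long as needed.
results: list = []


def background_job() -> None:
    time.sleep(0.05)  # simulated work of unknown duration
    results.append("done")


worker = threading.Thread(target=background_job)
worker.start()
wait_until(lambda: results == ["done"], timeout=2.0)
worker.join()
```

On a fast machine the test finishes in about 50 ms; on a loaded CI runner it still passes as long as the job finishes within the timeout, which is what `sleep(1)` can't promise.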

Triage:

  • Mark flaky → quarantine, e.g. @pytest.mark.flaky(reruns=3) from the pytest-rerunfailures plugin to auto-rerun while it is being fixed.
  • Fix within a sprint or delete the test.
  • Track flake rate per test in CI history.
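One possible way to compute a per-test flake rate from CI history, assuming hypothetical records of `(test_name, commit_sha, passed)`: count a commit as flaky for a test when the same code produced both a pass and a fail. A test that fails consistently on one commit and passes after a fix is a real failure, not a flake.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical CI record: (test_name, commit_sha, passed).
Run = Tuple[str, str, bool]


def flake_rates(history: Iterable[Run]) -> Dict[str, float]:
    """Per test: share of commits on which the same code both passed and failed."""
    outcomes: Dict[str, Dict[str, Set[bool]]] = defaultdict(lambda: defaultdict(set))
    for test, sha, passed in history:
        outcomes[test][sha].add(passed)
    return {
        test: sum(len(seen) == 2 for seen in by_sha.values()) / len(by_sha)
        for test, by_sha in outcomes.items()
    }


history = [
    ("test_checkout", "a1", True), ("test_checkout", "a1", False),  # flaked on a1
    ("test_checkout", "b2", True),
    ("test_login", "a1", False), ("test_login", "b2", True),  # real bug, then fixed
]
print(flake_rates(history))  # → {'test_checkout': 0.5, 'test_login': 0.0}
```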

10. Coverage

A measurement, not a goal. Coverage tells you which lines ran, not whether any assertion checked them: 80% line coverage made up of trivial "if x is None" branches is meaningless.

Useful coverage practices:

  • Track per-PR delta — coverage shouldn't drop.
  • Branch coverage is more informative than line coverage.
  • Mutation testing gives a better quality signal than either.
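Why branch coverage says more than line coverage, as a toy stdlib illustration: the tracer records which line-to-line transitions ("arcs") execute inside a function, similar in spirit to how coverage.py measures branches. A single call with `x = -5` runs every line of the hypothetical `clamp_negative`, yet never takes the path where `x < 0` is false.

```python
import sys
from typing import Callable, Set, Tuple


def clamp_negative(x: int) -> int:
    if x < 0:
        x = 0
    return x


def executed_arcs(func: Callable, *args) -> Set[Tuple[int, int]]:
    """Record (from_line, to_line) transitions inside func, as offsets from its def line."""
    code, first = func.__code__, func.__code__.co_firstlineno
    arcs: Set[Tuple[int, int]] = set()
    prev = -1  # -1 marks "entered the function"

    def tracer(frame, event, arg):
        nonlocal prev
        if frame.f_code is not code:
            return None  # ignore every other function
        if event == "line":
            line = frame.f_lineno - first
            arcs.add((prev, line))
            prev = line
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old)  # restore any tracer (e.g. a coverage tool) that was active
    return arcs


neg = executed_arcs(clamp_negative, -5)
both = neg | executed_arcs(clamp_negative, 5)
print(sorted({to for _, to in neg}))  # body lines 1, 2, 3 all ran: 100% line coverage
print((1, 3) in neg, (1, 3) in both)  # the "condition false" arc appears only with x = 5
```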

Coverage above 90% is rarely cost-effective except for libraries / safety-critical code.

11. Contract / consumer-driven testing

For microservices: each consumer publishes a contract (the requests it sends and the responses it expects). The producer's CI verifies the producer against every consumer contract, catching breaking API changes before deploy.

Tools: Pact, Spring Cloud Contract.

Heavy investment; high payoff at 5+ services.
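A toy illustration of the consumer-driven idea, not Pact's API or file format (real Pact contracts record full request/response interactions): each hypothetical consumer declares the response fields it relies on, and the producer's build checks its actual response against all of them.

```python
from typing import Any, Dict, List

# Hypothetical contracts published by two consumers of a /users/{id} endpoint:
# each lists the fields it reads and the type it expects.
CONTRACTS: Dict[str, Dict[str, type]] = {
    "billing-service": {"id": int, "email": str},
    "mobile-app": {"id": int, "display_name": str},
}


def producer_get_user(user_id: int) -> Dict[str, Any]:
    """Stand-in for the producer's real handler."""
    return {"id": user_id, "email": "ada@example.com", "display_name": "Ada"}


def verify_contracts(response: Dict[str, Any],
                     contracts: Dict[str, Dict[str, type]]) -> Dict[str, List[str]]:
    """Return, per consumer, the fields that are missing or have the wrong type."""
    problems: Dict[str, List[str]] = {}
    for consumer, fields in contracts.items():
        broken = [name for name, typ in fields.items()
                  if not isinstance(response.get(name), typ)]
        if broken:
            problems[consumer] = broken
    return problems


# Producer CI: fail the build if any consumer's expectations are broken.
assert verify_contracts(producer_get_user(7), CONTRACTS) == {}

# Renaming "display_name" to "name" would be caught before deploy:
renamed = {"id": 7, "email": "ada@example.com", "name": "Ada"}
print(verify_contracts(renamed, CONTRACTS))  # → {'mobile-app': ['display_name']}
```

The report names the consumer that would break, which is what makes the failure actionable for the producer team.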

12. Fast / slow tiers

Split tests by speed:

  • Fast — < 1 ms each; run on every save in IDE; < 5 s total.
  • Slow — DB, network; run on push; minutes ok.
  • Nightly — e2e, perf, mutation; hours ok.

Don't make engineers wait 10 min for a unit test rerun.
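A stdlib `unittest` sketch of tier selection, assuming a hypothetical `RUN_SLOW` environment variable that only the push lane sets. With pytest, the usual equivalent is to register a `slow` marker and run `pytest -m "not slow"` in the fast loop.

```python
import contextlib
import io
import os
import sqlite3
import tempfile
import unittest

# Hypothetical switch: the push lane in CI sets RUN_SLOW=1; the on-save loop leaves it unset.
RUN_SLOW = os.environ.get("RUN_SLOW") == "1"


class FastTests(unittest.TestCase):
    def test_parse_amount(self) -> None:
        # Pure and in-memory: belongs in the on-save tier.
        self.assertEqual(int("42"), 42)


@unittest.skipUnless(RUN_SLOW, "slow tier: set RUN_SLOW=1 to run")
class SlowTests(unittest.TestCase):
    def test_database_roundtrip(self) -> None:
        # Stand-in for real I/O; a real suite might start a DB via Testcontainers.
        with tempfile.TemporaryDirectory() as d:
            with contextlib.closing(sqlite3.connect(os.path.join(d, "t.db"))) as conn:
                conn.execute("CREATE TABLE t (v TEXT)")
                conn.execute("INSERT INTO t VALUES ('x')")
                self.assertEqual(conn.execute("SELECT v FROM t").fetchone(), ("x",))


def run(case: type) -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

Skipped tests are reported as skips, not passes, so the fast lane stays honest about what it did not run.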

13. Reality check

A modern test setup checklist:

  • pytest / jest / your stack's standard runner.
  • 80%+ branch coverage (with caveats above).
  • Hypothesis / fast-check for pure-function modules.
  • Testcontainers for integration with real services.
  • Mutation testing weekly.
  • Coverage as a PR comment, not a hard gate.
  • Flaky test budget — < 0.5% flake rate.
  • Slow tests in a separate CI lane.

The teams that take testing seriously ship faster, not slower. The teams that "don't have time for tests" pay back the debt at 3 AM during outages.

Reading material

In-depth research material

Video reference

Property-Based Testing Explained (John Hughes)

Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.

LeetCode — Validate Binary Search Tree
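A sketch of the problem in two styles: an iterative bounds check (each node must lie strictly between limits inherited from its ancestors) and a naive in-order oracle, cross-checked on random trees as a hand-rolled property test. `TreeNode` mirrors LeetCode's node shape; the random-tree generator is hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def is_valid_bst(root: Optional[TreeNode]) -> bool:
    """Bounds check: every node must lie strictly inside its ancestors' limits."""
    stack = [(root, None, None)]
    while stack:
        node, lo, hi = stack.pop()
        if node is None:
            continue
        if (lo is not None and node.val <= lo) or (hi is not None and node.val >= hi):
            return False
        stack.append((node.left, lo, node.val))
        stack.append((node.right, node.val, hi))
    return True


def is_valid_bst_naive(root: Optional[TreeNode]) -> bool:
    """Reference oracle: the in-order traversal must be strictly increasing."""
    vals: List[int] = []

    def inorder(n: Optional[TreeNode]) -> None:
        if n is not None:
            inorder(n.left)
            vals.append(n.val)
            inorder(n.right)

    inorder(root)
    return all(a < b for a, b in zip(vals, vals[1:]))


def random_tree(rng: random.Random, depth: int) -> Optional[TreeNode]:
    if depth == 0 or rng.random() < 0.3:
        return None
    return TreeNode(rng.randint(0, 9), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


# Property: the fast check agrees with the obviously-correct oracle on random trees.
rng = random.Random(42)
for _ in range(2000):
    t = random_tree(rng, 4)
    assert is_valid_bst(t) == is_valid_bst_naive(t)
```

A check that only compares each node with its direct children accepts 5 → (4, 6 → (3, 7)), where 3 sits in the right subtree of 5; both functions above reject it, and the random cross-check is the kind of test that exposes the children-only version quickly.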

Post-session checklist

By the end of this session you should be able to:

  • Pick test-pyramid shape for a given codebase (microservices vs monolith).
  • Choose between mocks, fakes, stubs, spies for a scenario.
  • Write a property-based test for a pure function.
  • Explain mutation kill-rate and why it beats line coverage.
  • Triage and fix a common class of flaky test.
  • Solve validate-binary-search-tree — classic invariant check, both naive and property-aware.
