engineering · intermediate · 12 m · 2026-06-09

Idiomatic Python (and a Touch of C++) — Type Hints, Protocols, Dataclasses

Session 29 of the 48-session learning series.

Date: Thu, 2026-07-02 · Time: 18:00–20:00 IST · Track: 🧱 OOP & Languages (OOP) · Parent 28-day topic: Day 20 · Est. read: 2 h

Why this session matters

This is Session 29 of 48 in the OOP & Languages track. "Idiomatic" Python looks like Python; novice Python looks like Java written in Python's syntax. The gap shows up in code reviews, in interviews, and in maintainability over years. A bit of modern C++ contrast keeps you sharp on what "fast" and "low-level" actually mean.

Agenda

  • Type hints, generics, TypeVar, ParamSpec — modern Python typing
  • Protocols vs ABCs — structural vs nominal subtyping
  • Dataclasses, frozen, slots; when to use Pydantic instead
  • Pythonic idioms — comprehensions, context managers, generators, dunder methods
  • A short detour: equivalent C++ idioms (RAII, templates, concepts)

Pre-read (skim before the session)

Deep dive

1. Why type hints

They don't enforce anything at runtime. So why bother?

  • mypy / pyright catch a real class of bugs (None passed to a str param) at PR time.
  • IDE autocomplete becomes useful — methods, fields, return values all surface.
  • Documentation that doesn't drift — the type is the spec.
  • Refactoring is safe — rename a field and the type checker finds every caller.

Cost: ~10% more keystrokes; pays back within a quarter on any non-trivial codebase.
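
To make the first bullet concrete, here is a minimal sketch of the "None passed to a str param" bug. The functions find_nickname, shout and greet are hypothetical names invented for this illustration; running mypy or pyright over the commented-out call should report an incompatible argument type, while the guarded version type-checks.

```python
from typing import Optional


def find_nickname(user_id: int) -> Optional[str]:
    """Hypothetical lookup: returns None when the user has no nickname."""
    nicknames = {1: "ada", 2: "linus"}
    return nicknames.get(user_id)


def shout(text: str) -> str:
    """Needs a real str; passing None would raise AttributeError at runtime."""
    return text.upper() + "!"


# A type checker rejects the next line because find_nickname may return None:
# shout(find_nickname(3))


def greet(user_id: int) -> str:
    nickname = find_nickname(user_id)
    if nickname is None:  # narrowing: below this check the type is plain str
        return "HELLO!"
    return shout(f"hello {nickname}")
```

The point is where the failure surfaces: without hints the bad call only blows up when user 3 shows up in production; with hints it is flagged in review.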

2. Modern typing essentials

from collections.abc import Callable, Iterator, Sequence  # typing.Sequence etc. are deprecated aliases since 3.9
from typing import Optional

def top_k(items: Sequence[int], k: int = 10) -> list[int]:
    return sorted(items, reverse=True)[:k]

def parse(text: str) -> Optional[dict]:
    ...

def stream() -> Iterator[bytes]:
    ...

Handler = Callable[[str, int], bool]

Python 3.9+: list[int] instead of List[int]. 3.10+: int | None instead of Optional[int]. 3.12+: cleaner type syntax (PEP 695).

3. Generics and TypeVar

from typing import TypeVar

T = TypeVar("T")  # only needed for the pre-3.12 spelling

def first_legacy(items: list[T]) -> T:  # pre-3.12
    return items[0]

def first[T](items: list[T]) -> T:    # 3.12+ syntax (PEP 695); this T is scoped to the function
    return items[0]

Use generics on containers, factories, and any function whose return type depends on input type.

4. Protocols (structural typing)

ABCs (abc.ABC) require explicit inheritance, which is nominal typing. Protocols ask "does this object have these methods?" at type-check time, which is structural typing (the same model as Go and TypeScript interfaces).

from typing import Protocol

class Readable(Protocol):
    def read(self, n: int = -1) -> bytes: ...

def consume(src: Readable) -> bytes:
    return src.read()

# Works with a file opened in binary mode, io.BytesIO, or anything else with a compatible .read(); no inheritance required.

Use Protocols when:

  • Defining duck-typed APIs.
  • Decoupling from a specific class hierarchy.
  • Mocking — your test double satisfies the Protocol; no inheritance ceremony.
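
As a sketch of the mocking bullet: FakeSocket below is a hypothetical test double that never inherits from Readable, yet satisfies it. The @runtime_checkable decorator is an optional extra that allows isinstance checks; note that at runtime it only verifies the method exists, not its signature.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class Readable(Protocol):
    def read(self, n: int = -1) -> bytes: ...


def consume(src: Readable) -> bytes:
    return src.read()


class FakeSocket:
    """Hypothetical test double: no base class, just the right method."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def read(self, n: int = -1) -> bytes:
        return self._payload if n < 0 else self._payload[:n]


from_fake = consume(FakeSocket(b"ping"))      # accepted structurally
from_stdlib = consume(io.BytesIO(b"pong"))    # stdlib class, same story
```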

5. Dataclasses

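from dataclasses import dataclass, field

@dataclass(frozen=True, slots=True)
class Point:
    x: float
    y: float
    label: str = "anon"
    # A dict is unhashable, so keep it out of __eq__/__hash__; otherwise hash(p) raises TypeError.
    metadata: dict = field(default_factory=dict, compare=False)

p = Point(1.0, 2.0)

  • frozen=True: instances are immutable and hashable as long as every compared field is hashable (good for cache keys).
  • slots=True (Python 3.10+): no per-instance __dict__, so less memory and somewhat faster attribute access.
  • field(default_factory=...): the way to give a field a mutable default; never write a bare default such as tags: list = [] (dataclasses reject it with a ValueError).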

Default to dataclass for plain data carriers. Reach for Pydantic when you need validation/parsing from JSON.

6. Pydantic v2

from pydantic import BaseModel, Field

class User(BaseModel):
    id: int
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")  # anchored so the whole string must match
    age: int | None = None

u = User.model_validate_json('{"id": 1, "email": "a@b.com"}')

Use Pydantic at the edges of your system (HTTP request parsing, config loading). Use dataclasses internally. Mixing them inside business logic creates redundant validation.

7. Comprehensions, generators, the itertools toolbox

squares = [x*x for x in range(10)]
even_squares = [x*x for x in range(10) if x % 2 == 0]
lookup = {u.id: u for u in users}
unique_emails = {u.email for u in users}

# Generator (lazy, low memory)
def stream_squares(n):
    for x in range(n):
        yield x * x

from itertools import chain, groupby, accumulate, pairwise  # pairwise needs Python 3.10+

Generators are the killer feature for ETL — process TB-sized streams without loading into RAM.
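
Here is a small sketch of that streaming shape. The list raw stands in for a large file or socket, and the "key,value" line format is an assumption made for the example; each stage pulls one item at a time, so memory use stays flat regardless of input size.

```python
from collections.abc import Iterable, Iterator
from itertools import accumulate, groupby


def parse_lines(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Lazily turn 'key,value' lines into (key, int) pairs, skipping blanks."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.split(",")
        yield key, int(value)


def totals_by_key(rows: Iterable[tuple[str, int]]) -> Iterator[tuple[str, int]]:
    """Sum values per key; groupby needs the input already sorted by key."""
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield key, sum(value for _, value in group)


raw = ["a,1", "a,2", "", "b,5", "b,1", "c,7"]  # stands in for a large file
pipeline = totals_by_key(parse_lines(raw))      # nothing has run yet
totals = list(pipeline)                         # work happens as items are pulled
running = list(accumulate(total for _, total in totals))
```

Swapping raw for an open file handle leaves the rest of the pipeline unchanged, since a file object is itself a lazy iterable of lines.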

8. Context managers

import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    t = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - t:.3f}s")

with timed("query"):
    rows = db.fetch(...)

Use them for: timing, transactions, locks, temp-file cleanup, mocking. Anything with "set up, do work, always tear down" shape.
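
The transaction case from that list can be sketched like this. FakeConnection is a hypothetical stand-in that only records calls; a real driver would expose its own commit and rollback methods.

```python
from collections.abc import Iterator
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a DB connection; it only records what happened."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def commit(self) -> None:
        self.log.append("commit")

    def rollback(self) -> None:
        self.log.append("rollback")


@contextmanager
def transaction(conn: FakeConnection) -> Iterator[FakeConnection]:
    """Commit if the block succeeds, roll back and re-raise if it fails."""
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()


ok = FakeConnection()
with transaction(ok):
    ok.log.append("insert")

failed = FakeConnection()
try:
    with transaction(failed):
        failed.log.append("insert")
        raise ValueError("constraint violated")
except ValueError:
    pass  # the error still propagates after the rollback
```

The caller cannot forget the rollback, because the tear-down lives in one place instead of being repeated at every call site.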

9. Dunder methods

Implement the protocol the language expects:

Want           Dunder
len(x)         __len__
for ... in x   __iter__
x[i]           __getitem__
x == y         __eq__
hash(x)        __hash__ (must match __eq__)
x + y          __add__
print(x)       __str__ (user); __repr__ (dev)
with x:        __enter__ + __exit__
x()            __call__

Always implement __repr__ on hand-written classes that carry data (@dataclass generates one for you); debugging without it is misery.
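
As an illustration, the hypothetical Playlist class below implements several of these hooks at once. Storing the tracks in a tuple is a design choice for this sketch: it keeps the object effectively immutable, so __hash__ can safely use the same data that __eq__ compares.

```python
from collections.abc import Iterator


class Playlist:
    """Hypothetical small collection that plugs into len(), for, [], ==, hash(), +."""

    def __init__(self, *tracks: str) -> None:
        self._tracks = tuple(tracks)  # immutable, so hashing is safe

    def __len__(self) -> int:
        return len(self._tracks)

    def __iter__(self) -> Iterator[str]:
        return iter(self._tracks)

    def __getitem__(self, index: int) -> str:
        return self._tracks[index]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Playlist):
            return NotImplemented  # let Python try the other operand
        return self._tracks == other._tracks

    def __hash__(self) -> int:
        return hash(self._tracks)  # same data that __eq__ compares

    def __add__(self, other: "Playlist") -> "Playlist":
        return Playlist(*self._tracks, *other._tracks)

    def __repr__(self) -> str:
        inner = ", ".join(repr(track) for track in self._tracks)
        return f"Playlist({inner})"


mix = Playlist("intro", "verse")
longer = mix + Playlist("outro")
```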

10. The performance escape hatches

CPython is slow in tight loops; when a hot path has to be fast, try these in order:

  1. numpy / pandas / polars: vectorise. Speedups around 100x over a pure-Python loop are common.
  2. numba @jit (or @njit): JIT-compile a hot loop. 10–100x for numeric code.
  3. cython: compile a module. Static types are optional; release the GIL inside with nogil: blocks.
  4. pybind11 / cffi: bind C/C++ for true native speed.
  5. Rewrite the hot path in Rust (pyo3), an increasingly common choice.

Profile before optimising. cProfile + snakeviz for CPU; tracemalloc for memory; py-spy for prod sampling.
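
A minimal sketch of the two standard-library tools named above; snakeviz and py-spy are separate installs and are not shown. The function slow_sum is a made-up hot loop, and the sizes are arbitrary.

```python
import cProfile
import io
import pstats
import tracemalloc


def slow_sum(n: int) -> int:
    """Deliberately naive hot loop to give the profiler something to see."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# CPU: collect a profile, then format the top entries by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)

# Memory: measure current and peak traced allocations around a block.
tracemalloc.start()
squares = [i * i for i in range(100_000)]
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

Printing report.getvalue() shows where the time went; only after reading it is it worth deciding which rung of the ladder above to reach for.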

11. A short C++ contrast

Concept            Python                                               Modern C++
Resource cleanup   with / __exit__                                      RAII (destructors)
Polymorphism       duck typing / Protocols                              virtual functions / concepts (C++20)
Generics           TypeVar / Protocols                                  templates / concepts
Immutability       frozen=True dataclass                                const, constexpr
Threads            GIL; multiprocessing for CPU work, asyncio for I/O   true parallel threads + std::atomic
Memory             reference counting + cycle GC                        ownership via unique_ptr / shared_ptr
Build              pip install                                          cmake / vcpkg (sigh)

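C++20 concepts are roughly the compile-time counterpart of Protocols: both constrain a type by the operations it supports rather than by its base class. The convergence is real.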

12. Reality check

Idiomatic Python checklist for a new project:

  • Strict type checking: mypy --strict (or pyright in strict mode).
  • Dataclasses or Pydantic — pick per layer, don't mix internally.
  • ruff for lint + format (replaces black/isort/flake8 with one tool).
  • pytest with pytest-cov, target 80% coverage on critical paths.
  • Pre-commit hook: ruff + mypy + pytest fast tier.

You won't regret typed Python at 6 months. Lots of teams regret not adopting it earlier.

Reading material

In-depth research material

Video reference

▶︎ Python Typing Deep Dive (mCoding)

Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.

LeetCode — Design HashMap

  • Link: https://leetcode.com/problems/design-hashmap/
  • Difficulty: Easy
  • Why this problem: Implement __getitem__/__setitem__/__delitem__ from scratch — the dict-like Protocol in disguise.
  • Time-box: 20 minutes. Look up the editorial only after.

Post-session checklist

By the end of this session you should be able to:

  • Write a generic function with TypeVar (and 3.12+ syntax).
  • Pick between Protocol and ABC for a given API surface.
  • Use dataclasses with frozen=True, slots=True and field(default_factory=...).
  • Implement __iter__, __len__, __eq__, __hash__ correctly.
  • Pick from numpy → numba → cython → C++ binding as performance escape hatches.
  • Solve design-hashmap — basic open addressing or chaining; the data-model contract.
