Memory Model, GC, Heap, GC Leaks, Profiling
Session 20 of the 48-session learning series.
Date: Thu, 2026-06-25 · Time: 18:00–20:00 IST · Track: 🧱 OOP & Languages (OOP) · Parent 28-day topic: Day 15 · Est. read: 2 h
Why this session matters
This is Session 20 of 48 in the OOP & Languages track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.
Agenda
- Memory layout — stack, heap, code, data segments
- Reference counting vs tracing GC; generational and compacting GCs
- Python's memory model — refcount + cycle collector; Java/Go GC
- Memory leaks in managed languages — cycles, caches, listeners
- Profiling and finding leaks — heap snapshots, allocators, py-spy
Pre-read (skim before the session)
- Pythonspeed — Python's memory model
- Java GC Basics (Oracle)
- Go GC Guide
- V8 Garbage Collection (Orinoco)
Deep dive
1. Process memory layout
high addr ┌──────────────────────┐
│ kernel space │
├──────────────────────┤
│ stack (grows ↓) │
│ │ │
│ ▼ │
│ │
│ ▲ │
│ │ │
│ heap (grows ↑) │
├──────────────────────┤
│ BSS (zero-init) │
│ data (init) │
│ text (code) │
low addr └──────────────────────┘
- Stack — fast, per-thread, LIFO. Local vars, return addresses. Small (default ~1–8 MB).
- Heap — slow(er), shared, free-form. Malloc / new / object allocations.
- Text — compiled code. Read-only, shared between processes via mmap.
- Data / BSS — globals.
2. Stack vs heap allocation cost
Stack push = sub rsp, N. 1 ns.
Heap malloc = allocator metadata lookup, bin search, ~50–200 ns.
Object pooling, arenas, and slab allocators exist to dodge this cost on hot paths.
3. Manual vs automatic memory management
Manual (C, C++) — you malloc / free. Mistakes = use-after-free, double-free, leaks.
RAII (C++, Rust) — destructors run on scope exit. Eliminates most manual errors. Rust's ownership system is RAII with compile-time enforcement.
Garbage collected (Java, Python, Go, JS, C#) — runtime tracks reachability, frees the unreachable.
4. Tracing GC — the family tree
Mark-sweep — walk all roots → mark reachable; sweep frees the rest. Simple, stops the world, fragments memory.
Mark-compact — like mark-sweep but compacts survivors to one end. Eliminates fragmentation; more expensive copy.
Copying (semi-space) — divide heap into two halves; allocate in one; on GC copy live to the other and flip. Fast for young objects, halves usable heap.
Generational — most objects die young (the generational hypothesis). Two/three generations (young, old, sometimes perm). Young collected often + cheaply; old rarely + expensively. The default for Java, .NET, JS.
Concurrent / incremental — do GC work concurrently with the mutator. Trades throughput for lower pause time. Java's G1, ZGC, Shenandoah; Go's concurrent collector.
5. Python's memory model
CPython uses reference counting as primary mechanism:
PyObject *obj = ...;
Py_INCREF(obj); // refcount++
Py_DECREF(obj); // refcount--; if (refcount == 0) free
Deterministic — object freed the moment last ref drops. Great for files, sockets (no defer close). Cost: every assignment touches a counter (lots of cache traffic).
Refcount can't free cycles (A → B → A). Python's cycle collector runs periodically (gc module) — generational mark-sweep over container objects only.
import gc
gc.collect() # force a cycle collection
gc.set_threshold(700, 10, 10)
gc.disable() # if you really know what you're doing
6. Java GC — the modern menu
- Serial GC — single thread, stop-the-world. Tiny heaps only.
- Parallel GC — multi-thread STW. Highest throughput; long pauses.
- G1 GC (default since Java 9) — concurrent + STW; region-based; tunable pause target.
- ZGC — sub-ms pauses on TB heaps. Higher CPU overhead.
- Shenandoah (Red Hat) — like ZGC; concurrent compaction.
Tuning rule of thumb:
- < 4 GB heap + throughput priority → Parallel GC.
- 4–32 GB heap + balanced → G1.
-
32 GB heap or low-latency requirement → ZGC.
7. Go GC
Concurrent mark-sweep, non-generational, write-barrier-based. Pause targets <1 ms by default. Tunable via GOGC env var (% of live heap to grow before next GC; default 100% means GC triggers when heap = 2× live).
GOGC=off disables GC (debugging only). GOMEMLIMIT (1.19+) sets a soft heap ceiling — pairs well with container memory limits.
8. Memory leaks in managed languages
GC frees the unreachable; leaks happen when objects stay reachable unintentionally:
- Caches without eviction.
dictgrowing forever. Usefunctools.lru_cache(maxsize=...),weakref.WeakValueDictionary. - Event listeners not unregistered. Especially in GUIs, web frontends, observers.
- Closures capturing big state.
def outer(huge): return lambda: huge[0]—hugelives as long as the closure. - Thread-locals. Long-lived threads with growing thread-local state.
- Global registries. Singletons accumulating refs.
- Module-level mutables.
LOGS = []that grows forever.
9. Finding leaks
Python:
import tracemalloc
tracemalloc.start()
# ... run workload ...
snap = tracemalloc.take_snapshot()
top = snap.statistics("lineno")[:10]
for s in top: print(s)
Or objgraph to visualise reference chains; memray (Bloomberg) for sample-based heap profiling — production-safe.
Java: jmap -dump:format=b,file=heap.bin \<pid>, open in MAT (Eclipse Memory Analyzer). Look at "Leak Suspects" report.
Node.js: --inspect, take heap snapshot in Chrome DevTools. Compare snapshots over time — growing classes = leak suspects.
Go: pprof heap profile. go tool pprof -alloc_space heap.pprof.
10. Cache hierarchy — the other memory cost
register 1 cycle (~0.3 ns)
L1 cache 4 cycles (~1 ns)
L2 cache 12 cycles (~3 ns)
L3 cache 40 cycles (~10 ns)
DRAM 200 cycles (~50–100 ns)
NVMe SSD ~50 μs
network ~ms
Cache-friendly code:
- Sequential access > random (prefetcher loves it).
- Arrays of structs (SoA) vs structs of arrays — depends on access pattern.
- Avoid pointer chasing (linked lists vs arrays).
This is why NumPy, polars, and Arrow exist — columnar layout + vectorised ops keep the CPU pipeline fed.
11. Allocators
Inside malloc:
- glibc ptmalloc / jemalloc / mimalloc — general-purpose; mimalloc is the fastest in benchmarks.
- Arena / bump allocator — allocate from a pre-sized chunk by bumping a pointer. Free everything at once (e.g., per-request arena in a web server).
- Slab allocator — pre-allocated free lists for fixed object sizes (Linux kernel).
- Pool / freelist — return-to-pool instead of free. Connection pools, buffer pools.
For high-allocation Python services, switching to jemalloc (LD_PRELOAD=/path/to/libjemalloc.so) often cuts RSS by 20–40% by reducing fragmentation.
12. RSS vs heap vs working set
top shows RSS (Resident Set Size) — pages currently in RAM. Includes:
- Heap that's actually touched.
- Shared libraries (overcounted).
- Page cache (kernel may evict to make room).
heap (in your GC stats) is just what the runtime tracks. RSS is usually higher (allocator overhead, fragmentation, native libs).
working set (cloud term) = pages actually used recently. Pricing on Azure / GCP often uses this.
Don't be alarmed by RSS > heap by 30–50%; that's normal.
Reading material
- Hennessy & Patterson — Computer Architecture
- CPython source — Objects/obmalloc.c
- The Garbage Collection Handbook (Jones et al.)
- Go GC Guide
In-depth research material
- Pythonspeed — Memory profiling toolset
- Memray (Bloomberg)
- ZGC paper
- Mark Roberts — Crafting Interpreters (GC chapter)
Video reference
▶︎ JetBrains — Garbage Collection in Python
Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.
LeetCode — Lfu Cache
- Link: https://leetcode.com/problems/lfu-cache/
- Difficulty: Hard
- Why this problem: Two hash-maps + per-frequency LL; bookkeeping for min-freq. Real eviction policy.
- Time-box: 30 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Diagram stack/heap/data/text layout of a process.
- Explain why Python needs both refcount AND a cycle collector.
- Pick a Java GC for 8 GB throughput-bound vs 64 GB latency-bound.
- Identify 3 common leak patterns in your favourite language.
- Use tracemalloc or memray to find a leak in a small Python script.
- Solve
lfu-cache— two hash-maps + per-freq doubly-linked list; a real eviction policy.
Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.