engineering · intermediate · 12m · 2026-06-09

Concurrency Models — Threads, Asyncio, GIL, Actors

Session 16 of the 48-session learning series.

Date: Sun, 2026-06-21 · Time: 14:30–16:30 IST · Track: 🧱 OOP & Languages (OOP) · Parent 28-day topic: Day 10 · Est. read: 2 h

Why this session matters

This is Session 16 of 48 in the OOP & Languages track. It covers one focused topic, paced so you have time to absorb it rather than rush.

Agenda

The GIL — what it actually locks, why it exists, when it bites
Threading — when it works in Python (I/O-bound), classic primitives
Multiprocessing — true parallelism, IPC cost, pickle pitfalls
Asyncio — single-threaded cooperative scheduling, event loop, await
Actors and CSP — Erlang/Elixir, Go channels, Akka

Pre-read (skim before the session)

Deep dive

1. What problem are we even solving?

Concurrency ≠ parallelism. Concurrency = structuring a program to handle multiple things; parallelism = actually doing multiple things at once on multiple cores.

A web server handling 10K connections needs concurrency — even on one core, it must juggle. A matrix multiply needs parallelism — divide rows across cores.

2. The GIL — what it locks

CPython's Global Interpreter Lock is a mutex that protects access to CPython internals (reference counts, the import system, etc.). Only one thread can execute Python bytecode at a time.

Blocking I/O calls release the GIL while they wait (read, recv, sleep, and the socket calls underneath requests.get).
C extensions can release the GIL (NumPy, PyTorch's CPU ops, lxml).
Pure-Python CPU work cannot run on multiple cores concurrently.

So:

✅ Threads work great for I/O-bound work (HTTP, DB, file).
❌ Threads do not speed up CPU-bound Python loops.
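
A rough way to see both bullets on your own machine. This is a sketch, assuming time.sleep as a stand-in for a blocking I/O call (it releases the GIL) and a pure-Python loop as the CPU-bound case. The function names and sizes are illustrative, and timings will vary by machine.

```python
import threading
import time


def io_task():
    # time.sleep releases the GIL, much like a blocking socket read would.
    time.sleep(0.2)


def cpu_task(n=200_000):
    # Pure-Python arithmetic holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(target, count=4):
    """Run `target` in `count` threads; return wall-clock seconds taken."""
    threads = [threading.Thread(target=target) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def run_sequentially(target, count=4):
    """Run `target` `count` times in the calling thread; return seconds taken."""
    start = time.perf_counter()
    for _ in range(count):
        target()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"I/O sequential: {run_sequentially(io_task):.2f}s")
    print(f"I/O threaded:   {run_in_threads(io_task):.2f}s")   # close to one sleep
    print(f"CPU sequential: {run_sequentially(cpu_task):.2f}s")
    print(f"CPU threaded:   {run_in_threads(cpu_task):.2f}s")  # usually no faster on a GIL build
```

On a standard CPython build, the threaded I/O run should take about as long as a single sleep, while the threaded CPU run should take about as long as the sequential one.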

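Python 3.13 ships an experimental free-threaded (no-GIL) build (PEP 703). It is a separate build of the interpreter, and most extensions still need to be updated to support it. The 3.15–3.16 window for mainstream use is a guess, not a committed schedule.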
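
To check which kind of interpreter you are actually running, a small probe like the following may help. It is a sketch: gil_status is a made-up helper name, and it relies on sys._is_gil_enabled(), which only exists on Python 3.13 and later, so the code falls back to "GIL enabled" on older versions.

```python
import sys
import sysconfig


def gil_status():
    """Return (built_free_threaded, gil_enabled_now) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; 0 or None otherwise.
    built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older interpreters always have the GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(probe()) if probe is not None else True
    return built_free_threaded, gil_enabled


if __name__ == "__main__":
    built, enabled = gil_status()
    print(f"free-threaded build: {built}, GIL currently enabled: {enabled}")
```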

3. Threading — the boring but useful kind

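import queue
import threading

import requests

urls = ["https://example.com", "https://example.org"]  # placeholder URLs

def fetch(url, out):
    # The socket wait inside requests.get releases the GIL, so fetches overlap.
    out.put((url, requests.get(url, timeout=10).status_code))

q = queue.Queue()
threads = [threading.Thread(target=fetch, args=(u, q)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()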

Common primitives:

Primitive	Use case
`Lock`	Mutual exclusion
`RLock`	Re-entrant lock (same thread can re-acquire)
`Semaphore(n)`	Limit n concurrent holders (connection pool)
`Event`	One-shot signal (e.g., "config loaded")
`Condition`	Wait until some predicate holds; pair with a Lock
`Queue`	Thread-safe FIFO; the right primitive for most producer/consumer code

Rule: prefer queue.Queue over manual locks. Most thread bugs are bad locking.
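
A minimal sketch of that rule: a bounded producer/consumer built only on queue.Queue, with no explicit locks. The names (square_all, _STOP) and the "square each item" workload are illustrative assumptions, not part of the session material.

```python
import queue
import threading

_STOP = object()  # sentinel telling a consumer to exit


def producer(work, items, n_consumers):
    for item in items:
        work.put(item)          # blocks while the queue is full (backpressure)
    for _ in range(n_consumers):
        work.put(_STOP)         # one sentinel per consumer


def consumer(work, results):
    while True:
        item = work.get()
        if item is _STOP:
            break
        results.put(item * item)  # results is a Queue too, so no manual lock


def square_all(items, n_consumers=3, maxsize=5):
    work = queue.Queue(maxsize=maxsize)   # bounded: at most `maxsize` pending items
    results = queue.Queue()
    consumers = [
        threading.Thread(target=consumer, args=(work, results))
        for _ in range(n_consumers)
    ]
    for t in consumers:
        t.start()
    producer(work, items, n_consumers)
    for t in consumers:
        t.join()
    # All consumers have exited, so draining here is single-threaded and safe.
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)
```

The bounded work queue is what gives you backpressure: a fast producer simply blocks in put() until a consumer catches up.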

4. Multiprocessing — real parallelism on CPython

from multiprocessing import Pool

def cpu_work(x):
    return sum(i * i for i in range(x))

if __name__ == "__main__":  # required when the start method is spawn or forkserver
    with Pool(8) as pool:
        results = pool.map(cpu_work, range(100))

Each process has its own interpreter and GIL → real parallelism.
IPC cost: arguments and return values are pickled. Big numpy arrays = pickle hell. Use shared memory (multiprocessing.shared_memory or numpy.memmap) for those.
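Spawn vs fork: fork is fast, but the child inherits the parent's open file handles and lock state, which is unsafe if the parent has threads. spawn starts a fresh interpreter (slower, but clean) and is the default on macOS and Windows. Linux defaulted to fork through Python 3.13; 3.14 switched the default to forkserver. Always test the start method you'll run in prod.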
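
The shared-memory idea from the IPC bullet can be sketched with the standard library alone. To keep the example runnable anywhere, both handles are opened in one process; in real use the second handle would be opened in a worker that was sent only the block's name. shared_block_roundtrip is a hypothetical helper name.

```python
from multiprocessing import shared_memory


def shared_block_roundtrip(payload: bytes) -> bytes:
    """Write `payload` (non-empty) to a shared block and read it via a second handle."""
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        # A worker process would receive only `owner.name` (a short string)
        # and attach like this, instead of unpickling the whole payload.
        view = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(view.buf[:len(payload)])
        finally:
            view.close()
    finally:
        owner.close()
        owner.unlink()  # the creating side frees the block exactly once
```

The cost that crosses the process boundary is the name, not the data. Cleanup is the part that is easy to get wrong: every handle calls close(), and exactly one side calls unlink().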

5. Asyncio — single-threaded cooperative

The event loop runs one coroutine at a time. When a coroutine awaits a future, the loop runs other ready coroutines.

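import asyncio

import aiohttp

urls = ["https://example.com", "https://example.org"]  # placeholder URLs

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # gather runs the fetches concurrently and keeps results in input order
        return await asyncio.gather(*[fetch(session, u) for u in urls])

results = asyncio.run(main())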

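10K concurrent HTTP requests fit on one thread in roughly 200 MB of RAM (a ballpark that depends on response sizes and buffers). The same load with one OS thread per request pays for a stack and scheduler overhead per thread, and tends to run into memory or thread limits first.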
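
To get a feel for many coroutines sharing one thread without touching the network, here is a sketch that uses asyncio.sleep as a stand-in for awaiting a response and an asyncio.Semaphore to cap how many "requests" are in flight. fake_request, run_many, and the numbers are assumptions chosen for illustration.

```python
import asyncio


async def fake_request(i, limit, stats):
    async with limit:                      # at most `max_in_flight` at once
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)          # stand-in for awaiting a network response
        stats["in_flight"] -= 1
        return i


async def run_many(n=1000, max_in_flight=100):
    limit = asyncio.Semaphore(max_in_flight)
    stats = {"in_flight": 0, "peak": 0}    # plain dict is fine: one thread, no preemption
    results = await asyncio.gather(
        *(fake_request(i, limit, stats) for i in range(n))
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_many())
    print(len(results), "requests done, peak in flight:", peak)
```

The counters need no lock because coroutines only switch at an await. The semaphore is the piece to keep when you swap the sleep for a real client call.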

6. Async pitfalls

Don't call blocking code inside a coroutine — it stalls the loop. Use asyncio.to_thread() for the rare requests.get you can't replace.
CPU-bound work in asyncio still serialises. Hand it to loop.run_in_executor() with a ProcessPoolExecutor.
Cancellation: a Task can be cancelled at any await. Clean up in try/finally (or catch CancelledError), then re-raise so the task actually ends as cancelled.
Async libs everywhere: mixing sync and async libs hurts. Pick a stack (FastAPI + httpx + asyncpg, or Sanic + aiohttp + aiopg) and stick with it.
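
Two of the pitfalls above, sketched together: offloading a blocking call with asyncio.to_thread (Python 3.9+) so the loop keeps running, and cleaning up on cancellation before re-raising. blocking_call, heartbeat, and worker are hypothetical stand-ins, and time.sleep plays the role of a sync library call.

```python
import asyncio
import time


def blocking_call():
    # Stand-in for a sync library call such as requests.get.
    time.sleep(0.1)
    return "done"


async def heartbeat(ticks):
    # Keeps ticking only if nothing blocks the event loop.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def worker(log):
    try:
        await asyncio.sleep(10)      # pretend to wait on something slow
    except asyncio.CancelledError:
        log.append("cleanup")        # release connections, files, etc.
        raise                        # re-raise so the task ends as cancelled


async def demo():
    ticks, log = [], []
    hb = asyncio.create_task(heartbeat(ticks))
    result = await asyncio.to_thread(blocking_call)  # loop stays free meanwhile
    hb.cancel()
    await asyncio.gather(hb, return_exceptions=True)

    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)           # let the worker reach its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return result, len(ticks), log, task.cancelled()
```

If you replace the to_thread line with a direct blocking_call(), the heartbeat stops ticking for the duration of the call, which is the stall the first bullet warns about.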

7. Picking a model

Workload	Pick
Many concurrent I/O (10K+ requests)	asyncio
Moderate I/O (100s of requests), legacy sync libs	threading + ThreadPoolExecutor
CPU-bound, single machine	multiprocessing (or Cython/Numba/C extension)
CPU-bound, multi-machine	Spark, Dask, Ray
GUI / event-driven UI	event loop (Tkinter, Qt)
Real-time, low-latency mixed work	Go, Rust, Erlang/Elixir
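
For the "moderate I/O, legacy sync libs" row, a thread pool is usually a few lines. This is a sketch in which legacy_fetch is a hypothetical stand-in for a blocking call into a sync-only library, and the pool size is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def legacy_fetch(item):
    # Hypothetical stand-in for a blocking call into a sync-only library.
    time.sleep(0.05)
    return item.upper()


def fetch_all(items, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order and re-raises worker exceptions on iteration.
        return list(pool.map(legacy_fetch, items))


if __name__ == "__main__":
    print(fetch_all(["a", "b", "c"]))
```

The with block joins the workers on exit, so there is no separate start/join bookkeeping.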

8. Actors and CSP — alternative models

Actors (Erlang, Akka): isolated processes, mailbox, message passing. No shared state → no data races. Crashes are local; supervision trees restart actors.

CSP (Go channels): independent goroutines communicate by passing values on channels. "Don't communicate by sharing memory; share memory by communicating."

ch := make(chan int)
go func() { ch <- compute() }()
result := <-ch

Both models eliminate most race conditions by construction. In Python, an asyncio.Queue between tasks is a CSP-flavoured pattern; it scales further than locks.
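
A toy actor in Python, as a sketch of the mailbox idea: one task owns the state, and everything else talks to it through an asyncio.Queue. CounterActor and its message names ("incr", "get", "stop") are invented for illustration. This gives you the message-passing shape only, not Erlang-style supervision or process isolation.

```python
import asyncio


class CounterActor:
    """Owns its state; the only way to touch it is to send a message."""

    def __init__(self):
        self._mailbox = asyncio.Queue()
        self._count = 0

    async def run(self):
        # Messages are handled one at a time, so _count needs no lock.
        while True:
            msg, reply = await self._mailbox.get()
            if msg == "stop":
                break
            if msg == "incr":
                self._count += 1
            elif msg == "get":
                reply.set_result(self._count)

    async def send(self, msg):
        # Every message carries a future; only "get" fills it in.
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((msg, reply))
        return reply


async def demo(n_senders=50):
    actor = CounterActor()
    runner = asyncio.create_task(actor.run())
    await asyncio.gather(*(actor.send("incr") for _ in range(n_senders)))
    reply = await actor.send("get")
    total = await reply
    await actor.send("stop")
    await runner
    return total
```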

9. Common bug list (you will hit these)

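Double-checked locking without volatile: the classic singleton bug in Java-style code; in Python, a module-level instance gives you a singleton without the locking.
Forgetting to join: a non-daemon thread keeps the process alive and holds its resources; a daemon thread is killed abruptly at interpreter exit, mid-work and without cleanup.
Deadlock from inconsistent lock order: always acquire locks in the same global order.
Pickling unpicklable objects in multiprocessing: DB connections, file handles. Pass paths/config, open in child.
Missing await on a coroutine call: the call returns a coroutine object that never runs, and Python only emits a "coroutine ... was never awaited" RuntimeWarning. mypy flags the bare call (unused-coroutine); ruff RUF006 covers the related case of an unreferenced asyncio.create_task result.
Sharing a requests.Session across threads: its thread safety is not guaranteed; use one per thread or use httpx.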
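
The lock-order rule is the easiest of these to show in code. Here is a sketch, assuming a toy Account class with a fixed numeric id used as the global order; the class, the transfer function, and the stress harness are all hypothetical.

```python
import threading


class Account:
    _next_id = 0

    def __init__(self, balance):
        self.id = Account._next_id   # unique, fixed rank used for lock ordering
        Account._next_id += 1
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    if src is dst:
        return                       # avoid taking the same non-reentrant lock twice
    # Always lock the lower-id account first, whichever way the money flows.
    first, second = sorted((src, dst), key=lambda a: a.id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def stress(rounds=2000):
    """Hammer opposite-direction transfers from two threads; return final balances."""
    a, b = Account(1000), Account(1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    t1 = threading.Thread(target=a_to_b)
    t2 = threading.Thread(target=b_to_a)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance
```

If each thread instead locked src first and dst second, the two opposite-direction transfers could each hold one lock and wait forever for the other.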

10. Production heuristics

Connection pool sized to worker count — DB pool of 10 + 50 workers = bottleneck. Pool size ≥ workers, or use async DB driver.
Backpressure — bounded queues, semaphores, drop or shed when full. Unbounded queues turn slowdowns into OOM.
Timeouts on every external call. No exceptions.
Profile before optimising — py-spy, cProfile for sync; aiomonitor for async.
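
One way to sketch the backpressure heuristic: a bounded inbox that refuses new work when full instead of growing without limit. SheddingInbox and its method names are invented for this example, and what to do with a rejected item (retry later, return an error to the caller) is left as an assumption.

```python
import queue
import threading


class SheddingInbox:
    """Bounded inbox that rejects new work when full instead of growing."""

    def __init__(self, maxsize):
        self._q = queue.Queue(maxsize=maxsize)
        self._lock = threading.Lock()   # guards the dropped counter only
        self.dropped = 0

    def offer(self, item):
        """Try to enqueue without blocking; return False if the item was shed."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def take(self, timeout=1.0):
        # Timeout on the consumer side too; raises queue.Empty if nothing arrives.
        return self._q.get(timeout=timeout)
```

Tracking the dropped count gives you a metric to alert on, which is usually the first sign that the consumers are falling behind.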

Link: https://leetcode.com/problems/design-bounded-blocking-queue/
Difficulty: Medium
Why this problem: Two semaphores (empty, full) + a lock. Maps directly to a thread-safe channel.
Time-box: 30 minutes. Look up the editorial only after.

Post-session checklist

By the end of this session you should be able to:

Explain exactly what the GIL locks and when it releases.
Choose between asyncio, threading, multiprocessing for a given workload.
Implement a bounded producer/consumer with a queue.
List 3 common deadlock causes and how to avoid each.
Describe the actor model in 3 sentences.
Solve design-bounded-blocking-queue — two semaphores + a lock; the classic channel pattern.

