Deploying Web Applications in Azure with Docker
What you'll leave with: a correct mental model of containers (not just the commands), a secure production-grade `Dockerfile` pattern, a working local dev loop with Docker Compose, and two deployment paths on Azure — App Service (classic) and Container Apps (modern, scale-to-zero) — with the exact CLI commands, gotchas, and cost/architecture tradeoffs.
1. A correct mental model: containers are not VMs
A very common misunderstanding is that containers are "lightweight VMs". They are not. A VM virtualises hardware (its own kernel, bootloader, emulated devices). A container virtualises the OS view for a group of processes — they all share the host kernel.
Three kernel features do the work:
| Feature | What it gives you | Example |
|---|---|---|
| Namespaces | Isolated view of resources | pid (its own PID 1), net (its own NIC), mnt (its own filesystem root), uts (its own hostname), ipc, user |
| cgroups v2 | Resource limits & accounting | --memory=512m, --cpus=1.5 |
| Union FS (overlayfs) | Layered, copy-on-write images | Each RUN/COPY in a Dockerfile = one read-only layer; container gets a thin R/W layer on top |
Consequences that matter:
- Kernel features the host lacks are unavailable inside the container (e.g. newer `io_uring` features, newer eBPF, modern `seccomp` profiles).
- You cannot run Linux containers on Windows/macOS natively — Docker Desktop runs a tiny Linux VM for you; that's why file-sync can be slow.
- Image size is dominated by layers, so ordering `COPY` and `RUN` correctly is a real perf/cost lever.
The runtime stack on Linux: `docker` CLI → `dockerd` → `containerd` → `runc`. `runc` is the OCI reference runtime that actually calls `clone()`, sets up the namespaces and cgroups, `pivot_root`s the filesystem, and `execve`s your entrypoint. Knowing this lets you debug "why is my container dying with exit 137?" (answer: OOMKilled by the memory cgroup).
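You can reproduce that exit code on purpose. A minimal sketch (the container name and the allocation size are arbitrary):

```bash
# Give the container a 512 MiB memory cgroup, then allocate ~1 GiB inside it.
docker run --memory=512m --name oomtest python:3.12-slim \
  python -c "x = bytearray(1024**3)"
echo $?    # 137 = 128 + SIGKILL: the kernel's OOM killer ended it, not your app
docker inspect oomtest --format '{{.State.OOMKilled}}'   # true
docker rm oomtest
```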
2. A production-grade Dockerfile pattern
The Dockerfile most tutorials show you (including the earlier version of this post) has at least four production problems:
- Runs as root.
- Uses `COPY . /app` before `pip install`, so every code change busts the pip cache.
- Uses a single stage — your build tools (gcc, dev headers, pip wheel cache) ship to production.
- No `HEALTHCHECK`, no `EXPOSE` discipline, no explicit `WORKDIR`.
Here is a pattern that fixes all of this for a Python/Flask app (the same shape works for Node/Go/Java):
# syntax=docker/dockerfile:1.7
# ---------- Stage 1: builder ----------
FROM python:3.12-slim AS builder
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1
WORKDIR /build
# Install only build-time OS deps, then drop them
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential gcc \
&& rm -rf /var/lib/apt/lists/*
# Copy ONLY dependency manifests first so this layer is cached on code-only changes
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt
# ---------- Stage 2: runtime ----------
FROM python:3.12-slim AS runtime
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PORT=8000
# Non-root user
RUN useradd --create-home --shell /usr/sbin/nologin --uid 10001 app
WORKDIR /app
# Bring in only the built site-packages — no gcc, no caches
COPY --from=builder /install /usr/local
# App code last = best cache behaviour
COPY --chown=app:app . .
USER app
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8000/healthz', timeout=2).status==200 else 1)"
# Use gunicorn, not `python app.py`, in production
CMD ["gunicorn", "-b", "0.0.0.0:8000", "-w", "2", "--threads", "4", "app:app"]
Pair it with a strict .dockerignore:
.git
.venv
__pycache__
*.pyc
.env
.env.*
tests/
docs/
node_modules/
Why this matters (concrete numbers):
- Python base image: python:3.12 ≈ 1.0 GB, python:3.12-slim ≈ 130 MB, multi-stage + slim + --no-cache-dir typically lands a Flask app around 170–220 MB.
- Proper layer order: on a code-only edit, rebuild time drops from ~90 s to ~5 s because the `pip install` layer is cached (verify with the sketch below).
- Non-root user: a compromised app lands as an unprivileged UID instead of root, which cuts off the cheapest privilege-escalation paths.
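Both claims are easy to check against your own repo (the tag `blog:dev` is just an example):

```bash
docker build -t blog:dev .            # cold build: pip install actually runs
touch app.py
docker build -t blog:dev .            # code-only change: the pip layer prints CACHED
docker history blog:dev               # per-layer sizes, to spot the heavy layers
docker images blog:dev                # final image size
```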
3. Local dev loop with Docker Compose
Real apps have more than one process. A docker-compose.yml lets you bring up the whole system with one command and a consistent network.
services:
web:
build: .
ports: ["8000:8000"]
environment:
DATABASE_URL: postgresql://app:app@db:5432/app
REDIS_URL: redis://cache:6379/0
depends_on:
db: { condition: service_healthy }
cache: { condition: service_started }
develop:
watch:
- action: sync
path: ./src
target: /app
- action: rebuild
path: ./requirements.txt
db:
image: postgres:16-alpine
environment:
POSTGRES_USER: app
POSTGRES_PASSWORD: app
POSTGRES_DB: app
volumes: ["pgdata:/var/lib/postgresql/data"]
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app"]
interval: 5s
retries: 5
cache:
image: redis:7-alpine
volumes:
pgdata:
Key points most tutorials get wrong:
- Service name = DNS name. `web` reaches Postgres at host `db`, not `localhost`. Compose puts every service on a user-defined bridge network where Docker's embedded DNS resolves service names.
- `depends_on` alone does not wait for readiness. Use `condition: service_healthy` with a `healthcheck`.
- Named volumes, not bind mounts, for databases. Bind mounts on macOS/Windows are slow and can corrupt Postgres WAL on some setups.
- `develop.watch` (Compose v2.22+) replaces the old `docker-compose up --build` loop with fast file sync — much closer to a native dev experience.

Run: `docker compose up --watch`.
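To watch the embedded DNS at work with the compose file above (a quick sketch, assuming the stack is already up):

```bash
docker compose exec web getent hosts db cache    # service names resolve to container IPs
docker compose exec web python -c "import socket; print(socket.gethostbyname('db'))"
```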
4. Security hardening checklist
Production containers need more than "it works":
- ☑ Non-root user (`USER` directive in the Dockerfile; `runAsNonRoot: true` in Kubernetes).
- ☑ Minimal base image — `-slim`, `distroless`, or Chainguard images. Avoid `-alpine` for Python (musl libc compatibility issues with many wheels).
- ☑ Pin versions — `FROM python:3.12.5-slim@sha256:<digest>`, not `python:latest`.
- ☑ No secrets in layers. Anything `COPY`ed is recoverable via `docker history`. Inject secrets via env vars / Azure Key Vault at runtime.
- ☑ Scan on every build — `docker scout cves <image>`, `trivy image <image>`, or `grype <image>`.
- ☑ Read-only root filesystem at runtime — `--read-only` + a tmpfs mount for `/tmp` if needed.
- ☑ Drop capabilities — `--cap-drop=ALL --cap-add=NET_BIND_SERVICE` (only if you need to bind a port below 1024).
- ☑ Resource limits — always set `--memory` and `--cpus` (or their Compose/K8s equivalents). Without them, one runaway container can OOM the whole host.
A fast CI gate, in one line:
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:${GIT_SHA}
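Several of the checklist items map directly onto `docker run` flags. A hardened invocation might look like this (a sketch; tune the limits to your app):

```bash
# Immutable root filesystem, writable /tmp only, zero capabilities,
# no setuid escalation, hard cgroup limits, and a pinned tag (never :latest).
docker run -d --name blog \
  --read-only --tmpfs /tmp \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --memory=512m --cpus=1 \
  -p 8000:8000 \
  myapp:abc1234
```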
5. Pushing the image: Azure Container Registry (ACR)
Prefer ACR over Docker Hub for Azure deploys — it supports managed-identity pulls, can be locked down to your VNet with private endpoints (Premium SKU), and avoids Docker Hub's public rate limits.
# One-time setup
az group create -n rg-blog -l eastus
az acr create -n myblogacr -g rg-blog --sku Basic
# Every build
az acr login -n myblogacr
docker build -t myblogacr.azurecr.io/blog:$(git rev-parse --short HEAD) .
docker push myblogacr.azurecr.io/blog:$(git rev-parse --short HEAD)
# Or build in the cloud (no local Docker needed)
az acr build -r myblogacr -t blog:$(git rev-parse --short HEAD) .
az acr build is a sleeper feature — it runs the build on ACR's build agents, which is often faster than your laptop and doesn't require Docker locally (great for CI runners).
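After a push (local or via `az acr build`), confirm what actually landed in the registry:

```bash
az acr repository list -n myblogacr -o table                           # repositories
az acr repository show-tags -n myblogacr --repository blog -o table    # tags, e.g. abc1234
```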
6. Deployment path A — Azure App Service (classic PaaS)
Best for: a single long-running web app, simple scaling, 1+ instances always on, mature blue/green via deployment slots, custom domain + managed SSL in one click.
# Linux App Service plan (P1V3 recommended for prod; B1 for dev)
az appservice plan create -g rg-blog -n plan-blog --is-linux --sku B1
# Web app that pulls from ACR using a system-assigned managed identity
az webapp create \
-g rg-blog -p plan-blog -n myblog-app \
--deployment-container-image-name myblogacr.azurecr.io/blog:latest
# Give the webapp a managed identity and AcrPull on the registry
az webapp identity assign -g rg-blog -n myblog-app
APP_PRINCIPAL_ID=$(az webapp identity show -g rg-blog -n myblog-app --query principalId -o tsv)
ACR_ID=$(az acr show -n myblogacr --query id -o tsv)
az role assignment create --assignee $APP_PRINCIPAL_ID --role AcrPull --scope $ACR_ID
# Tell App Service to use MI for pulls (no registry username/password stored)
az webapp config set -g rg-blog -n myblog-app --generic-configurations '{"acrUseManagedIdentityCreds": true}'
# App settings — surface as env vars inside the container
az webapp config appsettings set -g rg-blog -n myblog-app --settings \
WEBSITES_PORT=8000 \
WEBSITES_ENABLE_APP_SERVICE_STORAGE=false \
DATABASE_URL="@Microsoft.KeyVault(...)"
Gotchas that catch people:
- `WEBSITES_PORT` — App Service will only route traffic to the port you declare here. If your container listens on 8080, set `WEBSITES_PORT=8080`.
- Always-On — enable it on any non-Free tier, otherwise the first request after idle takes ~20 s.
- Deployment slots give you true zero-downtime: deploy to `staging`, test, then `az webapp deployment slot swap` — DNS doesn't change, and the slot warms up before the swap.
- Log streaming: `az webapp log tail -g rg-blog -n myblog-app`.
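In CLI form, the slot workflow looks roughly like this (a sketch reusing the names above; `<sha>` is whatever tag you pushed):

```bash
# Create a staging slot that clones production's configuration
az webapp deployment slot create -g rg-blog -n myblog-app \
  --slot staging --configuration-source myblog-app

# Point the slot at the new image tag
az webapp config container set -g rg-blog -n myblog-app --slot staging \
  --docker-custom-image-name myblogacr.azurecr.io/blog:<sha>

# Verify https://myblog-app-staging.azurewebsites.net, then swap
az webapp deployment slot swap -g rg-blog -n myblog-app \
  --slot staging --target-slot production
```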
7. Deployment path B — Azure Container Apps (modern, serverless)
Best for: microservices, event-driven workloads, anything that benefits from scale-to-zero, Dapr, or KEDA-based scaling (e.g., scale on queue depth).
az provider register -n Microsoft.App
az provider register -n Microsoft.OperationalInsights
az containerapp env create \
-g rg-blog -n cae-blog -l eastus
az containerapp create \
-g rg-blog -n ca-blog \
--environment cae-blog \
--image myblogacr.azurecr.io/blog:latest \
--registry-server myblogacr.azurecr.io \
--registry-identity system \
--target-port 8000 --ingress external \
--min-replicas 0 --max-replicas 10 \
--cpu 0.5 --memory 1Gi \
--secrets "db-url=keyvaultref:https://kv-blog.vault.azure.net/secrets/db-url,identityref:system" \
--env-vars "DATABASE_URL=secretref:db-url"
Why you'd pick this over App Service:
| Dimension | App Service | Container Apps |
|---|---|---|
| Scale to zero | No (min 1 instance on B1+) | Yes |
| Per-second billing when idle | No | Yes |
| Scale on queue / custom metric | Limited | KEDA — yes |
| Multiple revisions / traffic-split | Slots (2) | Revisions (N), % traffic split |
| Dapr sidecars | No | Yes |
| VNET integration | Regional or private endpoint | Internal env or workload profiles |
| Cold start | n/a (always on) | ~2–5 s |
Rule of thumb: one user-facing monolith with steady traffic → App Service. Many small services or bursty/event-driven → Container Apps.
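The revisions/traffic-split row maps to a few commands. A sketch, assuming the app from above (revision names are auto-generated, hence the list step):

```bash
az containerapp revision set-mode -g rg-blog -n ca-blog --mode multiple
az containerapp update -g rg-blog -n ca-blog \
  --image myblogacr.azurecr.io/blog:<new-sha>            # creates a new revision
az containerapp revision list -g rg-blog -n ca-blog -o table
az containerapp ingress traffic set -g rg-blog -n ca-blog \
  --revision-weight <old-revision>=50 <new-revision>=50  # 50/50 canary
```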
8. Observability you should turn on day one
- Log Analytics workspace attached to the App Service / ACA environment — stream `stdout`/`stderr` and query it with KQL.
- Application Insights SDK inside the app for distributed tracing (`opentelemetry-instrumentation-flask` is three lines of code).
- Container health probes — App Service ignores the Dockerfile `HEALTHCHECK` and uses its own health-check path setting; ACA has explicit liveness/readiness/startup probes you should configure.
- Alerts on: HTTP 5xx rate > 1%, p95 latency > 1 s, memory > 80%, restart count > 0 in 10 min.
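As a concrete example of the first alert on the App Service path, here is a sketch that uses a count threshold as a stand-in for the 1% rate rule; it assumes an action group named `ag-ops` already exists (create one with `az monitor action-group create`):

```bash
APP_ID=$(az webapp show -g rg-blog -n myblog-app --query id -o tsv)
az monitor metrics alert create -g rg-blog -n alert-5xx \
  --scopes "$APP_ID" \
  --condition "total Http5xx > 10" \
  --window-size 5m --evaluation-frequency 1m \
  --action ag-ops \
  --description "HTTP 5xx spike on myblog-app"
```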
A minimal KQL query for "who's erroring right now":
AppServiceConsoleLogs
| where TimeGenerated > ago(15m)
| where ResultDescription has_any ("ERROR","Exception","Traceback")
| summarize count() by bin(TimeGenerated, 1m), _ResourceId
| render timechart
9. Common mistakes to avoid (corrections to the original post)
The original draft of this post was an AI-generated placeholder with several issues worth calling out:
- ❌ "Use
python:3.8-slim" — 3.8 reached end-of-life in Oct 2024. Use a supported Python (3.11/3.12 at the time of writing). - ❌
CMD ["python", "app.py"]in production — Flask's dev server is single-threaded and debug-mode by default. Use gunicorn (sync + threads, orgthread/uvicornworkers for async frameworks). - ❌ Pushing to Docker Hub for Azure deploys — works, but you pay rate-limit tax and lose managed-identity pulls. Use ACR.
- ❌ "Day 1 / Day 2" fake learning plan — replace with concrete milestones tied to the repo you actually ship.
- ❌ No
.dockerignore— without it,COPY . .leaks your.git,.env, and__pycache__into the image.
10. TODO — self-sufficient action list
After reading the above, you should be able to tick each of these off without any extra reading. If a step feels unclear, re-read the section linked in parentheses.
Foundations
- [ ] Install Docker Desktop (or Colima on macOS for a lighter alternative), run `docker run hello-world`, then explain in your own words what namespaces/cgroups did (§1).
- [ ] Run `docker inspect <container>` and find the PID, mount ID, and cgroup path. Open `/proc/<pid>/ns/` on the host and show the container's namespaces (see the sketch below).
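A sketch of that inspection on a Linux host; on Docker Desktop the processes live inside the VM, so run it there. The container name `web` is a placeholder:

```bash
PID=$(docker inspect --format '{{.State.Pid}}' web)
sudo ls -l /proc/$PID/ns/    # one symlink per namespace: pid, net, mnt, uts, ipc, user, cgroup
cat /proc/$PID/cgroup        # the cgroup path where the memory/CPU limits live
```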
Build a real image
- [ ] Port the two-stage `Dockerfile` in §2 to your actual app (Python or otherwise). Commit a `.dockerignore`.
- [ ] Record image sizes before and after multi-stage. Target: ≤ 250 MB for a Python web app, ≤ 80 MB for a Go binary.
- [ ] Add a `HEALTHCHECK` that hits a real `/healthz` endpoint and returns JSON including git SHA + uptime.
Local dev loop
- [ ] Write a `docker-compose.yml` for app + Postgres + Redis with health-gated startup (§3). Verify `docker compose up --watch` hot-reloads on file change without a full rebuild.
Security pass
- [ ] Run `trivy image` against your image; get the HIGH/CRITICAL count to zero (update the base image / pin versions as needed).
- [ ] Confirm with `docker run --rm myapp whoami` that the process is NOT root.
- [ ] Add `--read-only` + `--tmpfs /tmp` to your `docker run` command and verify the app still boots.
Ship to ACR
- [ ] Create an ACR, push a tagged image built remotely with `az acr build` (no local Docker needed).
- [ ] Tag with the git short SHA; never ship `:latest` to production.
Deploy — pick ONE path to start, then do the other
- [ ] App Service path: provision a Linux plan + webapp, wire managed-identity `AcrPull`, set `WEBSITES_PORT`, deploy, hit the public URL.
- [ ] Add a `staging` slot, deploy v2 to it, verify, then swap → confirm zero downtime.
- [ ] Container Apps path: provision an environment, deploy with `--min-replicas 0`, confirm scale-to-zero after idle, confirm cold-start behaviour.
- [ ] Create a second revision, split traffic 50/50 between revisions, then promote to 100%.
Observability
- [ ] Enable App Insights (SDK in code + connection string in env).
- [ ] Write 3 KQL queries in Log Analytics: error rate by endpoint, p95 latency, slow DB calls.
- [ ] Wire one alert: HTTP 5xx > 1% over 5 min → email/Teams.
Stretch (only after the above)
- [ ] Migrate the deploy to Bicep or Terraform so the whole stack is `azd up`-able.
- [ ] Add GitHub Actions: build → scan → push to ACR → `az containerapp update --image …` on merge to `main`.
- [ ] Compare cost for your actual traffic pattern: App Service B1 24×7 vs Container Apps scale-to-zero. Document the break-even point.
When every box above is checked, flip `status: workinprogress` → `status: published` in the frontmatter.