cloud · intermediate · 11 min read · 2024-08-28

Deploying Web Applications in Azure with Docker

A self-sufficient, production-minded walkthrough — from Docker internals to a hardened deploy on Azure App Service / Container Apps.


What you'll leave with: a correct mental model of containers (not just the commands), a secure production-grade Dockerfile pattern, a working local dev loop with Docker Compose, and two deployment paths on Azure — App Service (classic) and Container Apps (modern, scale-to-zero) — with the exact CLI commands, gotchas, and cost/architecture tradeoffs.

1. A correct mental model: containers are not VMs

A very common misunderstanding is that containers are "lightweight VMs". They are not. A VM virtualises hardware (its own kernel, bootloader, emulated devices). A container virtualises the OS view for a group of processes — they all share the host kernel.

Three kernel features do the work:

Feature	What it gives you	Example
Namespaces	Isolated view of resources	`pid` (its own PID 1), `net` (its own NIC), `mnt` (its own filesystem root), `uts` (its own hostname), `ipc`, `user`
cgroups v2	Resource limits & accounting	`--memory=512m`, `--cpus=1.5`
Union FS (overlayfs)	Layered, copy-on-write images	Each `RUN`/`COPY` in a Dockerfile = one read-only layer; container gets a thin R/W layer on top

Consequences that matter:

If the host kernel lacks a feature, the container can't have it either (e.g. newer io_uring operations, newer eBPF program types, newer seccomp filter features).
You cannot run Linux containers natively on Windows/macOS: Docker Desktop runs a small Linux VM for you, which is why bind-mounted file access can be slow.
Images are stacks of layers. The order of COPY and RUN decides how much of the build cache survives an edit (build time), and a file only disappears from the image if it is deleted in the same RUN that created it (image size).
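To make the cache point concrete, here is a toy model of how a builder decides which steps to re-run. It is a simplification, not BuildKit's real algorithm: each step's key is a hash of the previous step's key plus the instruction text, and COPY steps also hash the contents of the files they copy. The function names and the two instruction lists are illustrative.

```python
import hashlib

def layer_keys(instructions, context):
    """Toy cache keys: one per step, each chained to the previous step's key.

    instructions: list of (verb, argument) pairs, e.g. ("COPY", "requirements.txt .")
    context: dict of build-context path -> file contents
    """
    keys, parent = [], "scratch"
    for verb, arg in instructions:
        h = hashlib.sha256(parent.encode())
        h.update(f"{verb} {arg}".encode())
        if verb == "COPY":
            # COPY also depends on the contents of what it copies ("." = everything).
            src = arg.split()[0]
            for path in sorted(p for p in context if src == "." or p == src):
                h.update(path.encode())
                h.update(hashlib.sha256(context[path].encode()).digest())
        parent = h.hexdigest()
        keys.append(parent)
    return keys

def rebuilt_steps(instructions, before, after):
    """Indices of steps whose key changed between two contexts, i.e. cache misses."""
    old, new = layer_keys(instructions, before), layer_keys(instructions, after)
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]

GOOD = [("FROM", "python:3.12-slim"), ("COPY", "requirements.txt ."),
        ("RUN", "pip install -r requirements.txt"), ("COPY", ". .")]
BAD = [("FROM", "python:3.12-slim"), ("COPY", ". ."),
       ("RUN", "pip install -r requirements.txt")]

v1 = {"requirements.txt": "flask==3.0.3\n", "app.py": "print('v1')\n"}
v2 = {**v1, "app.py": "print('v2')\n"}   # a code-only edit

print(rebuilt_steps(GOOD, v1, v2))  # [3]: only the final COPY misses the cache
print(rebuilt_steps(BAD, v1, v2))   # [1, 2]: pip install re-runs as well
```

Because every key is chained to its parent, one miss invalidates everything after it, which is why dependency manifests should be copied and installed before the application code.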

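The runtime stack on Linux: docker CLI → dockerd → containerd → runc. runc, the OCI reference runtime, is what actually calls clone(), sets up the namespaces and cgroups, pivot_root()s into the image filesystem, and execve()s your entrypoint. Knowing this helps you debug "why is my container dying with exit 137?": 137 is 128 + 9, meaning the process received SIGKILL, most often from the memory cgroup's OOM killer (docker inspect then shows State.OOMKilled: true), although docker stop escalating to SIGKILL after its timeout produces the same code.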
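A small helper in that spirit, assuming the Docker convention that a process killed by signal N is reported as exit code 128 + N; the 125/126/127 meanings are the ones Docker documents for docker run. The function name and hint wording are hypothetical.

```python
import signal

# Exit codes that come from docker run itself rather than from your process.
DOCKER_CODES = {
    125: "docker run itself failed (bad flag, daemon error)",
    126: "the command exists but could not be invoked",
    127: "the command was not found in the image",
}

def explain_exit_code(code: int) -> str:
    """Turn a container exit code into a human-readable diagnosis."""
    if code == 0:
        return "exited cleanly"
    if code in DOCKER_CODES:
        return f"exit {code}: {DOCKER_CODES[code]}"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name  # 128 + N means "killed by signal N"
        except ValueError:
            return f"exit {code}: killed by unknown signal {code - 128}"
        hints = {
            "SIGKILL": "OOM killer, or docker stop timing out; check State.OOMKilled",
            "SIGTERM": "normal stop request, e.g. docker stop or a platform scale-in",
        }
        hint = hints.get(name)
        return f"exit {code}: killed by {name}" + (f" ({hint})" if hint else "")
    return f"exit {code}: the application returned an error status"

for c in (0, 1, 127, 137, 143):
    print(c, "->", explain_exit_code(c))
```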

2. A production-grade Dockerfile pattern

The Dockerfile most tutorials show you (including the earlier version of this post) has at least four production problems:

Runs as root.
Uses COPY . /app before pip install, so every code change busts the pip cache.
Uses a single stage — your build tools (gcc, dev headers, pip wheel cache) ship to production.
No HEALTHCHECK, no EXPOSE discipline, no explicit WORKDIR.

Here is a pattern that fixes all of this for a Python/Flask app (the same shape works for Node/Go/Java):

# syntax=docker/dockerfile:1.7
# ---------- Stage 1: builder ----------
FROM python:3.12-slim AS builder

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1

WORKDIR /build

# Build-time OS deps: this stage is thrown away, so none of them reach the runtime image
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy ONLY dependency manifests first so this layer is cached on code-only changes
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# ---------- Stage 2: runtime ----------
FROM python:3.12-slim AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PORT=8000

# Non-root user
RUN useradd --create-home --shell /usr/sbin/nologin --uid 10001 app
WORKDIR /app

# Bring in only the built site-packages — no gcc, no caches
COPY --from=builder /install /usr/local

# App code last = best cache behaviour
COPY --chown=app:app . .

USER app
EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8000/healthz', timeout=2).status==200 else 1)"

# Use gunicorn (it must be listed in requirements.txt), not `python app.py`, in production
CMD ["gunicorn", "-b", "0.0.0.0:8000", "-w", "2", "--threads", "4", "app:app"]

Pair it with a strict .dockerignore:

.git
.venv
**/__pycache__
**/*.pyc
.env
.env.*
tests/
docs/
node_modules/
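Note that .dockerignore patterns are anchored at the root of the build context: * and ? never cross a /, and only ** spans directories, so a bare __pycache__ or *.pyc line does not catch pkg/__pycache__/mod.pyc. The sketch below approximates those rules so you can list which files would still be sent to the builder. It is an approximation, not Docker's matcher: it skips character classes like [a-z] and some edge cases of ! exceptions, and the function names are hypothetical.

```python
import os
import re

def _to_regex(pattern: str) -> re.Pattern:
    """Translate one .dockerignore pattern into a regex (approximation)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")      # "**/" = zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")            # trailing "**" = anything below
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")         # "*" stays within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))

def load_patterns(text: str):
    """Parse .dockerignore text into (is_exception, regex) rules, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        pattern = (line[1:] if negate else line).strip("/")  # "/foo/" == "foo"
        rules.append((negate, _to_regex(pattern)))
    return rules

def is_excluded(path: str, rules) -> bool:
    """Excluded if the path or a parent directory matches; the last matching rule wins."""
    parts = path.split("/")
    candidates = ["/".join(parts[:n]) for n in range(1, len(parts) + 1)]
    excluded = False
    for negate, rx in rules:
        if any(rx.fullmatch(c) for c in candidates):
            excluded = not negate
    return excluded

def files_sent_to_builder(context_dir: str, dockerignore_text: str):
    """List context files (relative, '/'-separated) that the patterns would NOT exclude."""
    rules = load_patterns(dockerignore_text)
    kept = []
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), context_dir)
            rel = rel.replace(os.sep, "/")
            if not is_excluded(rel, rules):
                kept.append(rel)
    return sorted(kept)

old = load_patterns("__pycache__\n*.pyc\n")
new = load_patterns("**/__pycache__\n**/*.pyc\n")
print(is_excluded("pkg/__pycache__/mod.cpython-312.pyc", old))  # False
print(is_excluded("pkg/__pycache__/mod.cpython-312.pyc", new))  # True
```

Running files_sent_to_builder(".", open(".dockerignore").read()) from the project root is a quick sanity check before the first build; treat any .env or .git entry in its output as a leak.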

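Why this matters:

Python base image: python:3.12 ≈ 1.0 GB, python:3.12-slim ≈ 130 MB; multi-stage + slim + no pip cache typically lands a Flask app around 170–220 MB.
Proper layer order: on a code-only edit the pip install layer comes from cache, so a rebuild that took ~90s can finish in ~5s (the exact gain depends on how heavy your dependencies are).
Non-root user: an attacker who gets code execution in your app is not root inside the container, which closes off the many privilege-escalation and container-escape paths that start from root.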

3. Local dev loop with Docker Compose

Real apps have more than one process. A docker-compose.yml lets you bring up the whole system with one command and a consistent network.

services:
  web:
    build: .
    ports: ["8000:8000"]
    environment:
      DATABASE_URL: postgresql://app:app@db:5432/app
      REDIS_URL: redis://cache:6379/0
    depends_on:
      db: { condition: service_healthy }
      cache: { condition: service_started }
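    # Dev-only override: --reload makes gunicorn restart workers when synced code changes
    command: ["gunicorn", "-b", "0.0.0.0:8000", "--reload", "app:app"]
    develop:
      watch:
        - action: sync
          path: ./src
          target: /app/src  # same place COPY . . puts ./src in the image
        - action: rebuild
          path: ./requirements.txt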

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    volumes: ["pgdata:/var/lib/postgresql/data"]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      retries: 5

  cache:
    image: redis:7-alpine

volumes:
  pgdata:

Key points most tutorials get wrong:

Service name = DNS name. web reaches Postgres at host db, not localhost. Compose puts every service on a user-defined bridge network where Docker's embedded DNS resolves service names.
depends_on alone does not wait for readiness. Use condition: service_healthy with a healthcheck.
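Named volumes, not bind mounts, for databases. Bind mounts on macOS/Windows go through Docker Desktop's file-sharing layer, which is slow and has caused permission and fsync problems for Postgres on some setups.
develop.watch (Compose v2.22+) replaces the old docker-compose up --build loop: it syncs changed files into the running container, or rebuilds when dependencies change, which is much closer to a native dev experience. On Compose releases that don't support up --watch, run docker compose watch instead.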

Run: docker compose up --watch.
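depends_on with service_healthy only orders startup: it doesn't help if the database restarts later, or when the app runs without Compose (App Service, Container Apps). A common complement is a short retry loop at app startup. This is a standard-library sketch with hypothetical names and defaults; note that an open TCP port is a weaker signal than pg_isready, because Postgres accepts connections while it is still starting up and rejects queries until it is ready.

```python
import socket
import time

def wait_for_tcp(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Retry a TCP connect until it succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    delay = interval
    while True:
        try:
            # create_connection also resolves the name, so Compose DNS hiccups are retried too.
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            pass  # refused, unreachable, or name not resolvable yet
        if time.monotonic() + delay > deadline:
            return False
        time.sleep(delay)
        delay = min(delay * 2, 5.0)  # exponential backoff, capped at 5 s

# Usage at app startup (host and port match the Compose "db" service):
# if not wait_for_tcp("db", 5432):
#     raise SystemExit("database never became reachable")
```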

4. Security hardening checklist

Production containers need more than "it works":

☑ Non-root user (USER directive in Dockerfile; runAsNonRoot: true in Kubernetes).
☑ Minimal base image: -slim, distroless, or Chainguard images. Avoid -alpine for Python: many packages publish only glibc (manylinux) wheels, so on musl libc pip often falls back to slow source builds.
☑ Pin versions — FROM python:3.12.5-slim@sha256:<digest>, not python:latest.
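☑ No secrets in layers. Any file written in a layer stays recoverable from the image (docker save exports every layer, even if a later layer deletes the file), and ARG/ENV values show up in docker history. Use BuildKit secret mounts (RUN --mount=type=secret) for build-time secrets, and inject runtime secrets via env vars / Azure Key Vault.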
☑ Scan on every build — docker scout cves <image>, trivy image <image>, or grype <image>.
☑ Read-only root filesystem at runtime: --read-only plus a tmpfs mount (--tmpfs /tmp) for anything that must write, such as gunicorn's worker heartbeat files, which go to /tmp by default.
☑ Drop capabilities — --cap-drop=ALL --cap-add=NET_BIND_SERVICE (if you need < 1024 ports).
☑ Resource limits — always set --memory and --cpus (or their Compose/K8s equivalents). Without them, one runaway container can OOM the whole host.
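The runtime items on this checklist can be verified mechanically. A sketch that reads the JSON printed by docker inspect and reports what's missing; it relies on the Config.User, HostConfig.ReadonlyRootfs, HostConfig.CapDrop, HostConfig.Memory and HostConfig.NanoCpus/CpuQuota fields, and the function names are hypothetical.

```python
import json
import subprocess

def audit(info: dict) -> list:
    """Return checklist violations for one object from `docker inspect` output."""
    cfg = info.get("Config") or {}
    host = info.get("HostConfig") or {}
    problems = []
    user = (cfg.get("User") or "").split(":")[0]  # format is "user[:group]"
    if user in ("", "root", "0"):
        problems.append("runs as root: set USER in the Dockerfile")
    if not host.get("ReadonlyRootfs"):
        problems.append("writable root filesystem: run with --read-only")
    if "ALL" not in {c.upper() for c in host.get("CapDrop") or []}:
        problems.append("capabilities not dropped: use --cap-drop=ALL")
    if not host.get("Memory"):  # 0 means unlimited
        problems.append("no memory limit: set --memory")
    if not (host.get("NanoCpus") or host.get("CpuQuota")):
        problems.append("no CPU limit: set --cpus")
    return problems

def audit_container(name: str) -> list:
    """Run `docker inspect` (which prints a JSON array) and audit the first match."""
    out = subprocess.run(["docker", "inspect", name],
                         check=True, capture_output=True, text=True).stdout
    return audit(json.loads(out)[0])
```

An empty list from audit_container("myapp") (container name hypothetical) means every runtime item above is in place.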

A fast CI gate, in one line:

trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:${GIT_SHA}
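trivy's --exit-code already fails the build. If you also want a per-severity summary in the CI log (handy when driving the HIGH/CRITICAL count to zero), you can parse its JSON report. A sketch assuming the report was written with trivy image --format json --output report.json, whose top level holds a Results list, each entry with an optional Vulnerabilities list carrying a Severity field; the function names are hypothetical.

```python
import json
from collections import Counter

BLOCKING = ("HIGH", "CRITICAL")

def count_by_severity(report: dict) -> Counter:
    """Count findings per severity in a trivy JSON report."""
    counts = Counter()
    for result in report.get("Results") or []:
        # "Vulnerabilities" is missing or null for targets with no findings.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts

def gate(report: dict, blocking=BLOCKING) -> int:
    """Print a summary and return a process exit code: 1 if any blocking finding."""
    counts = count_by_severity(report)
    print(", ".join(f"{sev}={n}" for sev, n in sorted(counts.items())) or "no findings")
    return 1 if any(counts[sev] for sev in blocking) else 0

def main(path: str) -> int:
    # Call as sys.exit(main("report.json")) from your CI step.
    with open(path) as f:
        return gate(json.load(f))
```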

5. Pushing the image: Azure Container Registry (ACR)

Prefer ACR over Docker Hub for Azure deploys: it lives in the same Azure region as your app, supports managed-identity pulls, can be locked down with private endpoints (Premium SKU), and avoids Docker Hub's pull rate limits.

# One-time setup
az group create -n rg-blog -l eastus
az acr create  -n myblogacr -g rg-blog --sku Basic

# Every build
az acr login -n myblogacr
docker build -t myblogacr.azurecr.io/blog:$(git rev-parse --short HEAD) .
docker push     myblogacr.azurecr.io/blog:$(git rev-parse --short HEAD)

# Or build in the cloud (no local Docker needed)
az acr build -r myblogacr -t blog:$(git rev-parse --short HEAD) .

az acr build is a sleeper feature — it runs the build on ACR's build agents, which is often faster than your laptop and doesn't require Docker locally (great for CI runners).
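A tiny guard for the tagging rule used above (git short SHA, never :latest). ACR registry names are 5 to 50 alphanumeric characters, and Docker tags follow [A-Za-z0-9_][A-Za-z0-9_.-]{0,127}; the function names and the deny-list of mutable tags are hypothetical.

```python
import re
import subprocess

ACR_NAME = re.compile(r"[A-Za-z0-9]{5,50}")
DOCKER_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")
MUTABLE_TAGS = {"latest", "main", "prod"}  # illustrative deny-list

def is_deployable_tag(tag: str) -> bool:
    """True for syntactically valid tags that aren't obviously mutable."""
    return bool(DOCKER_TAG.fullmatch(tag)) and tag not in MUTABLE_TAGS

def image_ref(registry: str, repo: str, tag: str) -> str:
    """Build <registry>.azurecr.io/<repo>:<tag>, rejecting bad names and mutable tags."""
    if not ACR_NAME.fullmatch(registry):
        raise ValueError(f"invalid ACR name: {registry!r}")
    if not is_deployable_tag(tag):
        raise ValueError(f"refusing tag {tag!r}; use an immutable tag such as the git SHA")
    # ACR login servers are lower-case.
    return f"{registry.lower()}.azurecr.io/{repo}:{tag}"

def git_short_sha() -> str:
    out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                         check=True, capture_output=True, text=True)
    return out.stdout.strip()

# Usage: print(image_ref("myblogacr", "blog", git_short_sha()))
```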

6. Deployment path A — Azure App Service (classic PaaS)

Best for: a single long-running web app, simple scaling, 1+ instances always on, mature blue/green via deployment slots, custom domain + managed SSL in one click.

# Linux App Service plan (B1 is fine for dev; use S1 or P1V3 for prod, because deployment slots need Standard tier or higher)
az appservice plan create -g rg-blog -n plan-blog --is-linux --sku B1

# Web app that pulls from ACR using a system-assigned managed identity
az webapp create \
  -g rg-blog -p plan-blog -n myblog-app \
  --deployment-container-image-name myblogacr.azurecr.io/blog:$(git rev-parse --short HEAD)

# Give the webapp a managed identity and AcrPull on the registry
az webapp identity assign -g rg-blog -n myblog-app
APP_PRINCIPAL_ID=$(az webapp identity show -g rg-blog -n myblog-app --query principalId -o tsv)
ACR_ID=$(az acr show -n myblogacr --query id -o tsv)
az role assignment create --assignee $APP_PRINCIPAL_ID --role AcrPull --scope $ACR_ID

# Tell App Service to use MI for pulls (no registry username/password stored)
az webapp config set -g rg-blog -n myblog-app --generic-configurations '{"acrUseManagedIdentityCreds": true}'

# App settings — surface as env vars inside the container
az webapp config appsettings set -g rg-blog -n myblog-app --settings \
  WEBSITES_PORT=8000 \
  WEBSITES_ENABLE_APP_SERVICE_STORAGE=false \
  DATABASE_URL="@Microsoft.KeyVault(...)"

Gotchas that catch people:

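WEBSITES_PORT: App Service assumes a custom container listens on port 80 or 8080. For any other port (8000 here), set WEBSITES_PORT to the port the container actually binds.
Always On: enable it on Basic tier or higher (Free and Shared don't offer it). Otherwise the app is unloaded after about 20 minutes without traffic, and the next request waits for the container to start again: az webapp config set -g rg-blog -n myblog-app --always-on true.
Deployment slots give you zero-downtime releases: deploy to a staging slot, test it, then run az webapp deployment slot swap -g rg-blog -n myblog-app --slot staging. The hostname doesn't change, and App Service warms up the staging slot before the swap. Slots require Standard tier or higher, so they aren't available on B1.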
Log streaming: az webapp log tail -g rg-blog -n myblog-app.
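One way to keep the port in a single place is a gunicorn config file that reads the PORT variable the Dockerfile already sets. gunicorn loads ./gunicorn.conf.py from its working directory by default (/app here), but command-line flags override it, so the image's CMD would shrink to ["gunicorn", "app:app"]. A sketch: GUNICORN_WORKERS and GUNICORN_THREADS are hypothetical variable names, not gunicorn built-ins.

```python
# gunicorn.conf.py: plain Python that gunicorn executes to read its settings.
import os

# Bind to the port the Dockerfile declares (ENV PORT=8000); WEBSITES_PORT on
# App Service and --target-port on Container Apps must match this value.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"

# Same shape as the Dockerfile's CMD: 2 processes x 4 threads each.
worker_class = "gthread"
workers = int(os.environ.get("GUNICORN_WORKERS", "2"))
threads = int(os.environ.get("GUNICORN_THREADS", "4"))

# Worker heartbeat files on tmpfs: avoids overlayfs writes, and in a standard
# Docker setup /dev/shm stays writable under --read-only.
worker_tmp_dir = "/dev/shm"

# Access and error logs to stdout/stderr, which both platforms collect.
accesslog = "-"
errorlog = "-"
```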

7. Deployment path B — Azure Container Apps (modern, serverless)

Best for: microservices, event-driven workloads, anything that benefits from scale-to-zero, Dapr, or KEDA-based scaling (e.g., scale on queue depth).

az provider register -n Microsoft.App
az provider register -n Microsoft.OperationalInsights

az containerapp env create \
  -g rg-blog -n cae-blog -l eastus

az containerapp create \
  -g rg-blog -n ca-blog \
  --environment cae-blog \
  --image myblogacr.azurecr.io/blog:$(git rev-parse --short HEAD) \
  --registry-server myblogacr.azurecr.io \
  --registry-identity system \
  --target-port 8000 --ingress external \
  --min-replicas 0 --max-replicas 10 \
  --cpu 0.5 --memory 1Gi \
  --secrets "db-url=keyvaultref:https://kv-blog.vault.azure.net/secrets/db-url,identityref:system" \
  --env-vars "DATABASE_URL=secretref:db-url"

Why you'd pick this over App Service:

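Dimension	App Service	Container Apps
Scale to zero	No: the plan is billed 24×7, even when idle	Yes
Billing model	Fixed price per plan instance, whether or not it serves traffic	Per second of active replica usage (consumption plan)
Scale on queue / custom metric	Limited (Azure Monitor autoscale rules)	Yes, via KEDA scalers
Multiple versions / traffic split	Deployment slots (5 on Standard, 20 on Premium) with % routing	Revisions (N) with % traffic split
Dapr sidecars	No	Yes
VNET integration	Regional VNET integration or private endpoint	Internal environment or workload profiles
Cold start	None with Always On	~2–5 s typical when scaling from zero (image size and app startup dominate)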

Rule of thumb: one user-facing monolith with steady traffic → App Service. Many small services or bursty/event-driven → Container Apps.
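The rule of thumb comes down to arithmetic: a plan costs the same every hour, while Container Apps (consumption) charges for vCPU-seconds and GiB-seconds while replicas are active. A sketch of the break-even point that deliberately ignores free monthly grants, idle-replica rates and per-request charges; the prices are inputs you copy from the Azure pricing page for your region, and the numbers in the usage lines are placeholders, not real prices.

```python
from dataclasses import dataclass

@dataclass
class AcaPricing:
    vcpu_second: float  # price per vCPU-second while a replica is active
    gib_second: float   # price per GiB-second while a replica is active

def aca_monthly_cost(active_hours: float, replicas: float, vcpu: float, gib: float,
                     p: AcaPricing) -> float:
    """Active-time compute cost only (no free grants, idle rates or request charges)."""
    seconds = active_hours * 3600 * replicas
    return seconds * (vcpu * p.vcpu_second + gib * p.gib_second)

def break_even_hours(app_service_monthly: float, replicas: float, vcpu: float, gib: float,
                     p: AcaPricing) -> float:
    """Active hours per month at which Container Apps costs as much as a fixed plan."""
    return app_service_monthly / aca_monthly_cost(1.0, replicas, vcpu, gib, p)

# Placeholder prices, not Azure's: fill in real ones for your region.
prices = AcaPricing(vcpu_second=0.00002, gib_second=0.000003)
hours = break_even_hours(app_service_monthly=13.0, replicas=1, vcpu=0.5, gib=1.0, p=prices)
print(f"Container Apps is cheaper below ~{hours:.0f} active hours/month")
```

A month has about 730 hours, so compare the result with how many hours per month your app actually has at least one active replica.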

8. Observability you should turn on day one

Log Analytics workspace attached to the App Service / ACA environment — stream stdout/stderr to KQL.
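Application Insights inside the app for distributed tracing: the Azure Monitor OpenTelemetry distro (azure-monitor-opentelemetry) instruments Flask once you call configure_azure_monitor() at startup, reading APPLICATIONINSIGHTS_CONNECTION_STRING from the environment.
Container health probes: neither platform uses the Dockerfile's HEALTHCHECK. On App Service, enable the Health check feature (the healthCheckPath site setting) and point it at /healthz; on ACA, configure the explicit liveness / readiness / startup probes.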
Alerts on: HTTP 5xx rate > 1%, p95 latency > 1s, memory > 80%, restart count > 0 in 10 min.
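The two request-based thresholds above are easy to get subtly wrong (a rate over which window? which percentile definition?). A sketch that evaluates them over one window of request records, using the nearest-rank definition of p95; the Request type and function names are illustrative, and in production you'd express the same rules as Azure Monitor alerts rather than run this yourself.

```python
import math
from dataclasses import dataclass

@dataclass
class Request:
    status: int          # HTTP status code
    duration_ms: float   # server-side latency

def p95(values):
    """Nearest-rank 95th percentile: the smallest sample with >= 95% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank, integer arithmetic
    return ordered[rank - 1]

def fired_alerts(window, max_5xx_rate=0.01, max_p95_ms=1000.0):
    """Return the rules that fire for one evaluation window (e.g. the last 5 minutes)."""
    if not window:
        return []
    alerts = []
    rate = sum(500 <= r.status <= 599 for r in window) / len(window)
    if rate > max_5xx_rate:
        alerts.append(f"5xx rate {rate:.1%} > {max_5xx_rate:.0%}")
    latency = p95([r.duration_ms for r in window])
    if latency > max_p95_ms:
        alerts.append(f"p95 {latency:.0f} ms > {max_p95_ms:.0f} ms")
    return alerts

window = [Request(200, 120.0)] * 97 + [Request(503, 80.0)] * 3
print(fired_alerts(window))  # ['5xx rate 3.0% > 1%']
```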

A minimal KQL query for "who's erroring right now" (it assumes a diagnostic setting that sends AppServiceConsoleLogs to the workspace):

AppServiceConsoleLogs
| where TimeGenerated > ago(15m)
| where ResultDescription has_any ("ERROR","Exception","Traceback")
| summarize count() by bin(TimeGenerated, 1m), _ResourceId
| render timechart

9. Common mistakes to avoid (corrections to the original post)

The original draft of this post was an AI-generated placeholder with several issues worth calling out:

❌ "Use python:3.8-slim": Python 3.8 reaches end-of-life in October 2024. Use a supported release (3.11/3.12 at the time of writing).
❌ CMD ["python", "app.py"] in production: app.run() starts Werkzeug's development server, which the Flask docs say not to use in production (it isn't designed for performance, stability, or security). Use gunicorn: gthread workers (-w N --threads M) for WSGI apps like Flask, or uvicorn workers for ASGI frameworks.
❌ Pushing to Docker Hub for Azure deploys — works, but you pay rate-limit tax and lose managed-identity pulls. Use ACR.
❌ "Day 1 / Day 2" fake learning plan — replace with concrete milestones tied to the repo you actually ship.
❌ No .dockerignore — without it, COPY . . leaks your .git, .env, and __pycache__ into the image.

10. TODO — self-sufficient action list

After reading the above, you should be able to tick each of these off without any extra reading. If a step feels unclear, re-read the section linked in parentheses.

Foundations

Install Docker Desktop (or Colima on macOS for a lighter alternative), run docker run hello-world, then explain in your own words what namespaces/cgroups did (§1).
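Run docker inspect <container> and find State.Pid and the overlay2 directories under GraphDriver.Data. Then read /proc/<pid>/cgroup on the host for the cgroup path and list /proc/<pid>/ns/ to see the container's namespaces. (With Docker Desktop or Colima, "the host" is the Linux VM, not your laptop.)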

Build a real image

Port the two-stage Dockerfile in §2 to your actual app (Python or otherwise). Commit a .dockerignore.
Record image sizes before and after multi-stage. Target: ≤ 250 MB for a Python web app, ≤ 80 MB for a Go binary.
Add a HEALTHCHECK that hits a real /healthz endpoint and returns JSON including git SHA + uptime.
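A dependency-free sketch of such an endpoint as a plain WSGI callable (in Flask, the same thing is a @app.get("/healthz") route that returns a dict). It assumes the SHA is baked in at build time, e.g. ARG GIT_SHA plus ENV GIT_SHA=$GIT_SHA in the Dockerfile and docker build --build-arg GIT_SHA=$(git rev-parse --short HEAD); the names are hypothetical.

```python
import json
import os
import time

STARTED_AT = time.monotonic()                    # per worker: each gunicorn worker imports this module
GIT_SHA = os.environ.get("GIT_SHA", "unknown")   # assumed to be baked in at build time

def healthz_app(environ, start_response):
    """WSGI app answering /healthz with status, build SHA and uptime as JSON."""
    if environ.get("PATH_INFO") != "/healthz":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = json.dumps({
        "status": "ok",
        "git_sha": GIT_SHA,
        "uptime_s": round(time.monotonic() - STARTED_AT, 1),
    }).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Because it returns 200 on /healthz, it satisfies the Dockerfile's urllib-based HEALTHCHECK as written.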

Local dev loop

Write a docker-compose.yml for app + Postgres + Redis with health-gated startup (§3). Verify docker compose up --watch hot-reloads on file change without full rebuild.

Security pass

Run trivy image against your image; get HIGH/CRITICAL count to zero (update base image / pin versions as needed).
Confirm with docker run --rm myapp whoami that the process is NOT root.
Add --read-only + tmpfs /tmp to your docker run command and verify the app still boots.

Ship to ACR

Create an ACR, push a tagged image built remotely with az acr build (no local Docker needed).
Tag with the git short SHA; never ship :latest to production.

Deploy — pick ONE path to start, then do the other

App Service path: provision Linux plan + webapp, wire managed-identity AcrPull, set WEBSITES_PORT, deploy, hit the public URL.
Add a staging slot, deploy v2 to it, verify, then swap → confirm zero downtime.
Container Apps path: provision an environment, deploy with --min-replicas 0, confirm scale-to-zero after idle, confirm cold-start behaviour.
Create a second revision, split traffic 50/50 between revisions, then promote to 100%.

Observability

Enable App Insights (SDK in code + connection string in env).
Write 3 KQL queries in Log Analytics: error rate by endpoint, p95 latency, slow DB calls.
Wire one alert: HTTP 5xx > 1% over 5 min → email/Teams.

Stretch (only after the above)

Migrate the deploy to Bicep or Terraform so the whole stack is azd up-able.
Add GitHub Actions: build → scan → push to ACR → az containerapp update --image … on merge to main.
Compare cost for your actual traffic pattern: App Service B1 24×7 vs Container Apps scale-to-zero. Document the break-even point.

When every box above is checked, flip status: workinprogress → status: published in the frontmatter.
