cloud · intermediate · 11 min · 2024-08-28

Deploying Web Applications in Azure with Docker

A self-sufficient, production-minded walkthrough — from Docker internals to a hardened deploy on Azure App Service / Container Apps.

What you'll leave with: a correct mental model of containers (not just the commands), a secure production-grade Dockerfile pattern, a working local dev loop with Docker Compose, and two deployment paths on Azure — App Service (classic) and Container Apps (modern, scale-to-zero) — with the exact CLI commands, gotchas, and cost/architecture tradeoffs.


1. A correct mental model: containers are not VMs

A very common misunderstanding is that containers are "lightweight VMs". They are not. A VM virtualises hardware (its own kernel, bootloader, emulated devices). A container virtualises the OS view for a group of processes — they all share the host kernel.

Three kernel features do the work:

| Feature | What it gives you | Example |
| --- | --- | --- |
| Namespaces | Isolated view of resources | pid (its own PID 1), net (its own NIC), mnt (its own filesystem root), uts (its own hostname), ipc, user |
| cgroups v2 | Resource limits & accounting | --memory=512m, --cpus=1.5 |
| Union FS (overlayfs) | Layered, copy-on-write images | Each RUN/COPY in a Dockerfile = one read-only layer; container gets a thin R/W layer on top |
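
To make the cgroups row concrete, here is a minimal sketch of how the two example flags map onto the values a cgroup v2 hierarchy stores in memory.max and cpu.max. The helper names are made up for illustration and are not part of any Docker API; the default CPU period of 100000 µs is an assumption that matches the usual kernel default.

```python
# Sketch: translate Docker resource flags into cgroup v2 file contents.
# Function names are illustrative only.

_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def memory_flag_to_bytes(value: str) -> int:
    """'512m' -> number of bytes, the form stored in memory.max."""
    value = value.strip().lower()
    if value and value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)  # a bare number is already a byte count


def cpus_flag_to_cpu_max(cpus: float, period_us: int = 100_000) -> str:
    """1.5 -> '150000 100000', the '<quota> <period>' format of cpu.max."""
    return f"{int(cpus * period_us)} {period_us}"


print(memory_flag_to_bytes("512m"))  # 536870912
print(cpus_flag_to_cpu_max(1.5))     # 150000 100000
```

Inside a running container on a cgroup v2 host you can usually compare these against /sys/fs/cgroup/memory.max and /sys/fs/cgroup/cpu.max.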

Consequences that matter:

  • You can't use kernel features inside the container that the host kernel lacks (e.g. newer io_uring features, newer eBPF, modern seccomp profiles).
  • You cannot run Linux containers on Windows/macOS natively — Docker Desktop runs a tiny Linux VM for you; that's why file-sync can be slow.
  • Image size is dominated by layers, so ordering COPY and RUN correctly is a real perf/cost lever.
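
One way to see the "shared kernel, separate view" idea is to look at the namespace links under /proc. The sketch below is a rough illustration that assumes a Linux host; on other systems namespaces_of simply returns an empty dict. Two processes are in the same namespace when the inode numbers match.

```python
import os
import re

# Links under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(link: str) -> tuple[str, int]:
    """Split 'pid:[4026531836]' into ('pid', 4026531836)."""
    match = NS_LINK.match(link)
    if match is None:
        raise ValueError(f"not a namespace link: {link!r}")
    return match["kind"], int(match["inode"])


def namespaces_of(pid: str = "self") -> dict[str, int]:
    """Map namespace entry name -> inode for a process (empty off Linux)."""
    ns_dir = f"/proc/{pid}/ns"
    result: dict[str, int] = {}
    if not os.path.isdir(ns_dir):
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
        result[name] = inode
    return result


if __name__ == "__main__":
    for name, inode in namespaces_of().items():
        print(f"{name:20} {inode}")
```

Run it once on the host and once inside a container: the pid, net, mnt and uts inodes should differ, while the kernel underneath is the same.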

The runtime stack on Linux: docker CLI → dockerd → containerd → runc. runc is the OCI reference runtime that actually calls clone(), sets up the namespaces and cgroups, pivot_roots the filesystem, and execves your entrypoint. Knowing this lets you debug "why is my container dying with exit 137?" (answer: 137 = 128 + 9, i.e. SIGKILL, most often sent by the memory cgroup's OOM killer).
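
The exit-code arithmetic is easy to forget, so here is a small sketch that decodes it. It relies on the common shell convention that a process killed by signal N reports exit code 128 + N; describe_exit_code is a hypothetical helper name.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Turn a container exit code into a human-readable hint."""
    if code == 0:
        return "exited cleanly"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:  # signal number unknown on this platform
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"application error (exit {code})"


for code in (0, 1, 137, 143):
    print(code, "->", describe_exit_code(code))
```

SIGKILL alone does not prove an OOM kill. To confirm it, check docker inspect --format '{{.State.OOMKilled}}' <container>.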


2. A production-grade Dockerfile pattern

The Dockerfile most tutorials show you (including the earlier version of this post) has at least four production problems:

  1. Runs as root.
  2. Uses COPY . /app before pip install, so every code change busts the pip cache.
  3. Uses a single stage — your build tools (gcc, dev headers, pip wheel cache) ship to production.
  4. No HEALTHCHECK, no EXPOSE discipline, no explicit WORKDIR.

Here is a pattern that fixes all of this for a Python/Flask app (the same shape works for Node/Go/Java):

# syntax=docker/dockerfile:1.7
# ---------- Stage 1: builder ----------
FROM python:3.12-slim AS builder

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1

WORKDIR /build

# Install build-time OS deps; they stay in this stage and never reach the runtime image
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy ONLY dependency manifests first so this layer is cached on code-only changes
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# ---------- Stage 2: runtime ----------
FROM python:3.12-slim AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PORT=8000

# Non-root user
RUN useradd --create-home --shell /usr/sbin/nologin --uid 10001 app
WORKDIR /app

# Bring in only the built site-packages — no gcc, no caches
COPY --from=builder /install /usr/local

# App code last = best cache behaviour
COPY --chown=app:app . .

USER app
EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8000/healthz', timeout=2).status==200 else 1)"

# Use gunicorn, not `python app.py`, in production
CMD ["gunicorn", "-b", "0.0.0.0:8000", "-w", "2", "--threads", "4", "app:app"]

Pair it with a strict .dockerignore:

.git
.venv
__pycache__
*.pyc
.env
.env.*
tests/
docs/
node_modules/

Why this matters (concrete numbers):

  • Python base image: python:3.12 ≈ 1.0 GB, python:3.12-slim ≈ 130 MB, multi-stage + slim + --no-cache-dir typically lands a Flask app around 170–220 MB.
  • Proper layer order: on a code-only edit, rebuild time drops from ~90s to ~5s because pip install is cached.
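  • Non-root user: a compromised process can't modify system files or install packages inside the container, and exploits that depend on root in the container stop working.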
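
The layer-ordering point can be illustrated with a toy model of the build cache. This is a deliberately simplified sketch, not how BuildKit is implemented: it assumes each step's cache key depends only on the parent key, the instruction text, and the digests of the files that step copies. The file digests ("r1", "a1", "a2") are placeholders.

```python
import hashlib


def layer_keys(steps, file_digests):
    """Return one cache key per step; each key chains the previous one."""
    keys, parent = [], ""
    for instruction, files in steps:
        h = hashlib.sha256(parent.encode())
        h.update(instruction.encode())
        for name in sorted(files):
            h.update(file_digests[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def cached_steps(steps, before, after):
    """Count the leading steps whose key is identical in both builds."""
    count = 0
    for old, new in zip(layer_keys(steps, before), layer_keys(steps, after)):
        if old != new:
            break
        count += 1
    return count


naive = [
    ("COPY . /app", ["requirements.txt", "app.py"]),
    ("RUN pip install -r requirements.txt", []),
]
ordered = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install --prefix=/install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "app.py"]),
]

before = {"requirements.txt": "r1", "app.py": "a1"}
after = {"requirements.txt": "r1", "app.py": "a2"}  # code-only edit

print(cached_steps(naive, before, after))    # 0: pip install reruns
print(cached_steps(ordered, before, after))  # 2: pip install is reused
```

In the naive ordering the very first key changes, so everything after it is rebuilt. With manifests copied first, the expensive pip install step keeps its key.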

3. Local dev loop with Docker Compose

Real apps have more than one process. A docker-compose.yml lets you bring up the whole system with one command and a consistent network.

services:
  web:
    build: .
    ports: ["8000:8000"]
    environment:
      DATABASE_URL: postgresql://app:app@db:5432/app
      REDIS_URL: redis://cache:6379/0
    depends_on:
      db: { condition: service_healthy }
      cache: { condition: service_started }
    develop:
      watch:
        - action: sync
          path: ./src
          target: /app
        - action: rebuild
          path: ./requirements.txt

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    volumes: ["pgdata:/var/lib/postgresql/data"]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      retries: 5

  cache:
    image: redis:7-alpine

volumes:
  pgdata:

Key points most tutorials get wrong:

  • Service name = DNS name. web reaches Postgres at host db, not localhost. Compose puts every service on a user-defined bridge network where Docker's embedded DNS resolves service names.
  • depends_on alone does not wait for readiness. Use condition: service_healthy with a healthcheck.
  • Named volumes, not bind mounts, for databases. Bind mounts on macOS/Windows cross the Docker Desktop VM boundary, so they are slow, and their file-permission and fsync behaviour can upset Postgres on some setups.
  • develop.watch (Compose v2.22+) replaces the old docker-compose up --build loop with fast file sync — much closer to a native dev experience.

Run: docker compose up --watch.
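
To show what "service name = DNS name" and "readiness, not start order" mean from the app's side, here is a rough sketch using only the standard library. It parses the connection URLs from the Compose file and retries a TCP connect with the same interval and retry count as the db healthcheck. A successful TCP connect is a weaker signal than pg_isready, so treat wait_for_port (a hypothetical helper) as an illustration, not a replacement for the healthcheck.

```python
import socket
import time
from urllib.parse import urlsplit


def service_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from a connection URL."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"URL needs an explicit host and port: {url!r}")
    return parts.hostname, parts.port


def wait_for_port(host: str, port: int, retries: int = 5,
                  interval: float = 5.0) -> bool:
    """Return True once a TCP connect succeeds, False after all retries."""
    for attempt in range(retries):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            if attempt < retries - 1:
                time.sleep(interval)
    return False


# The host part is the Compose service name, not localhost.
print(service_endpoint("postgresql://app:app@db:5432/app"))  # ('db', 5432)
print(service_endpoint("redis://cache:6379/0"))              # ('cache', 6379)
```

Inside the web container, wait_for_port(*service_endpoint(os.environ["DATABASE_URL"])) would resolve db through Docker's embedded DNS.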


4. Security hardening checklist

Production containers need more than "it works":

  • Non-root user (USER directive in Dockerfile; runAsNonRoot: true in Kubernetes).
  • Minimal base image: slim, distroless, or Chainguard images. Avoid -alpine for Python (musl libc compatibility issues with many wheels).
  • Pin versions: FROM python:3.12.5-slim@sha256:<digest>, not python:latest.
  • No secrets in layers. Anything COPYed stays in its layer even if a later step deletes it, and ARG/ENV values show up in docker history. Inject secrets via env vars / Azure Key Vault at runtime.
  • Scan on every build: docker scout cves <image>, trivy image <image>, or grype <image>.
  • Read-only root filesystem at runtime: --read-only plus a tmpfs mount for /tmp if needed.
  • Drop capabilities: --cap-drop=ALL --cap-add=NET_BIND_SERVICE (only if you need to bind ports below 1024).
  • Resource limits: always set --memory and --cpus (or their Compose/K8s equivalents). Without them, one runaway container can OOM the whole host.

A fast CI gate, in one line:

trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:${GIT_SHA}
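
If you want the gate to report which findings blocked the build, one option is to have Trivy write JSON (--format json --output report.json) and summarise it yourself. The sketch below assumes the report's usual top-level shape (a Results list whose entries carry a Vulnerabilities list). The sample data, including the CVE IDs, is invented for illustration.

```python
import json

BLOCKING = {"HIGH", "CRITICAL"}


def load_report(path: str) -> dict:
    """Read a report written by `trivy image --format json --output <path>`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def blocking_findings(report: dict) -> list[tuple[str, str, str]]:
    """Return (target, vulnerability id, severity) for blocking findings."""
    findings = []
    for result in report.get("Results") or []:
        # Targets with no findings may omit the Vulnerabilities key entirely.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING:
                findings.append((
                    result.get("Target", "?"),
                    vuln.get("VulnerabilityID", "?"),
                    vuln["Severity"],
                ))
    return findings


# Invented sample, shaped like a Trivy report.
sample = {
    "Results": [
        {"Target": "myapp (debian)", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
            {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
        ]},
        {"Target": "Python"},
    ]
}

found = blocking_findings(sample)
print(len(found), found)
# In CI you would end with: sys.exit(1 if found else 0)
```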

5. Pushing the image: Azure Container Registry (ACR)

Prefer ACR over Docker Hub for Azure deploys: it sits in the same region as your compute, supports managed-identity pulls, can be restricted to your VNET with a private endpoint (Premium SKU), and avoids Docker Hub's public rate limits.

# One-time setup
az group create -n rg-blog -l eastus
az acr create  -n myblogacr -g rg-blog --sku Basic

# Every build
az acr login -n myblogacr
docker build -t myblogacr.azurecr.io/blog:$(git rev-parse --short HEAD) .
docker push     myblogacr.azurecr.io/blog:$(git rev-parse --short HEAD)

# Or build in the cloud (no local Docker needed)
az acr build -r myblogacr -t blog:$(git rev-parse --short HEAD) .

az acr build is a sleeper feature — it runs the build on ACR's build agents, which is often faster than your laptop and doesn't require Docker locally (great for CI runners).
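
In a CI script it can help to build the image reference in one place and refuse a floating tag. This is a minimal sketch under that assumption; image_ref and git_short_sha are hypothetical helpers, and the SHA in the example call is made up.

```python
import re
import subprocess

# Docker tag grammar: up to 128 chars, no leading '.' or '-'.
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def git_short_sha() -> str:
    """Same value as $(git rev-parse --short HEAD) in the shell commands."""
    out = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def image_ref(registry: str, repository: str, tag: str) -> str:
    """Build '<registry>/<repository>:<tag>', rejecting unusable tags."""
    if tag == "latest":
        raise ValueError("refusing to build a ':latest' reference")
    if TAG_RE.fullmatch(tag) is None:
        raise ValueError(f"invalid image tag: {tag!r}")
    return f"{registry}/{repository}:{tag}"


print(image_ref("myblogacr.azurecr.io", "blog", "3f2a1bc"))
# myblogacr.azurecr.io/blog:3f2a1bc
```

In a real pipeline the tag argument would come from git_short_sha().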


6. Deployment path A — Azure App Service (classic PaaS)

Best for: a single long-running web app, simple scaling, 1+ instances always on, mature blue/green via deployment slots, custom domain + managed SSL in one click.

# Linux App Service plan (P1V3 recommended for prod; B1 for dev)
az appservice plan create -g rg-blog -n plan-blog --is-linux --sku B1

# Web app that pulls from ACR using a system-assigned managed identity
az webapp create \
  -g rg-blog -p plan-blog -n myblog-app \
  --deployment-container-image-name myblogacr.azurecr.io/blog:latest

# Give the webapp a managed identity and AcrPull on the registry
az webapp identity assign -g rg-blog -n myblog-app
APP_PRINCIPAL_ID=$(az webapp identity show -g rg-blog -n myblog-app --query principalId -o tsv)
ACR_ID=$(az acr show -n myblogacr --query id -o tsv)
az role assignment create --assignee $APP_PRINCIPAL_ID --role AcrPull --scope $ACR_ID

# Tell App Service to use MI for pulls (no registry username/password stored)
az webapp config set -g rg-blog -n myblog-app --generic-configurations '{"acrUseManagedIdentityCreds": true}'

# App settings — surface as env vars inside the container
az webapp config appsettings set -g rg-blog -n myblog-app --settings \
  WEBSITES_PORT=8000 \
  WEBSITES_ENABLE_APP_SERVICE_STORAGE=false \
  DATABASE_URL="@Microsoft.KeyVault(...)"

Gotchas that catch people:

  • WEBSITES_PORT — App Service will only route traffic to the port you declare here. If your container listens on 8080, set WEBSITES_PORT=8080.
  • Always On: enable it (available on Basic tier and above), otherwise the first request after idle takes ~20s.
  • Deployment slots give you true zero-downtime: deploy to staging, test, then az webapp deployment slot swap. DNS doesn't change, and the slot warms up before the swap. Slots require the Standard tier or higher, so a B1 plan won't have them.
  • Log streaming: az webapp log tail -g rg-blog -n myblog-app.
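
The WEBSITES_PORT gotcha is easier to avoid when the listen port lives in exactly one place. One possible approach, sketched below, is a gunicorn.conf.py that derives the bind address from the PORT variable the Dockerfile already sets. This assumes you drop the hard-coded -b flag from CMD (command-line flags override the config file); resolve_port is a hypothetical helper.

```python
# gunicorn.conf.py (sketch): gunicorn reads `bind`, `workers` and `threads`
# from this file when it is present in the working directory.
import os


def resolve_port(env, default: int = 8000) -> int:
    """Pick the listen port from PORT, falling back to the default."""
    raw = env.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port


bind = f"0.0.0.0:{resolve_port(os.environ)}"
workers = 2   # same values as the CMD in the Dockerfile
threads = 4
```

Whatever port this resolves to is the value WEBSITES_PORT has to match.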

7. Deployment path B — Azure Container Apps (modern, serverless)

Best for: microservices, event-driven workloads, anything that benefits from scale-to-zero, Dapr, or KEDA-based scaling (e.g., scale on queue depth).

az provider register -n Microsoft.App
az provider register -n Microsoft.OperationalInsights

az containerapp env create \
  -g rg-blog -n cae-blog -l eastus

az containerapp create \
  -g rg-blog -n ca-blog \
  --environment cae-blog \
  --image myblogacr.azurecr.io/blog:latest \
  --registry-server myblogacr.azurecr.io \
  --registry-identity system \
  --target-port 8000 --ingress external \
  --min-replicas 0 --max-replicas 10 \
  --cpu 0.5 --memory 1Gi \
  --secrets "db-url=keyvaultref:https://kv-blog.vault.azure.net/secrets/db-url,identityref:system" \
  --env-vars "DATABASE_URL=secretref:db-url"

Why you'd pick this over App Service:

| Dimension | App Service | Container Apps |
| --- | --- | --- |
| Scale to zero | No (B1+) | Yes |
| Per-second billing when idle | No | Yes |
| Scale on queue / custom metric | Limited | Yes (KEDA) |
| Multiple revisions / traffic-split | Slots (2) | Revisions (N), % traffic split |
| Dapr sidecars | No | Yes |
| VNET integration | Regional or private endpoint | Internal env or workload profiles |
| Min latency on cold start | n/a (always on) | ~2–5s cold start |

Rule of thumb: one user-facing monolith with steady traffic → App Service. Many small services or bursty/event-driven → Container Apps.
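
To give a feel for "scale on queue depth", here is a simplified model of the arithmetic behind queue-length scaling: replicas wanted is roughly the queue length divided by a per-replica target, clamped to the min/max from the create command. Real KEDA scaling also involves polling intervals and cooldown periods, which this sketch ignores. The target of 20 messages per replica is an arbitrary example value.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """ceil(queue / target), clamped to [min_replicas, max_replicas]."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


for depth in (0, 1, 45, 500):
    print(depth, "->", desired_replicas(depth, target_per_replica=20))
# 0 -> 0   (scale to zero)
# 1 -> 1
# 45 -> 3
# 500 -> 10  (capped by --max-replicas)
```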


8. Observability you should turn on day one

  • Log Analytics workspace attached to the App Service / ACA environment — stream stdout/stderr to KQL.
  • Application Insights SDK inside the app for distributed tracing (opentelemetry-instrumentation-flask is three lines of code).
  • Container health probes: App Service does not act on the Dockerfile's HEALTHCHECK, so configure its Health check path (for example /healthz) on the web app; ACA has explicit liveness / readiness / startup probes you should configure.
  • Alerts on: HTTP 5xx rate > 1%, p95 latency > 1s, memory > 80%, restart count > 0 in 10 min.

A minimal KQL query for "who's erroring right now":

AppServiceConsoleLogs
| where TimeGenerated > ago(15m)
| where ResultDescription has_any ("ERROR","Exception","Traceback")
| summarize count() by bin(TimeGenerated, 1m), _ResourceId
| render timechart
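
The alert thresholds listed in this section (5xx rate > 1%, p95 latency > 1s) are simple to compute by hand, which is useful for sanity-checking what the monitoring stack reports. The sketch below uses the nearest-rank definition of p95; other tools may interpolate and give slightly different values. The sample data is synthetic.

```python
def error_rate(status_codes: list[int]) -> float:
    """Fraction of responses with a 5xx status."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def should_alert(status_codes: list[int], latencies_ms: list[float]) -> list[str]:
    """Return the list of thresholds that were crossed."""
    reasons = []
    if error_rate(status_codes) > 0.01:
        reasons.append("5xx rate > 1%")
    if p95(latencies_ms) > 1000:
        reasons.append("p95 latency > 1s")
    return reasons


statuses = [200] * 97 + [500] * 3          # synthetic: 3% errors
latencies = list(range(10, 1010, 10))      # synthetic: 10 ms .. 1000 ms

print(error_rate(statuses))                # 0.03
print(p95(latencies))                      # 950
print(should_alert(statuses, latencies))   # ['5xx rate > 1%']
```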

9. Common mistakes to avoid (corrections to the original post)

The original draft of this post was an AI-generated placeholder with several issues worth calling out:

  1. "Use python:3.8-slim": 3.8 reaches end-of-life in October 2024. Use a supported Python (3.11/3.12 at the time of writing).
  2. CMD ["python", "app.py"] in production: Flask's built-in server is a development server and is not designed for production load or security. Use gunicorn (gthread workers for WSGI apps, uvicorn workers for async frameworks).
  3. Pushing to Docker Hub for Azure deploys — works, but you pay rate-limit tax and lose managed-identity pulls. Use ACR.
  4. "Day 1 / Day 2" fake learning plan — replace with concrete milestones tied to the repo you actually ship.
  5. No .dockerignore — without it, COPY . . leaks your .git, .env, and __pycache__ into the image.

10. TODO — self-sufficient action list

After reading the above, you should be able to tick each of these off without any extra reading. If a step feels unclear, re-read the section linked in parentheses.

Foundations

  • Install Docker Desktop (or Colima on macOS for a lighter alternative), run docker run hello-world, then explain in your own words what namespaces/cgroups did (§1).
  • Run docker inspect <container> and find the PID, MountID, and cgroup path. Open /proc/<pid>/ns/ on the host and show the container's namespaces.

Build a real image

  • Port the two-stage Dockerfile in §2 to your actual app (Python or otherwise). Commit a .dockerignore.
  • Record image sizes before and after multi-stage. Target: ≤ 250 MB for a Python web app, ≤ 80 MB for a Go binary.
  • Add a HEALTHCHECK that hits a real /healthz endpoint and returns JSON including git SHA + uptime.
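
For the /healthz item, here is a standard-library sketch of the shape such an endpoint and its probe could take. In the real app this would be a Flask route; http.server stands in here so the example runs without dependencies. The GIT_SHA environment variable is an assumption: you would have to set it at build or deploy time.

```python
import json
import os
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

STARTED = time.monotonic()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        body = json.dumps({
            "status": "ok",
            "git_sha": os.environ.get("GIT_SHA", "unknown"),
            "uptime_seconds": round(time.monotonic() - STARTED, 1),
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep probe traffic out of the logs
        pass


def probe(url: str, timeout: float = 2.0) -> bool:
    """True only for an HTTP 200; errors and non-2xx responses give False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(probe(f"http://127.0.0.1:{port}/healthz"))  # True
    print(probe(f"http://127.0.0.1:{port}/missing"))  # False
    server.shutdown()
```

The probe function does the same job as the one-line HEALTHCHECK command in §2, with explicit error handling.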

Local dev loop

  • Write a docker-compose.yml for app + Postgres + Redis with health-gated startup (§3). Verify docker compose up --watch hot-reloads on file change without full rebuild.

Security pass

  • Run trivy image against your image; get HIGH/CRITICAL count to zero (update base image / pin versions as needed).
  • Confirm with docker run --rm myapp whoami that the process is NOT root.
  • Add --read-only + tmpfs /tmp to your docker run command and verify the app still boots.

Ship to ACR

  • Create an ACR, push a tagged image built remotely with az acr build (no local Docker needed).
  • Tag with the git short SHA; never ship :latest to production.

Deploy — pick ONE path to start, then do the other

  • App Service path: provision Linux plan + webapp, wire managed-identity AcrPull, set WEBSITES_PORT, deploy, hit the public URL.
  • Add a staging slot, deploy v2 to it, verify, then swap → confirm zero downtime.
  • Container Apps path: provision an environment, deploy with --min-replicas 0, confirm scale-to-zero after idle, confirm cold-start behaviour.
  • Create a second revision, split traffic 50/50 between revisions, then promote to 100%.

Observability

  • Enable App Insights (SDK in code + connection string in env).
  • Write 3 KQL queries in Log Analytics: error rate by endpoint, p95 latency, slow DB calls.
  • Wire one alert: HTTP 5xx > 1% over 5 min → email/Teams.

Stretch (only after the above)

  • Migrate the deploy to Bicep or Terraform so the whole stack is azd up-able.
  • Add GitHub Actions: build → scan → push to ACR → az containerapp update --image … on merge to main.
  • Compare cost for your actual traffic pattern: App Service B1 24×7 vs Container Apps scale-to-zero. Document the break-even point.
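
For the break-even item, the arithmetic is a single division once you have two numbers from the Azure pricing page: the fixed monthly price of the always-on plan and the hourly cost of one active replica. The sketch below uses placeholder figures that are NOT real Azure prices. It also ignores free grants and per-request charges, so treat it as a first approximation.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def break_even_active_hours(fixed_monthly_cost: float,
                            active_cost_per_hour: float) -> float:
    """Active hours per month at which pay-per-use equals the fixed plan."""
    if active_cost_per_hour <= 0:
        raise ValueError("active_cost_per_hour must be positive")
    return fixed_monthly_cost / active_cost_per_hour


# Placeholder inputs for illustration only; look up current prices.
fixed = 50.0    # always-on plan, per month
hourly = 0.25   # one active replica, per hour

hours = break_even_active_hours(fixed, hourly)
share = hours / HOURS_PER_MONTH
print(f"break-even at {hours:.0f} active hours (~{share:.0%} of the month)")
```

Below the break-even share of active time, scale-to-zero is cheaper; above it, the fixed plan wins.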

When every box above is checked, flip status: workinprogress to status: published in the frontmatter.
