system design · intermediate · 30m read

MS Stack Ch 19 — IaC and rollout orchestration

Bicep modules, what-if previews, environments, region rollouts. Awareness of Ev2 (Microsoft-internal), Spinnaker, and Argo CD. The pattern for safely deploying infrastructure and apps across regions.

Chapter 19 of From Novice to Fluent on the Modern Microsoft Web Stack — a 22-chapter self-study plan.

Why this chapter

The pipeline from chapter 18 deploys code. Infrastructure — the App Service, the database, the storage account, the network — has to be deployable too, and from the same source-of-truth model: declarative, reviewable, version-controlled, repeatable. That is what Infrastructure-as-Code (IaC) means in practice. Inside Azure, the modern primitive is Bicep; underneath it is ARM (Azure Resource Manager) and its JSON templates.

But IaC is only half the story. The other half is rollout orchestration: how do you safely apply infrastructure changes and code deployments across regions, with bake times, approvals, and automatic rollback when health drops? Inside Microsoft, the answer is Ev2 (Express v2). Outside, the comparable tools are Spinnaker, Argo Rollouts, Octopus Deploy, and GitHub Actions environments chained with custom logic. The shapes differ; the principles are nearly identical.

Shipping-grade looks like "I can author a Bicep file that defines an App Service plan, a web app, and a Key Vault, and deploy it idempotently through a pipeline". Expert-tier looks like "I can split infra from app deploys, design a dev → preprod → canary → broad → global rollout with bake times, and recover when the canary region fails health checks".

You finish this chapter when you can describe — without notes — the full deploy plan for a multi-region web app: where IaC stops and code deploy begins, what each rollout stage proves, and how the system rolls back when a stage fails.

ARM + Bicep

Resources, parameters, outputs, scopes, modules — the declarative core.

Idempotency

Convergence: applying the same template twice yields the same state.

Infra ≠ code

Why infra and app deploys are separate pipelines with separate cadences.

Region rollout

Dev → preprod → canary → broad → global, with bake times and rollback.

Approvals + change

Change tickets, business-hours gates, deployment-freeze windows.

Ev2 + analogues

The Microsoft-internal orchestrator and the Spinnaker / Argo / Octopus equivalents.

Concepts and depth

ARM templates and Bicep

ARM (Azure Resource Manager) is the control-plane API layer for Azure resource operations. Every CLI call, portal click, or Terraform apply that creates or configures a resource ends up as an ARM request; data-plane calls, such as reading a blob, go straight to the service. ARM accepts declarative templates in JSON: a top-level object listing resources, parameters, variables, and outputs. Templates are idempotent: submit the same template twice and the second submission is a no-op (assuming no drift).

JSON ARM templates are correct, complete, and miserable to write. The schema is verbose, the dependency model is manual (dependsOn arrays you have to remember), and the syntax for expressions ([concat(...)], [reference(...)]) is its own dialect. Bicep is a transpiler: a domain-specific language that compiles to ARM JSON while reading like a cross between Python and TypeScript, and it infers most dependencies from symbolic references such as plan.id. Same semantics, dramatically better authoring experience.

// main.bicep
param location string = resourceGroup().location
param appName string
param sku string = 'P1v3'
 
resource plan 'Microsoft.Web/serverfarms@2024-04-01' = {
  name: '${appName}-plan'
  location: location
  sku: { name: sku }
  properties: { reserved: true } // Linux
}
 
resource app 'Microsoft.Web/sites@2024-04-01' = {
  name: appName
  location: location
  properties: {
    serverFarmId: plan.id
    httpsOnly: true
    siteConfig: {
      linuxFxVersion: 'DOTNETCORE|8.0'
      minTlsVersion: '1.2'
    }
  }
}
 
output appUrl string = 'https://${app.properties.defaultHostName}'

Compile Bicep with bicep build main.bicep (emits main.json) or let az deployment consume Bicep directly. Modules (module x './kv.bicep' = { ... }) are the unit of reuse — your KV definition, your App Service definition, each in its own file, composed in the root template.

Deployment scopes matter: resourceGroup (most common), subscription (for things that span RGs, like assigning roles or creating RGs themselves), managementGroup, and tenant. The az deployment subcommand (group, sub, mg, or tenant) must match the template's targetScope, which defaults to resourceGroup when omitted. Mismatches yield an unhelpful error.

# Resource-group scope
az deployment group create \
  --resource-group rg-myapp-prod \
  --template-file main.bicep \
  --parameters appName=myapi-prod
 
# Subscription scope (e.g., creates RGs)
az deployment sub create \
  --location eastus2 \
  --template-file subscription.bicep

The what-if preview (az deployment group what-if ...) shows what ARM would change before you apply. It is the IaC equivalent of git diff and should be the default mode in PR review: post the what-if as a PR comment, require it to be clean (or explicitly approved) before merge.
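To make "clean or explicitly approved" something a pipeline can check, the what-if result can be summarised by change type. The sketch below is a minimal illustration in Python, assuming the JSON shape produced by az deployment group what-if --no-pretty-print (a changes list whose entries carry resourceId and changeType). The function name, the sample data, and the choice of which change types block a merge are assumptions for illustration.

```python
import json
from collections import Counter

# Policy choices for this sketch (assumptions, not ARM rules):
BLOCKING = {"Delete"}              # always needs an explicit sign-off
HARMLESS = {"NoChange", "Ignore"}  # safe to treat as a clean diff


def summarize_what_if(raw_json):
    """Count changes by type and pick out the ones a reviewer must approve."""
    changes = json.loads(raw_json).get("changes", [])
    counts = Counter(change["changeType"] for change in changes)
    needs_signoff = [c["resourceId"] for c in changes if c["changeType"] in BLOCKING]
    clean = all(kind in HARMLESS for kind in counts)
    return {"counts": dict(counts), "needs_signoff": needs_signoff, "clean": clean}


# Hand-written sample; resource IDs are shortened placeholders.
# Delete entries show up when a deployment runs in complete mode.
SAMPLE = json.dumps({"changes": [
    {"resourceId": "rg-myapp-prod/Microsoft.Web/sites/myapi-prod", "changeType": "Modify"},
    {"resourceId": "rg-myapp-prod/Microsoft.Web/serverfarms/myapi-prod-plan", "changeType": "NoChange"},
    {"resourceId": "rg-myapp-prod/Microsoft.Storage/storageAccounts/stold", "changeType": "Delete"},
]})

summary = summarize_what_if(SAMPLE)
print(summary["clean"], summary["needs_signoff"])
```

A PR job could post summary["counts"] as the comment and fail the check whenever needs_signoff is non-empty, leaving Modify and Create entries to human review.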

Good enough to ship

• Bicep file per service; modules for cross-cutting (KV, log workspace)
• az deployment group create from a pipeline with federated identity
• what-if posted on every PR
• Outputs consumed by downstream stages (URL, connection string ref)

Expert tier

• Bicep registry hosting shared modules with semver
• Subscription-scoped Bicep creates RGs + RBAC + policy assignments
• Drift detection on a schedule: re-run what-if against prod, alert if non-empty
• Microsoft.Authorization/policyAssignments enforces guardrails in code

Deploy infra, then deploy code

A single pipeline that deploys infra and code together feels efficient and turns out to be the wrong unit. Infrastructure changes alter the shape of the environment (a new subnet, a new Key Vault, a new SKU); they happen weekly, sometimes monthly, and often need separate approval. Code changes happen many times a day. Coupling them means an emergency code rollback drags infra changes with it (or worse, cannot proceed because the infra step is also broken).

The pattern is two pipelines, one source-of-truth repo. The infra pipeline runs on infra/** changes, applies Bicep, produces outputs (resource IDs, URLs) as artifacts. The app pipeline runs on src/** changes, consumes the outputs (or names them by convention), deploys the app code into the existing infra. The two share variable groups and service connections but run independently.

The "infra first" rule is enforced by ordering: the app pipeline's deploy stage does not start until the infra pipeline has completed at least once successfully for that environment. New environments require an infra deploy before any app deploy can target them.

# app pipeline declares the infra pipeline as a resource
resources:
  pipelines:
  - pipeline: infra
    source: Infra.Bicep
    trigger:
      branches: { include: [ main ] }
 
stages:
- stage: Deploy
  jobs:
  - deployment: DeployApp
    environment: prod
    strategy:
      runOnce:
        deploy:
          steps:
          - download: infra
            artifact: outputs
          # outputs.json is the file the infra pipeline publishes in its 'outputs' artifact
          - script: cat $(Pipeline.Workspace)/infra/outputs/outputs.json
          - task: AzureWebApp@1
            inputs: { ... }
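ARM reports deployment outputs as an object keyed by output name, where each entry carries a type and a value. A consuming step usually wants plain name/value pairs. The following Python sketch shows one way to flatten that shape, assuming the downloaded artifact holds the JSON captured from --query properties.outputs; the function name and sample URL are illustrative.

```python
import json


def flatten_outputs(raw_json):
    """Turn {"name": {"type": ..., "value": ...}} into {"name": value}."""
    outputs = json.loads(raw_json)
    return {name: entry["value"] for name, entry in outputs.items()}


# Sample in the shape ARM returns for a single string output.
SAMPLE = '{"appUrl": {"type": "String", "value": "https://myapi-prod.azurewebsites.net"}}'

flat = flatten_outputs(SAMPLE)
print(flat["appUrl"])
```

Keeping the flattening in one small step means the rest of the app pipeline never has to know about ARM's output envelope.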

Good enough to ship

• Separate infra and app pipelines
• infra/ and src/ in the same repo, separately triggered
• App pipeline waits for infra to succeed before deploying

Expert tier

• Infra and app pipelines in separate repos, app declares infra as a pipelines: resource
• Infra changes go through a heavier change-management process than code
• Per-environment promotion: infra-dev → infra-staging → infra-prod, each gated

Idempotency and convergence

The IaC promise is "the template describes desired state; apply it as many times as you want and the result is the same". That property is idempotency, and the process of moving actual state toward the declared state is convergence. ARM converges by treating each resource as an upsert: if the resource exists with matching properties, no-op; if it exists with different properties, update; if it does not exist, create.
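The upsert rule is small enough to model directly. The sketch below is a toy model in Python, not ARM's implementation: resources are plain dictionaries keyed by name, and plan and apply are hypothetical helper names. It shows why a second apply of the same template has nothing left to do.

```python
def plan(current, desired):
    """Return the action an upsert would take for each declared resource."""
    actions = {}
    for name, props in desired.items():
        if name not in current:
            actions[name] = "create"
        elif current[name] != props:
            actions[name] = "update"
        else:
            actions[name] = "no-op"
    return actions


def apply(current, desired):
    """Converge: every declared resource ends up with its declared properties."""
    converged = dict(current)  # resources absent from the template are left alone
    converged.update(desired)
    return converged


state = {"plan": {"sku": "B1"}}
template = {"plan": {"sku": "P1v3"}, "app": {"httpsOnly": True}}

first = plan(state, template)   # one update, one create
state = apply(state, template)
second = plan(state, template)  # nothing left to change
print(first, second)
```

Leaving undeclared resources untouched mirrors incremental mode, which is the default for ARM deployments.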

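The gotcha is that not every property can be converged in place. Some properties are immutable once the resource exists: changing one either fails the deployment or forces you to delete and recreate the resource yourself, and recreate may drop data. A name change is subtler still, because to ARM a new name is a new resource: renaming a storage account in Bicep creates an empty second account and, in the default incremental mode, leaves the old one running. Read the resource provider docs; the what-if preview shows creates and deletes, which is usually the first sign that a change is destructive.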

Some operations are inherently out-of-band: secret values in Key Vault, role assignments at non-managed scopes, data-plane operations like uploading a blob. These should not be in Bicep. Use scripts (or specialised tasks) to manage them, but accept that those operations are not idempotent in the same way.

A further gotcha is drift: someone clicks in the portal and changes a setting. Your Bicep is now out of sync with reality. The fix is twofold: take away human write access in prod (read-only RBAC roles for people, with resource locks and Azure Policy deny effects as backstops), and schedule a periodic what-if run that alerts on non-empty diffs.
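Conceptually, a drift scan compares what the template declares with what is live and reports the property paths that differ. The Python sketch below illustrates that comparison on nested dictionaries. It is a simplification that only checks properties the template declares, and find_drift and the sample values are hypothetical; in practice the what-if run does this work for you.

```python
def find_drift(declared, actual, prefix=""):
    """List dotted property paths where the live value differs from the declared one."""
    drift = []
    for key, want in declared.items():
        path = f"{prefix}.{key}" if prefix else key
        have = actual.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(find_drift(want, have, path))
        elif want != have:
            drift.append(path)
    return drift


declared = {"httpsOnly": True, "siteConfig": {"minTlsVersion": "1.2", "ftpsState": "Disabled"}}
# Someone lowered the TLS floor in the portal and switched on an undeclared setting.
live = {"httpsOnly": True, "siteConfig": {"minTlsVersion": "1.0", "ftpsState": "Disabled", "alwaysOn": True}}

drifted = find_drift(declared, live)
print(drifted)  # ['siteConfig.minTlsVersion']
```

The undeclared alwaysOn setting is ignored here, which is a reminder that a scan can only catch drift in properties the template actually pins.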

Good enough to ship

• Templates are safe to re-run; verified by re-running in CI
• Destructive changes (rename, SKU change) reviewed manually
• No portal edits in prod (enforced by RBAC and policy)

Expert tier

• Nightly drift scan; deltas open PRs automatically
• Out-of-band operations (secrets, blobs) clearly partitioned from Bicep
• Recreate-on-change properties flagged in code review checklist

Region rollout patterns

A single-region prod deploy is a special case. The real shape of a safe deploy is a staged rollout across multiple regions or rings:

Dev — your own team's environment, single region, no real users. Bake: minutes.
Preprod — production-like data and config, internal testers only. Bake: hours.
Canary — one production region, small fraction of real users. Bake: hours to a day.
Broad — a wave of regions, larger user fraction. Bake: hours.
Global — remaining regions. Bake: continuous monitoring.

Between each stage is a bake time: a window during which the deploy must not break the SLO. "Not break" is defined by quantitative health signals — error rate stays under a threshold, p95 latency stays under another, no critical alerts fire. If any signal breaches, the rollout halts and an alert wakes the on-call.
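A health gate of this kind reduces to a handful of comparisons. The sketch below is a minimal Python illustration, assuming the three signals named above have already been fetched from telemetry; BakeSample, bake_passes, and the thresholds are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class BakeSample:
    error_rate: float      # fraction of failed requests, e.g. 0.004 means 0.4%
    p95_latency_ms: float
    critical_alerts: int


def bake_passes(sample, baseline_error_rate, max_increase_bps, p95_limit_ms):
    """True only if every health signal is inside its limit."""
    error_budget = baseline_error_rate + max_increase_bps / 10_000  # 1 bps = 0.01%
    return (
        sample.error_rate <= error_budget
        and sample.p95_latency_ms <= p95_limit_ms
        and sample.critical_alerts == 0
    )


healthy = BakeSample(error_rate=0.004, p95_latency_ms=310.0, critical_alerts=0)
degraded = BakeSample(error_rate=0.009, p95_latency_ms=310.0, critical_alerts=0)

print(bake_passes(healthy, 0.002, 50, 400.0))   # True
print(bake_passes(degraded, 0.002, 50, 400.0))  # False
```

Expressing the error threshold relative to a baseline, rather than as an absolute number, keeps the gate meaningful for services whose normal error rate is not zero.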

Auto-rollback is the response to a bad bake. It comes in two flavours. Slot-swap rollback, available in App Service, swaps the old version back; it takes seconds. Redeploy rollback, the general form, re-runs the deploy stage against the previous artifact; it takes as long as a normal deploy. The first is preferable; the second is the fallback.
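The halt-and-roll-back behaviour can be sketched as a loop over stages. The Python below is an illustration only: the deploy, bake, swap, and redeploy callables stand in for whatever your pipeline actually invokes, and the policy of rolling back just the failing stage is an assumption (stages that already baked cleanly are left alone).

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    previous_in_slot: bool  # True if the old version still sits in the other slot


def roll_back(stage, swap, redeploy, previous_artifact):
    """Prefer the fast slot swap; fall back to redeploying the previous artifact."""
    if stage.previous_in_slot:
        swap(stage.name)
        return "slot-swap"
    redeploy(stage.name, previous_artifact)
    return "redeploy"


def run_rollout(stages, deploy, bake_ok, swap, redeploy, previous_artifact):
    """Deploy stage by stage; on the first bad bake, roll that stage back and stop."""
    for stage in stages:
        deploy(stage.name)
        if not bake_ok(stage.name):
            how = roll_back(stage, swap, redeploy, previous_artifact)
            return f"halted at {stage.name}, rolled back via {how}"
    return "completed"


log = []
result = run_rollout(
    stages=[Stage("canary", True), Stage("wave1", False), Stage("wave2", False)],
    deploy=lambda name: log.append(("deploy", name)),
    bake_ok=lambda name: name != "wave1",  # simulate a bad bake in wave1
    swap=lambda name: log.append(("swap", name)),
    redeploy=lambda name, artifact: log.append(("redeploy", name, artifact)),
    previous_artifact="build-41",
)
print(result)
```

Because the rollback path is chosen per stage, regions that run on App Service with a staging slot get the seconds-long recovery while everything else falls through to a redeploy.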

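The fan-out shape is usually wave-based: canary in one region, wave 1 in two non-paired regions, wave 2 in four, and so on. Pairing matters because an Azure region pair is a recovery unit: each region is the other's designated failover target, and Azure itself staggers platform updates so that both halves of a pair are not updated at once. Keeping paired regions in different waves means a bad deploy never takes out a region and its failover partner together. The exact wave shape depends on your customer geography and your SLO commitments.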
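One way to build such waves is a greedy pass that doubles the wave size each round and skips any region whose pair is already in the current wave. The Python sketch below assumes a small, hand-maintained pair table (an illustrative subset; check the current Azure list before relying on it), and plan_waves is a hypothetical helper.

```python
# Illustrative subset of Azure region pairs, listed in both directions.
PAIRS = {
    "eastus2": "centralus", "centralus": "eastus2",
    "westeurope": "northeurope", "northeurope": "westeurope",
    "southeastasia": "eastasia", "eastasia": "southeastasia",
    "australiaeast": "australiasoutheast", "australiasoutheast": "australiaeast",
}


def plan_waves(regions):
    """Split regions into waves of size 1, 2, 4, ... with no pair sharing a wave."""
    remaining = list(regions)
    waves, size = [], 1
    while remaining:
        wave = []
        for region in list(remaining):
            if len(wave) < size and PAIRS.get(region) not in wave:
                wave.append(region)
                remaining.remove(region)
        waves.append(wave)
        size *= 2
    return waves


regions = [
    "eastus2", "centralus", "westeurope", "northeurope",
    "southeastasia", "eastasia", "australiaeast", "australiasoutheast",
]
waves = plan_waves(regions)
for number, wave in enumerate(waves):
    print(number, wave)
```

The first region in the input becomes the canary, so the order of the list is itself a design decision: put the region with the best telemetry and the most forgiving traffic first.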

Linear waves

simple

• Canary → wave1 → wave2
• Easy to reason about
• Slower full rollout

Exponential waves

fast at scale

• 1 → 2 → 4 → 8 regions
• Reaches global in 4 waves
• Bigger blast radius per wave

Per-customer rings

Microsoft-style

• Ring 0 internal, Ring 1 early adopters, Ring 2 broad
• Customer opt-in to early rings
• Most signal before broad

Good enough to ship

• Dev → preprod → canary → broad → global
• 1-hour bakes minimum at canary; longer for prod waves
• Health gate: error rate < baseline + N basis points
• Slot-swap rollback wired in

Expert tier

• Customer-aware rings, not just regions
• Bake times derived from traffic volume (need N requests for stat-sig)
• Auto-rollback drives a postmortem regardless of cause
• Rollout-engine signals streamed to a central dashboard
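Deriving a bake time from traffic is simple arithmetic: divide the number of requests you need to observe by the request rate, and never go below the minimum bake. The Python sketch below assumes the required sample size has been chosen elsewhere (it depends on the baseline error rate and the size of regression you want to detect); bake_seconds and the one-hour floor are illustrative.

```python
import math


def bake_seconds(required_requests, requests_per_second, floor_seconds=3600):
    """Bake long enough to observe the required sample, never less than the floor."""
    if requests_per_second <= 0:
        raise ValueError("no traffic: the bake cannot collect a sample")
    needed = math.ceil(required_requests / requests_per_second)
    return max(floor_seconds, needed)


busy_region = bake_seconds(100_000, 50)   # 2,000 s of traffic, so the floor applies
quiet_region = bake_seconds(100_000, 5)   # 20,000 s, about five and a half hours
print(busy_region, quiet_region)
```

The same sample size therefore yields very different bake times per region, which is an argument for choosing a canary region with enough traffic to produce signal quickly.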

Approval workflows and change management

Production deploys in regulated industries (finance, healthcare, anything touching customer money) require a change ticket: a record stating what is changing, why, who approved it, what the rollback plan is. The pipeline integrates by querying the change-management system before the prod stage runs; if the ticket is not approved or not in the deploy window, the gate blocks.

Even outside regulated contexts, the discipline is valuable. The change record is the post-incident artifact; the approval is the second pair of eyes; the deploy window prevents Friday-evening surprises. Azure DevOps environments support approvals and checks, including an Invoke REST API check that can call the change-management API; ServiceNow, Jira Service Management, and most enterprise tools expose APIs or ready-made integrations for this.

The lighter-weight version is the deploy calendar + freeze windows. Code freezes are a fact of life: Black Friday, end of quarter, regulatory deadlines. The pipeline must respect a freeze — typically by checking an org-wide "deploy allowed?" service before each prod stage.
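The logic behind a "deploy allowed?" service is a short series of date checks. The Python sketch below is a minimal illustration: the freeze dates, the business hours, and the deploy_allowed name are all assumptions, and it uses naive datetimes, so a real service would need to settle on a time zone.

```python
from datetime import datetime, time

# Hypothetical freeze calendar: (start inclusive, end exclusive).
FREEZES = [(datetime(2025, 11, 27), datetime(2025, 12, 2))]


def deploy_allowed(now, freezes, open_at=time(9), close_at=time(16)):
    """Return (allowed, reason): weekdays, inside business hours, outside any freeze."""
    for start, end in freezes:
        if start <= now < end:
            return False, "freeze window"
    if now.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return False, "weekend"
    if not (open_at <= now.time() < close_at):
        return False, "outside business hours"
    return True, "ok"


print(deploy_allowed(datetime(2025, 11, 28, 10, 0), FREEZES))  # inside the freeze
print(deploy_allowed(datetime(2025, 12, 3, 10, 0), FREEZES))   # a Wednesday morning
```

Returning a reason alongside the decision matters in practice: a blocked pipeline that says why it was blocked generates far fewer support questions than one that simply fails.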

Approvals are not just "click OK". They are an opportunity for a second engineer to inspect the deploy: what changed in the diff, what's in the artifact, what's the rollback plan. Treating approvals as paperwork is how you ship the wrong thing.

Good enough to ship

• Manual approval on every prod stage
• Business-hours check enforced
• Freeze windows respected via a REST gate

Expert tier

• Change-management ticket required, status verified by gate
• Approver rota documented; conflict-of-interest rules
• Auto-rollback skips approvals (correct: getting back to known-good is always allowed)

Ev2 (Microsoft-internal) in depth

Inside Microsoft, the canonical rollout orchestrator is Ev2 (Express v2). It is not in the public docs, but the concepts transfer: most large cloud orgs end up with something like it, and understanding it makes the analogues clearer.

An Ev2 rollout is described by a small set of YAML and JSON files in a service group:

ServiceModel — the declarative description of the service: which regions it lives in, which roles (sub-services) it has, how they relate. The shape of the world.
RolloutSpec — the rollout plan: which stages run, in what order, what each stage does, what gates exist between them.
ScopeBindings — the mapping from logical names (primaryRegion) to physical Azure subscriptions and resource groups.
Parameters — per-environment parameter files (prod parameters, preprod parameters).
Templates — ARM templates (.json) that each stage applies.
Scripts — PowerShell or Bash scripts that each stage runs for non-ARM operations (warmup, drain, custom checks).

A rollout consumes a build artifact (the pipeline output from chapter 18) and a rollout spec, then walks through the stages defined in the spec, applying ARM templates per scope and running scripts per scope. Each stage has a bake time and a health gate; failure halts the rollout and (depending on config) triggers a rollback.

Two flavours exist: Classic Ev2 (older, more verbose, ARM-only) and Modern Ev2 (newer, supports container images and Helm, slightly nicer authoring). New services start on Modern. The migration story is well-trodden.

The transferable lesson is the separation between service model and rollout plan. The model says what exists; the spec says how to change it. The same model can be rolled out by different specs for different scenarios (normal release, emergency patch, regional failover). Argo Rollouts (Kubernetes) draws a comparable line between its Rollout CRD, which describes the workload and its rollout steps, and its AnalysisTemplate CRD, which describes the health checks that gate promotion; Spinnaker does it with pipelines plus canary-analysis stages.
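The split between "what exists" and "how we change it" can be shown with two small data types. The Python sketch below is not Ev2's file format or any orchestrator's schema; ServiceModel, RolloutPlan, and expand are hypothetical names used only to show one model being rolled out under two different plans.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceModel:
    """What exists: the service and the regions it lives in."""
    name: str
    regions: List[str]


@dataclass
class RolloutPlan:
    """How to change it: wave sizes and the bake after each wave."""
    name: str
    wave_sizes: List[int]
    bake_minutes: int


def expand(model, plan):
    """Combine a model and a plan into ordered (regions, bake_minutes) stages."""
    stages, start = [], 0
    for size in plan.wave_sizes:
        wave = model.regions[start:start + size]
        if wave:
            stages.append((wave, plan.bake_minutes))
        start += size
    rest = model.regions[start:]
    if rest:
        stages.append((rest, plan.bake_minutes))  # everything left goes last
    return stages


model = ServiceModel("myapi", ["eastus2", "westeurope", "southeastasia", "australiaeast", "westus3"])
normal = RolloutPlan("normal", wave_sizes=[1, 2], bake_minutes=60)
hotfix = RolloutPlan("hotfix", wave_sizes=[1], bake_minutes=15)

normal_stages = expand(model, normal)
hotfix_stages = expand(model, hotfix)
print(normal_stages)
print(hotfix_stages)
```

Adding a region touches only the model, and tightening bake times touches only the plan, which is the practical payoff of keeping the two apart.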

Good enough to ship

• Inside MS: know ServiceModel, RolloutSpec, ScopeBindings, Parameters, Templates, Scripts
• Outside MS: pick one orchestrator (Spinnaker / Argo / Octopus) and learn it deeply
• Separate "what exists" from "how we change it" in your design

Expert tier

• Author both Classic + Modern Ev2 specs; understand migration tradeoffs
• Custom health checks integrated with the rollout engine
• Emergency-rollout specs that skip non-safety gates

Transferable analogues outside Microsoft

If you are not at Microsoft, the same problems are solved by:

Spinnaker — Netflix-origin multi-cloud orchestrator. Pipelines with stages, canary analysis via Kayenta, deep AWS/GCP/Kubernetes integration. Heavy operationally; runs as a cluster of services.
Argo Rollouts — Kubernetes-native progressive delivery. Rollout CRD replaces Deployment; supports blue-green, canary, traffic-splitting via service meshes. Lightweight if you are already on Kubernetes.
Octopus Deploy — commercial tool popular in .NET shops. Project-based UI, strong environment promotion model, good story for Windows + Linux + container deploys. Less programmable than Spinnaker, faster to learn.
GitHub Actions environments — for smaller orgs or single-product setups, environments with required reviewers, deployment branches, and custom workflows do most of what you need. Less specialised than Spinnaker but plenty for a 5–20 service org.
Flux + Argo CD — GitOps. The desired state lives in Git; the cluster pulls and converges. Rollout patterns layer on top (Argo Rollouts on top of Argo CD).

Pick one, learn it well, and remember that the principles travel: separate model from rollout, bake before broaden, gate on health, automate the rollback.

Spinnaker

multi-cloud, heavy

• Netflix origin
• Kayenta canary analysis
• Multi-account AWS / GCP

Argo Rollouts

k8s-native

• Rollout CRD
• Traffic split via Istio/Linkerd
• Pairs with Argo CD

Octopus Deploy

.NET friendly

• Project + env model
• Windows + Linux + container
• Commercial support

Worked examples

A Bicep module with parameters and outputs

// modules/webapp.bicep
@description('Name of the App Service.')
param appName string
 
@description('Azure region.')
param location string = resourceGroup().location
 
@description('App Service plan SKU.')
@allowed([ 'B1', 'P1v3', 'P2v3' ])
param sku string = 'P1v3'
 
@description('App settings to inject (key-value pairs).')
param appSettings object = {}
 
resource plan 'Microsoft.Web/serverfarms@2024-04-01' = {
  name: '${appName}-plan'
  location: location
  sku: { name: sku }
  properties: { reserved: true }
}
 
resource app 'Microsoft.Web/sites@2024-04-01' = {
  name: appName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    serverFarmId: plan.id
    httpsOnly: true
    siteConfig: {
      linuxFxVersion: 'DOTNETCORE|8.0'
      minTlsVersion: '1.2'
      ftpsState: 'Disabled'
      appSettings: [ for setting in items(appSettings): {
        name: setting.key
        value: setting.value
      } ]
    }
  }
}
 
output appUrl string = 'https://${app.properties.defaultHostName}'
output principalId string = app.identity.principalId

What to notice:

@description and @allowed are decorators: the first documents the parameter, the second rejects values outside the list before anything is deployed.
identity: { type: 'SystemAssigned' } gives the app a managed identity; the principalId output lets a downstream KV module grant it access.
appSettings is an object (key/value map), converted to ARM's array of name/value entries via a for ... in items() loop.
Two outputs: the public URL (for smoke tests) and the principal ID (for RBAC).

Composing modules in a root template

// main.bicep
targetScope = 'resourceGroup'
 
param env string  // 'dev' | 'staging' | 'prod'
param location string = resourceGroup().location
 
var appName = 'myapi-${env}'
 
module kv './modules/keyvault.bicep' = {
  name: 'kv-deploy'
  params: {
    name: 'kv-myapp-${env}'
    location: location
  }
}
 
module web './modules/webapp.bicep' = {
  name: 'web-deploy'
  params: {
    appName: appName
    location: location
    sku: env == 'prod' ? 'P2v3' : 'P1v3'
    appSettings: {
      'ConnectionStrings__Default': '@Microsoft.KeyVault(SecretUri=${kv.outputs.connStrUri})'
      'ApplicationInsights__InstrumentationKey': '@Microsoft.KeyVault(SecretUri=${kv.outputs.aiKeyUri})'
    }
  }
}
 
// Grant the web app's managed identity Key Vault Secrets User
module rbac './modules/keyvault-rbac.bicep' = {
  name: 'kv-rbac'
  params: {
    kvName: kv.outputs.name
    principalId: web.outputs.principalId
    roleDefinitionId: '4633458b-17de-408a-b874-0445c86b69e6' // Key Vault Secrets User
  }
}
 
output appUrl string = web.outputs.appUrl

What to notice:

targetScope declared explicitly; az deployment group create matches it.
Conditional SKU based on env: prod gets bigger.
App settings use Key Vault references (@Microsoft.KeyVault(...)) — the actual secret value never appears in Bicep or pipeline logs.
The RBAC module depends on outputs from KV (the vault name) and the web app (the principal ID); Bicep infers the dependency graph automatically.

A pipeline that deploys infra then app

# infra pipeline (infra-pipelines.yml)
trigger:
  branches: { include: [ main ] }
  paths: { include: [ 'infra/**' ] }
 
stages:
- stage: WhatIf
  jobs:
  - job: WhatIfProd
    steps:
    - task: AzureCLI@2
      inputs:
        azureSubscription: 'svc-conn-prod-fed'
        scriptType: bash
        scriptLocation: inlineScript
        inlineScript: |
          az deployment group what-if \
            --resource-group rg-myapp-prod \
            --template-file infra/main.bicep \
            --parameters infra/prod.bicepparam
 
- stage: Deploy_Prod
  dependsOn: WhatIf
  jobs:
  - deployment: DeployProd
    environment: prod-infra
    strategy:
      runOnce:
        deploy:
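          steps:
          # Deployment jobs do not check out the repo by default; the Bicep files are needed here.
          - checkout: self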
          - task: AzureCLI@2
            inputs:
              azureSubscription: 'svc-conn-prod-fed'
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                az deployment group create \
                  --resource-group rg-myapp-prod \
                  --template-file infra/main.bicep \
                  --parameters infra/prod.bicepparam \
                  --query properties.outputs > outputs.json
          - publish: outputs.json
            artifact: outputs

What to notice:

Path filter limits to infra/** changes; code changes do not trigger this pipeline.
what-if runs first as a separate stage; the deploy stage depends on it succeeding.
The deploy stage publishes the ARM outputs as an artifact for the app pipeline to consume.
A dedicated prod-infra environment carries its own approvals (typically tighter than app deploys).

A canary + waves rollout

- stage: Canary
  jobs:
  - deployment: CanaryEastUS2
    environment: prod-eastus2
    strategy:
      runOnce:
        deploy:
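          steps:
          # Deployment jobs do not check out the repo by default; scripts/ lives in the repo.
          - checkout: self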
          - task: AzureWebApp@1
            inputs: { azureSubscription: 'svc-conn-prod-fed', appName: 'myapi-eastus2', package: $(Pipeline.Workspace)/api }
          - script: ./scripts/smoke.sh https://myapi-eastus2.azurewebsites.net
            displayName: Smoke
          - script: ./scripts/bake-check.sh eastus2 3600
            displayName: Bake (60min, error rate < baseline + 50bps)
 
- stage: Wave1
  dependsOn: Canary
  condition: succeeded()
  jobs:
  - deployment: WestEurope
    environment: prod-westeurope
    strategy: { runOnce: { deploy: { steps: [ ... ] } } }
  - deployment: SoutheastAsia
    environment: prod-southeastasia
    strategy: { runOnce: { deploy: { steps: [ ... ] } } }
 
- stage: Wave2
  dependsOn: Wave1
  condition: succeeded()
  jobs:
  - deployment: AustraliaEast
    environment: prod-australiaeast
    strategy: { runOnce: { deploy: { steps: [ ... ] } } }
  - deployment: WestUS3
    environment: prod-westus3
    strategy: { runOnce: { deploy: { steps: [ ... ] } } }

What to notice:

Canary runs a smoke test, then a bake script that queries telemetry and exits non-zero if SLO breaches.
Regions within a wave are parallel jobs inside one stage; the stage gate is "every region in the wave succeeds".
condition: succeeded() on the wave stages halts the rollout as soon as the preceding stage fails, starting with canary.
Each region has its own environment, giving per-region deployment history.

Hands-on exercises

Goal: Write a Bicep module for App Service + plan. Steps: (1) Author modules/webapp.bicep with parameters for name, location, SKU. (2) Compose it from a root main.bicep. (3) Deploy to a personal Azure subscription. (4) Re-deploy unchanged; verify no-op. You're done when az deployment group create runs twice with identical output.
Goal: Use what-if in a PR workflow. Steps: (1) Make a Bicep change in a feature branch. (2) Run az deployment group what-if locally; capture the output. (3) Configure a pipeline to run it on PR and post the diff as a comment. You're done when PR comments show the diff before merge.
Goal: Split infra and app pipelines. Steps: (1) Move infra into its own pipeline with paths: include: ['infra/**']. (2) Make the app pipeline declare the infra pipeline as a pipelines: resource. (3) Trigger an infra change; verify the app pipeline waits. You're done when an infra deploy precedes the next app deploy.
Goal: Demonstrate drift detection. Steps: (1) Deploy infra with Bicep. (2) Manually edit a property in the portal. (3) Run what-if; observe the drift. (4) Either revert in portal or update Bicep + redeploy. You're done when drift is detected and resolved.
Goal: Build a canary stage with bake. Steps: (1) Deploy to one region in a new stage called Canary. (2) Add a script that queries Application Insights for error rate over the last 60 minutes and exits non-zero on breach. (3) Make the next stage dependsOn: Canary with condition: succeeded(). (4) Force a failure (deploy a bad version); verify the next stage does not run. You're done when canary failures halt the rollout.
Goal: Implement an auto-rollback. Steps: (1) Capture the previous artifact's version (e.g., from a tag or release name). (2) On smoke-test failure post-deploy, run a re-deploy of the previous artifact. (3) Optionally use slot-swap for instant rollback. You're done when a deliberately broken deploy triggers a rollback within minutes.

Self-check questions

Why use Bicep instead of writing ARM JSON directly? Give two concrete advantages.
What does targetScope do, and what breaks if the CLI call's scope does not match?
Explain idempotency in the context of ARM deployments. Name a scenario where a Bicep apply is NOT idempotent.
Why split infra and app deploys into separate pipelines? Give a failure mode that occurs when they share one.
What is az deployment ... what-if, and where in the workflow should it run?
Walk through a dev → preprod → canary → broad → global rollout. What does each stage prove?
What is a bake time, and how do you decide its duration?
Compare slot-swap rollback vs redeploy rollback. When does each apply?
Inside Microsoft, what are ServiceModel, RolloutSpec, ScopeBindings, Parameters, Templates, and Scripts in Ev2? One sentence each.
Outside Microsoft, name an analogue for each of: progressive delivery on Kubernetes, multi-cloud orchestrator, .NET-shop deployment tool.
What is drift, and how do you detect and prevent it?
Why does an auto-rollback skip the manual approval gate?

High-signal resources

Official docs

Bicep documentation — language, modules, registry.
ARM template reference — every resource type's schema.
What-if for deployments.
Azure Policy — enforce guardrails declaratively.
Argo Rollouts — progressive delivery on Kubernetes.
Spinnaker docs — multi-cloud orchestration.

Books or courses

Infrastructure as Code — Kief Morris. The canonical text; principles outlive tools.
Cloud Native Patterns — Cornelia Davis. Progressive delivery and rollout patterns in depth.

Practitioner posts

Netflix Tech Blog on Spinnaker — the origin story and continued iteration.
Honeycomb's deploy posts — production rollout incidents.
Microsoft Tech Community: Bicep — tips and patterns.
SRE Workbook chapter on canarying — Google's framing, broadly applicable.

Weekly milestones

Day 1: Read Bicep basics; author a single-resource Bicep file; deploy it. Answer self-check 1, 2, 3.
Day 2: Refactor into modules; add Key Vault + RBAC; deploy. Answer self-check 11.
Day 3: Wire Bicep into a pipeline with what-if on PR; deploy on merge. Answer self-check 5.
Day 4-5: Split infra and app pipelines; chain with pipelines: resource. Answer self-check 4.
Day 6-7: Build a canary + waves stage layout with bake + auto-rollback. Read about Ev2 (internally) or Argo/Spinnaker (externally). Answer self-check 6, 7, 8, 9, 10, 12.

How it shows up in the capstone

The capstone repo has an infra/ directory with Bicep modules (App Service, Key Vault, App Insights, log analytics workspace) composed from a main.bicep. A dedicated infra pipeline triggers on infra/** changes, runs what-if on PR, and applies on merge. The app pipeline declares the infra pipeline as a resource and consumes its outputs as artifacts.

The app's prod deploy is shaped as a canary + waves rollout: one region first with a 30-minute bake, then a wave of two non-paired regions, then global. Bake scripts query Application Insights for error rate and p95 latency. Any breach halts the rollout and triggers an auto-rollback via slot swap.

If you are at Microsoft, the capstone's rollout becomes a thin Ev2 spec; if you are elsewhere, it stays in Azure DevOps with environments and gates. The principles — separate model from rollout, bake before broaden, automate the rollback — are identical.
