System design · Intermediate · 32 min read

MS Stack Ch 16 — Resilience patterns with Polly v8

Retry with jittered backoff, timeout, circuit breaker, bulkhead, fallback, rate-limiter, hedging — composed into a single resilience pipeline. The patterns that keep your app alive when downstream services don't.

Chapter 16 of From Novice to Fluent on the Modern Microsoft Web Stack — a 22-chapter self-study plan.

Why this chapter

Production systems fail. Networks blip, downstreams degrade, queues back up, certificates expire, the SQL connection pool exhausts, a noisy neighbour starves your CPU, an EU region falls offline at 03:14 UTC because someone in another team rolled out a bad config. The difference between a 30-second blip and a 30-minute outage is whether your callers handle failure with grace. The difference between a 30-minute outage and a 30-hour incident is whether your retries are bounded, whether your timeouts are tight, whether your circuit breaker is configured at all.

Shipping-level resilience means: every outbound call has a timeout, every retry has jitter, every circuit breaker has a MinimumThroughput, and you never retry a POST without an idempotency key. Expert-level resilience means: you reason about your service's failure mode taxonomy, you size timeouts against percentiles rather than guesses, you have run a chaos exercise that deliberately fails the downstream you most rely on, you can articulate why hedging belongs on read paths only, and you know exactly which Polly exception means "the breaker is open" versus "the request itself timed out".

You finish this chapter when you can build a complete resilience pipeline from memory, defend the ordering of every layer, and explain the failure mode each one catches.

Classify failures

Transient vs persistent; retriable vs not; surface the cause.

Bound everything

Max attempts, max delay, sampling windows, break durations — nothing unbounded.

Compose deliberately

Outer-to-inner pipeline ordering you can defend at code review.

Idempotency-first

Never retry mutating calls without an idempotency key.

Pick the right pattern

Timeout / retry / breaker / bulkhead / hedge / fallback — each for its purpose.

Wire via HttpClient

`AddResilienceHandler` so every outbound call inherits the policy.

Concepts and depth

Failure taxonomy: transient vs persistent, retriable vs not

The first decision in any failure is "do I retry?". The classifier has two axes:

Transient — the same call again has a real chance of succeeding (a connection reset, a 503, a 429). The downstream is healthy or recovering.
Persistent — the same call again will fail the same way (a 400, a 404, a 401, a SQL constraint violation). The state, the request or the auth is wrong.

Cross with:

Retriable — the operation is safe to attempt again (idempotent GETs, idempotent PUTs with If-Match, POSTs with an idempotency key).
Non-retriable — repeating it would do harm (a POST that charges a card, a SendEmail call, an unscoped UPDATE).

Only (transient AND retriable) belongs in a retry policy. Everything else fails fast and surfaces the real cause. Retrying a 400 is wasted budget; retrying a non-idempotent POST is a double-charge incident waiting to happen.

The HTTP status set worth retrying: 408 (Request Timeout), 429 (Too Many Requests; honour Retry-After), 500 (Internal Server Error), 502 (Bad Gateway), 503 (Service Unavailable), 504 (Gateway Timeout). Plus network-layer faults: connection resets (TCP RST), DNS timeouts and TLS handshake failures caused by a network blip (a certificate validation error is persistent, so do not retry it). Everything else, fail fast.
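A minimal sketch of that two-axis decision, assuming you classify on the HTTP method, the status code and whether the caller sent an idempotency key. `RetryClassifier` is a hypothetical helper, not a Polly API. In a real pipeline the transient half becomes the `ShouldHandle` predicate (where network exceptions such as HttpRequestException are also handled), and the retriable half decides which calls get a retry strategy at all.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper illustrating the transient x retriable matrix; not part of Polly.
public static class RetryClassifier
{
    // Transient axis: the same call again has a real chance of succeeding.
    public static bool IsTransient(HttpStatusCode status) => (int)status switch
    {
        408 or 429 => true,                 // Request Timeout, Too Many Requests (honour Retry-After)
        500 or 502 or 503 or 504 => true,   // server-side blips
        _ => false                          // 4xx client errors and other 5xx (e.g. 501) fail fast
    };

    // Retriable axis: repeating the operation cannot cause harm.
    // GET/HEAD/OPTIONS/PUT/DELETE are idempotent by HTTP semantics; check your server really honours that.
    public static bool IsRetriable(HttpMethod method, bool hasIdempotencyKey) =>
        method == HttpMethod.Get || method == HttpMethod.Head || method == HttpMethod.Options ||
        method == HttpMethod.Put || method == HttpMethod.Delete ||
        hasIdempotencyKey;                  // POST/PATCH only with an idempotency key

    // Only the (transient AND retriable) cell belongs in a retry policy.
    public static bool ShouldRetry(HttpMethod method, HttpStatusCode status, bool hasIdempotencyKey) =>
        IsTransient(status) && IsRetriable(method, hasIdempotencyKey);
}
```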

Good enough to ship

• Retry 5xx + 408 + 429 only.
• Never retry non-idempotent calls without an idempotency key.
• Surface the original exception in logs.

Expert tier

• Per-endpoint retry budgets based on actual failure-rate baselines.
• Read-vs-write retry policies live in code; reviewable.
• Replay logs of retried POSTs prove no double-execution.

Retry: max attempts, backoff strategies, jitter

A retry policy is parameterised by:

Max attempts — bounded; 3 is a sane default for fast downstreams, 5 for slow ones.
Backoff strategy — how the delay grows between attempts: constant, linear, exponential.
Base delay — the unit of delay (Delay = TimeSpan.FromSeconds(1)).
Jitter — random spread to avoid synchronised herd retries.

Backoff strategies:

Constant — same delay every time. Use only for very fast operations.
Linear — delay grows linearly (1s, 2s, 3s). Modest backoff.
Exponential — delay doubles (1s, 2s, 4s, 8s). The standard choice.

Jitter is the single most important knob. Without it, every client that hit the failing downstream waits the same fixed delay and retries at the same instant. The recovering downstream is hit with a synchronised herd of retries that pushes it back into failure. With jitter (Polly's UseJitter = true), each client's delay is randomised so the herd spreads out across the window: for exponential backoff Polly v8 uses a decorrelated-jitter formula, and for constant and linear backoff it adds a random spread of about ±25%. This is not a nice-to-have; it can be the difference between a recovery taking 5 seconds and taking 5 hours.

.AddRetry(new RetryStrategyOptions<HttpResponseMessage>
{
    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
        .Handle<HttpRequestException>()
        .HandleResult(r => (int)r.StatusCode is >= 500 or 408 or 429),
    MaxRetryAttempts = 3,
    Delay = TimeSpan.FromSeconds(1),
    BackoffType = DelayBackoffType.Exponential,
    UseJitter = true,
    OnRetry = args =>
    {
        log.LogWarning("Retry {Attempt} after {Outcome}", args.AttemptNumber, args.Outcome);
        return default;
    }
})
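To see what jitter buys, here is a minimal sketch of exponential backoff with "full jitter" as described in the AWS Builders' Library post listed under resources: each delay is drawn uniformly between zero and the capped exponential value. It illustrates the idea only; it is not the formula Polly uses, and `BackoffCalculator` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: exponential backoff with "full jitter" (not Polly's algorithm).
public static class BackoffCalculator
{
    // attempt is 0-based: attempt 0 is the delay before the first retry.
    public static TimeSpan ExponentialCeiling(TimeSpan baseDelay, int attempt, TimeSpan maxDelay)
    {
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);   // 1s, 2s, 4s, 8s...
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds)); // never unbounded
    }

    // Full jitter: uniform in [0, ceiling), so clients that failed together retry apart.
    public static TimeSpan FullJitter(TimeSpan baseDelay, int attempt, TimeSpan maxDelay, Random rng) =>
        TimeSpan.FromMilliseconds(rng.NextDouble() * ExponentialCeiling(baseDelay, attempt, maxDelay).TotalMilliseconds);

    // The whole retry schedule for one client.
    public static IEnumerable<TimeSpan> Schedule(int maxRetries, TimeSpan baseDelay, TimeSpan maxDelay, Random rng)
    {
        for (int attempt = 0; attempt < maxRetries; attempt++)
            yield return FullJitter(baseDelay, attempt, maxDelay, rng);
    }
}
```

Drop the `rng.NextDouble()` factor and 100 clients that failed together would all retry at exactly 1 s, 2 s and 4 s; keep it and their retries spread across each window.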

Timeout: overall vs per-attempt, propagating cancellation

A timeout cancels an operation that takes too long. Two flavours:

Per-attempt timeout — each individual retry attempt has its own deadline.
Overall timeout — the entire pipeline (all retries + waits + work) has a deadline.

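You usually want both. If every attempt should get its full chance, the overall timeout must cover per-attempt × (retries + 1) plus the backoff delays: with MaxRetryAttempts = 3 and a 5-second per-attempt timeout, four attempts plus exponential backoff of roughly 1 + 2 + 4 seconds come to about 27 seconds. Without an overall timeout, that worst case grows with every retry you add, and by the time it is spent your caller has probably already given up.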

new ResiliencePipelineBuilder<HttpResponseMessage>()
    .AddTimeout(TimeSpan.FromSeconds(30)) // overall
    .AddRetry(/* ... */)
    .AddTimeout(TimeSpan.FromSeconds(5))  // per-attempt
    .Build();

Polly timeouts are cooperative: they signal via CancellationToken, and your inner code must honour the token. HttpClient, EF Core, the Azure SDKs, the Kusto SDK and most BCL async APIs accept a token and stop when it fires. Code that ignores the token, such as a hand-rolled Task.Run block that never checks it, will not be cancelled: it runs to completion, and because Polly v8 timeouts are cooperative only, the pipeline waits for it and the timeout effectively does nothing.

The right wiring inside your service: take a CancellationToken parameter on every async method, pass it down to every awaited call, propagate to HttpClient.SendAsync(req, ct) and friends.
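The cancellation plumbing that Polly's timeouts rely on can be shown with the BCL alone. A minimal sketch, assuming a hypothetical `CallDownstreamAsync` stand-in that honours its token: a linked token source carries the overall deadline around the retry loop, and each attempt gets a child token with a shorter per-attempt deadline. Backoff is omitted; in real code Polly's two AddTimeout layers do this for you.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical downstream call: takes `latency`, but honours the token.
    public static async Task<string> CallDownstreamAsync(TimeSpan latency, CancellationToken ct)
    {
        await Task.Delay(latency, ct); // throws OperationCanceledException when ct fires
        return "ok";
    }

    public static async Task<string> CallWithTimeoutsAsync(
        Func<int, CancellationToken, Task<string>> attempt, int maxAttempts,
        TimeSpan overall, TimeSpan perAttempt, CancellationToken callerToken)
    {
        // Overall budget, linked to the caller so caller cancellation also stops us.
        using var overallCts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        overallCts.CancelAfter(overall);

        for (int i = 0; ; i++)
        {
            // Per-attempt budget: a child of the overall token with its own shorter deadline.
            using var attemptCts = CancellationTokenSource.CreateLinkedTokenSource(overallCts.Token);
            attemptCts.CancelAfter(perAttempt);
            try
            {
                return await attempt(i, attemptCts.Token);
            }
            catch (OperationCanceledException) when (!overallCts.IsCancellationRequested && i + 1 < maxAttempts)
            {
                // Only the per-attempt deadline fired and attempts remain: try again.
            }
        }
    }
}
```

If `CallDownstreamAsync` ignored `ct`, neither deadline could stop it; the token is only a request to stop.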

Circuit breaker: closed, open, half-open

The circuit breaker is the "stop knocking on a broken door" pattern. It has three states:

Closed — all calls pass through. Outcomes are counted over a sliding window of SamplingDuration; once at least MinimumThroughput calls have been recorded in that window and the share of handled failures reaches FailureRatio, the breaker trips.
Open — all calls fail immediately with BrokenCircuitException. Saves both your service (no wasted threads) and the downstream (no wasted load while it recovers). After BreakDuration elapses, the breaker moves to Half-Open.
Half-Open — the next single call is a probe. If it succeeds the breaker returns to Closed; if it fails the breaker returns to Open for another BreakDuration.

.AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
{
    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
        .Handle<HttpRequestException>()
        .HandleResult(r => (int)r.StatusCode >= 500),
    FailureRatio = 0.5,
    MinimumThroughput = 10,
    SamplingDuration = TimeSpan.FromSeconds(30),
    BreakDuration = TimeSpan.FromSeconds(30),
    OnOpened = args => { log.LogError("Breaker opened: {Reason}", args.Outcome); return default; },
    OnClosed = args => { log.LogInformation("Breaker closed"); return default; },
    OnHalfOpened = args => { log.LogInformation("Breaker half-open"); return default; }
})

When the breaker helps and when it hurts:

Helps when the downstream is overloaded; fail-fast removes the load and lets it recover.
Hurts when the downstream is having a brief blip; opening the breaker turns a 2-second blip into a 30-second outage. Tune MinimumThroughput and SamplingDuration so a single failure cannot trip the breaker.

MinimumThroughput is the safety: the breaker will not trip without at least N calls in the window, so a single early failure during low traffic does not cause cascading damage.
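A minimal, single-threaded sketch of the state machine, assuming an injectable clock so it can be tested deterministically. It uses one fixed sampling window rather than Polly's rolling buckets, so treat it as an illustration of the Closed, Open and Half-Open transitions and of the MinimumThroughput guard, not as a replacement for Polly. `SimpleBreaker` and its members are hypothetical.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

// Hypothetical illustration only: not thread-safe and not Polly's algorithm.
public sealed class SimpleBreaker
{
    private readonly double _failureRatio;
    private readonly int _minimumThroughput;
    private readonly TimeSpan _samplingDuration, _breakDuration;
    private readonly Func<DateTime> _clock;
    private int _calls, _failures;
    private DateTime _windowStart, _openedAt;
    private bool _probeInFlight;

    public SimpleBreaker(double failureRatio, int minimumThroughput,
        TimeSpan samplingDuration, TimeSpan breakDuration, Func<DateTime> clock)
    {
        (_failureRatio, _minimumThroughput, _samplingDuration, _breakDuration, _clock) =
            (failureRatio, minimumThroughput, samplingDuration, breakDuration, clock);
        _windowStart = clock();
    }

    public BreakerState State { get; private set; } = BreakerState.Closed;

    // Call before the action; false means "fail fast" (Polly would throw BrokenCircuitException).
    public bool TryEnter()
    {
        if (State == BreakerState.Open && _clock() - _openedAt >= _breakDuration)
        {
            State = BreakerState.HalfOpen; // break duration elapsed: allow a probe
            _probeInFlight = false;
        }
        if (State == BreakerState.Open) return false;
        if (State == BreakerState.HalfOpen)
        {
            if (_probeInFlight) return false; // only one probe at a time
            _probeInFlight = true;
        }
        return true;
    }

    // Call after the action with its outcome.
    public void Record(bool success)
    {
        var now = _clock();
        if (State == BreakerState.HalfOpen)
        {
            if (success) { State = BreakerState.Closed; ResetWindow(now); } // probe passed
            else Trip(now);                                                  // probe failed: reopen
            return;
        }
        if (now - _windowStart > _samplingDuration) ResetWindow(now); // start a fresh window
        _calls++;
        if (!success) _failures++;
        // Trip only with enough traffic AND a high enough failure ratio.
        if (_calls >= _minimumThroughput && (double)_failures / _calls >= _failureRatio) Trip(now);
    }

    private void Trip(DateTime now) { State = BreakerState.Open; _openedAt = now; _probeInFlight = false; }
    private void ResetWindow(DateTime now) { _windowStart = now; _calls = 0; _failures = 0; }
}
```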

Good enough to ship

• Default: FailureRatio=0.5, MinimumThroughput=10, Sampling=30s, Break=30s.
• Combine with fallback so callers see graceful degradation.
• Log every state transition.

Expert tier

• Tune from production baselines, not from defaults.
• Per-endpoint or per-tenant breakers.
• Cross-instance breaker state via Redis for shared signal.

Bulkhead: isolating resource pools

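The bulkhead pattern (named after ship compartments) caps the number of concurrent operations against a single resource so that one slow downstream cannot drain capacity shared by everything else: outbound connections, memory, inbound request slots and, in synchronous or sync-over-async code, thread-pool threads. Polly v8 calls it the concurrency limiter: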

.AddConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 50,  // max 50 concurrent
    QueueLimit = 10    // queue up to 10 more; reject the rest
})

The canonical scenario: your service can hold about 200 requests in flight and calls 5 downstreams. One downstream goes slow (10-second responses). Without a bulkhead, requests waiting on that downstream pile up until they occupy all 200 slots (and, in sync-over-async code, the thread-pool threads behind them), so callers of the other 4 downstreams stall too. With a 50-permit bulkhead per downstream, the slow one holds at most 50 in-flight calls plus its small queue, the rest are rejected fast, and the other 150 slots stay available.

Bulkhead and rate limiter are different. Bulkhead caps in-flight (concurrency); rate limiter caps issuance rate (calls per second). Use bulkhead to isolate, rate limiter to throttle.
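A minimal sketch of the concurrency-limiter idea using SemaphoreSlim, assuming the same PermitLimit/QueueLimit semantics as above: up to `permitLimit` calls run, up to `queueLimit` more wait, and anything beyond that is rejected immediately. `SimpleBulkhead` is hypothetical; Polly's concurrency limiter uses the .NET ConcurrencyLimiter from System.Threading.RateLimiting rather than this code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical bulkhead: caps in-flight work and the waiting queue for ONE downstream.
public sealed class SimpleBulkhead
{
    private readonly SemaphoreSlim _permits;
    private readonly int _maxAdmitted; // running + queued
    private int _admitted;

    public SimpleBulkhead(int permitLimit, int queueLimit)
    {
        _permits = new SemaphoreSlim(permitLimit, permitLimit);
        _maxAdmitted = permitLimit + queueLimit;
    }

    // Returns false (fail fast) when both the permits and the queue are full.
    public async Task<bool> TryRunAsync(Func<Task> work, CancellationToken ct = default)
    {
        if (Interlocked.Increment(ref _admitted) > _maxAdmitted)
        {
            Interlocked.Decrement(ref _admitted);
            return false; // rejected: Polly would throw RateLimiterRejectedException here
        }
        try
        {
            await _permits.WaitAsync(ct); // queued callers wait here for a permit
            try { await work(); }
            finally { _permits.Release(); }
            return true;
        }
        finally
        {
            Interlocked.Decrement(ref _admitted);
        }
    }
}
```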

Hedging: racing duplicate requests for tail-latency

Hedging issues a second (and optionally third) request to the same endpoint after a short delay, taking whichever returns first. It is the tail-latency mitigation pattern: when a small fraction of calls hang for unknown reasons, racing a parallel attempt usually catches the fast tail.

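.AddHedging(new HedgingStrategyOptions<HttpResponseMessage>
{
    MaxHedgedAttempts = 2,   // up to 2 hedges in addition to the original call
    Delay = TimeSpan.FromMilliseconds(300),
    // Re-run the original callback; this is also Polly's default generator.
    ActionGenerator = args => () => args.Callback(args.ActionContext)
})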

Hedging is for idempotent reads only. Hedging a POST means you might hit the downstream twice with side effects; hedging a search query is fine. It is also for fast-when-healthy calls; hedging a 5-second call doubles the load without much p99 benefit.

The cost: extra downstream load. If 1% of calls take >300ms and hedging adds one extra request for each of those, you have grown downstream load by 1%. Worth it for an order-of-magnitude p99 improvement; not worth it for a 10% improvement.
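Stripped of Polly, hedging is a race between the primary call and delayed duplicates. A minimal sketch with Task.WhenAny, assuming the operation is an idempotent read that honours its cancellation token; `HedgeAsync` is hypothetical. Polly's hedging strategy adds ShouldHandle predicates, telemetry and context handling on top of this basic race.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Hedging
{
    // Hypothetical helper: start the primary attempt, add a hedged attempt every `delay`
    // while nothing has succeeded, return the first success and cancel the rest.
    public static async Task<T> HedgeAsync<T>(
        Func<CancellationToken, Task<T>> operation, int maxHedgedAttempts, TimeSpan delay,
        CancellationToken ct = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        var running = new List<Task<T>> { operation(cts.Token) }; // the primary attempt
        int hedgesLeft = maxHedgedAttempts;
        Exception? lastError = null;
        try
        {
            while (true)
            {
                ct.ThrowIfCancellationRequested();
                var waitFor = new List<Task>(running);
                Task? hedgeTimer = hedgesLeft > 0 ? Task.Delay(delay, cts.Token) : null;
                if (hedgeTimer is not null) waitFor.Add(hedgeTimer);
                if (waitFor.Count == 0) throw lastError!; // every attempt failed, no hedges left

                var finished = await Task.WhenAny(waitFor);
                if (finished == hedgeTimer)
                {
                    hedgesLeft--;
                    running.Add(operation(cts.Token)); // the hedge: an identical, parallel attempt
                    continue;
                }

                var attempt = (Task<T>)finished;
                running.Remove(attempt);
                if (attempt.IsCompletedSuccessfully) return attempt.Result; // first success wins
                lastError = attempt.Exception?.InnerException ?? new OperationCanceledException();
            }
        }
        finally
        {
            cts.Cancel(); // stop the losing attempts (they must honour the token)
        }
    }
}
```

Only calls still pending when the delay elapses spawn a duplicate, which is why the extra load tracks the fraction of slow calls rather than total traffic.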

Fallback: safe defaults, subtle-wrong is worse than obviously-wrong

When everything else in the pipeline fails, fallback substitutes a safe response so the caller sees a graceful degradation rather than an exception.

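.AddFallback(new FallbackStrategyOptions<HttpResponseMessage>
{
    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
        // Don't turn the caller's own cancellation into a fallback response.
        .Handle<Exception>(ex => ex is not OperationCanceledException)
        .HandleResult(r => r.StatusCode >= HttpStatusCode.InternalServerError),
    // GetCachedResponse() stands in for your own cache lookup.
    FallbackAction = args => Outcome.FromResultAsValueTask(GetCachedResponse())
})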

The cardinal rule: subtle-wrong is worse than obviously-wrong. If you cannot compute the correct value, return something that is visibly a fallback: an empty list, or a cached value explicitly marked as stale (a header such as X-Served-From-Cache that the UI surfaces), never a stale value that looks current. A user staring at an empty dashboard will refresh; a user staring at yesterday's data will believe it is today's.

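Fallback belongs at the outermost layer of the pipeline so it catches everything below, including BrokenCircuitException and TimeoutRejectedException. Put it inside retry and it converts the first failed attempt into a "successful" fallback result, so retry never fires and the pipeline gives up after a single attempt.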
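A tiny synchronous sketch makes the ordering argument concrete, assuming hand-rolled `Retry` and `Fallback` wrappers (hypothetical, not Polly types) around a downstream that fails twice and then succeeds. With fallback outside retry, retry gets all its attempts; with fallback inside, the first failure becomes a fallback value and retry never fires.

```csharp
using System;

public static class OrderingDemo
{
    // Hypothetical retry wrapper: up to `maxAttempts` tries, rethrowing the last failure.
    public static T Retry<T>(Func<T> action, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try { return action(); }
            catch (Exception) when (attempt < maxAttempts) { /* treat as transient: try again */ }
        }
    }

    // Hypothetical fallback wrapper: any exception becomes the fallback value.
    public static T Fallback<T>(Func<T> action, T fallbackValue)
    {
        try { return action(); }
        catch (Exception) { return fallbackValue; }
    }
}

// Hypothetical downstream: fails on the first two calls, succeeds on the third.
public sealed class FlakyDownstream
{
    public int Calls { get; private set; }
    public string Get() => ++Calls <= 2 ? throw new InvalidOperationException("simulated 503") : "fresh";
}
```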

Rate limiting and throttling

Server-side rate limiting protects your service from being overwhelmed; client-side rate limiting prevents you from being a noisy neighbour. Both belong in the pipeline:

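// Client-side: cap your outbound rate.
// Create the limiter once and share it: a new limiter per call starts with a full window every time and never throttles.
var outboundLimiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions { PermitLimit = 100, Window = TimeSpan.FromSeconds(1) });

// ...then, in the builder chain:
.AddRateLimiter(new RateLimiterStrategyOptions
{
    RateLimiter = args => outboundLimiter.AcquireAsync(1, args.Context.CancellationToken)
})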

The .NET 7+ rate limiter primitives include FixedWindow, SlidingWindow, TokenBucket and Concurrency limiters. Pick:

TokenBucket for "average rate X, burst up to Y" — the most common shape.
FixedWindow for "at most N calls per minute" — simple, but a burst straddling a window boundary can let nearly 2N through in a short span.
SlidingWindow for "N calls in any 60-second window" — smoother than fixed but more memory.
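The token-bucket shape ("average rate X, burst up to Y") is easy to sketch by hand. A minimal, single-threaded version, assuming an injectable clock for testing; `SimpleTokenBucket` is hypothetical, and in production you would use the BCL's TokenBucketRateLimiter instead.

```csharp
using System;

// Hypothetical token bucket: refills `tokensPerSecond`, holds at most `burst` tokens.
public sealed class SimpleTokenBucket
{
    private readonly double _capacity, _refillPerSecond;
    private readonly Func<DateTime> _clock;
    private double _tokens;
    private DateTime _lastRefill;

    public SimpleTokenBucket(double tokensPerSecond, double burst, Func<DateTime> clock)
    {
        _refillPerSecond = tokensPerSecond;
        _capacity = burst;
        _clock = clock;
        _tokens = burst;       // start full: an idle client may burst immediately
        _lastRefill = clock();
    }

    public bool TryAcquire()
    {
        var now = _clock();
        // Refill in proportion to elapsed time, never beyond the burst capacity.
        _tokens = Math.Min(_capacity, _tokens + (now - _lastRefill).TotalSeconds * _refillPerSecond);
        _lastRefill = now;
        if (_tokens < 1) return false; // over the rate: reject (or queue) this call
        _tokens -= 1;
        return true;
    }
}
```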

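When the downstream returns 429 (or 503) with Retry-After, honour the header: that is the downstream telling you exactly how long to wait. HttpRetryStrategyOptions from Microsoft.Extensions.Http.Resilience does this by default (ShouldRetryAfterHeader = true); a hand-built RetryStrategyOptions needs a DelayGenerator that reads the header.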
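Reading the header needs only the BCL's typed accessor. A minimal sketch, assuming you want to handle both forms the header allows (a delay in seconds or an HTTP date); `RetryAfterHelper` is a hypothetical name.

```csharp
using System;
using System.Net.Http;

public static class RetryAfterHelper
{
    // Returns the server-requested delay, or null if the response did not specify one.
    public static TimeSpan? RetryAfterDelay(HttpResponseMessage response, DateTimeOffset now)
    {
        var header = response.Headers.RetryAfter;           // RetryConditionHeaderValue?
        if (header is null) return null;
        if (header.Delta is TimeSpan delta) return delta;   // "Retry-After: 7"
        if (header.Date is DateTimeOffset date)             // "Retry-After: <HTTP-date>"
            return date > now ? date - now : TimeSpan.Zero;
        return null;
    }
}
```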

Polly v8: ResiliencePipelineBuilder, strategy ordering, options, AddResilienceHandler

Polly v8 redesigned the API around ResiliencePipelineBuilder. The mental model:

A pipeline is an ordered chain of strategies.
Each strategy is strongly-typed options (RetryStrategyOptions<T>, CircuitBreakerStrategyOptions<T>).
Strategies are added with AddRetry, AddCircuitBreaker, AddTimeout and friends, and the order matters: the first strategy added is the outermost, and each later one is wrapped inside it.

Outer-to-inner ordering:

Fallback — outermost; catches everything below.
Overall timeout — sets the absolute deadline for the entire pipeline.
Bulkhead / rate limiter — admission control before doing real work.
Retry — the work that runs N times.
Circuit breaker — protects the actual call from being made when the downstream is broken.
Per-attempt timeout — innermost; caps each individual attempt.
The action — the HTTP/SQL/Kusto call.

var pipeline = new ResiliencePipelineBuilder<HttpResponseMessage>()
    .AddFallback(/* ... */)
    .AddTimeout(TimeSpan.FromSeconds(30)) // overall
    .AddConcurrencyLimiter(new ConcurrencyLimiterOptions { PermitLimit = 50 })
    .AddRetry(/* ... */)
    .AddCircuitBreaker(/* ... */)
    .AddTimeout(TimeSpan.FromSeconds(5))  // per-attempt
    .Build();
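The "outer strategies wrap inner ones" rule is the same composition as middleware. A minimal sketch, assuming a hypothetical `MiniPipelineBuilder` whose strategies are plain wrapper functions rather than Polly types: the first strategy added ends up outermost, so it is entered first and exited last.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini-builder: each strategy wraps "the rest of the pipeline".
public sealed class MiniPipelineBuilder
{
    private readonly List<Func<Func<string>, Func<string>>> _strategies = new();

    public MiniPipelineBuilder Add(Func<Func<string>, Func<string>> strategy)
    {
        _strategies.Add(strategy);
        return this;
    }

    public Func<string> Build(Func<string> action)
    {
        // Wrap from the last-added (innermost) outwards, so the first-added ends up outermost.
        var pipeline = action;
        for (int i = _strategies.Count - 1; i >= 0; i--)
            pipeline = _strategies[i](pipeline);
        return pipeline;
    }

    // A strategy that records when execution enters and leaves it.
    public static Func<Func<string>, Func<string>> Tracing(string name, List<string> log) =>
        next => () =>
        {
            log.Add($"enter {name}");
            var result = next();
            log.Add($"exit {name}");
            return result;
        };
}
```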

For HttpClient consumers, the canonical wiring is AddResilienceHandler on HttpClientFactory:

builder.Services.AddHttpClient<IUpstreamClient, UpstreamClient>()
    .AddResilienceHandler("upstream", b =>
    {
        b.AddRetry(new HttpRetryStrategyOptions { MaxRetryAttempts = 3 });
        b.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5, MinimumThroughput = 10,
            SamplingDuration = TimeSpan.FromSeconds(30),
            BreakDuration = TimeSpan.FromSeconds(30)
        });
        b.AddTimeout(TimeSpan.FromSeconds(10));
    });

Every outbound call through the registered HttpClient inherits the policy without per-call wrapping.

Exception semantics: TimeoutRejectedException, BrokenCircuitException, surfacing the cause

Polly throws specific exceptions when its strategies fail:

TimeoutRejectedException — a timeout strategy elapsed.
BrokenCircuitException — the breaker was Open when a call attempted to pass through.
RateLimiterRejectedException — the rate limiter denied admission.

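These are Polly exceptions, not the underlying cause. When you catch them, log both the Polly exception type and the original cause. If the breaker was tripped by an exception, Polly v8 attaches it as the InnerException; if it was tripped by handled results (5xx responses), there may be no inner exception at all, which is why OnOpened should log args.Outcome. BrokenCircuitException.RetryAfter is not the cause: it tells you roughly how long until the breaker lets a probe through. Without surfacing the original cause, your incident response sees "BrokenCircuitException" 1000 times and never learns why the breaker opened.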

try
{
    var response = await pipeline.ExecuteAsync(
        async ct => await client.GetAsync(url, ct), cancellationToken);
    // ... map response to your endpoint's result ...
}
catch (BrokenCircuitException ex)
{
    log.LogError(ex, "Breaker open for {Url} (retry after {RetryAfter}); original cause {InnerType}: {InnerMessage}",
        url, ex.RetryAfter, ex.InnerException?.GetType().Name, ex.InnerException?.Message);
    return Results.StatusCode(503);
}
catch (TimeoutRejectedException ex)
{
    log.LogWarning(ex, "Timeout on {Url} after {Timeout}", url, ex.Timeout);
    return Results.StatusCode(504);
}
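When the cause is buried several layers deep (for instance a socket error inside an HttpRequestException inside another exception), walking the InnerException chain gives the log line its root cause. A minimal BCL-only sketch; `ExceptionCauses` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public static class ExceptionCauses
{
    // Follows InnerException to the deepest cause (AggregateException exposes its first inner here).
    public static Exception RootCause(Exception ex)
    {
        while (ex.InnerException is not null) ex = ex.InnerException;
        return ex;
    }

    // "OuterType -> MiddleType -> RootType: root message", for a single structured log field.
    public static string Describe(Exception ex)
    {
        var chain = new List<string>();
        for (Exception? e = ex; e is not null; e = e.InnerException) chain.Add(e.GetType().Name);
        return $"{string.Join(" -> ", chain)}: {RootCause(ex).Message}";
    }
}
```

A log call such as log.LogError(ex, "Breaker open; cause chain {Chain}", ExceptionCauses.Describe(ex)) then records every layer, not just the outermost type.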

Worked examples

Example 1 — Full HttpClient pipeline

builder.Services.AddHttpClient<IPricingClient, PricingClient>(c =>
{
    c.BaseAddress = new Uri("https://pricing.internal/");
    c.DefaultRequestHeaders.UserAgent.ParseAdd("queries-api/1.0");
})
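.AddResilienceHandler("pricing", (b, context) =>
{
    // ResilienceHandlerContext exposes the DI container; Polly's ResilienceContext has no logger of its own.
    var logger = context.ServiceProvider.GetRequiredService<ILoggerFactory>()
        .CreateLogger("Pricing.Resilience");

    b.AddTimeout(TimeSpan.FromSeconds(30)); // overall budget

    b.AddRetry(new HttpRetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        UseJitter = true,
        OnRetry = args =>
        {
            logger.LogWarning(
                "pricing retry {Attempt} after {Outcome}",
                args.AttemptNumber,
                args.Outcome.Result?.StatusCode.ToString() ?? args.Outcome.Exception?.GetType().Name);
            return default;
        }
    });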
 
    b.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
    {
        FailureRatio = 0.5,
        MinimumThroughput = 10,
        SamplingDuration = TimeSpan.FromSeconds(30),
        BreakDuration = TimeSpan.FromSeconds(30)
    });
 
    b.AddTimeout(TimeSpan.FromSeconds(5)); // per-attempt
});

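Overall + per-attempt timeouts, with retry + breaker between.
Retry observes HttpRequestException, the per-attempt TimeoutRejectedException and 5xx/408/429 responses by default (HttpRetryStrategyOptions).
Logging callback emits a warning per retry so you see them in App Insights, provided your logging pipeline exports there.
If you want this shape with library defaults, AddStandardResilienceHandler() wires a rate limiter, total timeout, retry, circuit breaker and per-attempt timeout in one call.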

Example 2 — Idempotent POST with idempotency key

public sealed class IdempotencyMiddleware(RequestDelegate next, IDistributedCache cache)
{
    public async Task Invoke(HttpContext ctx)
    {
        if (!HttpMethods.IsPost(ctx.Request.Method) ||
            !ctx.Request.Headers.TryGetValue("Idempotency-Key", out var key))
        {
            await next(ctx);
            return;
        }

        var cacheKey = $"idemp:{ctx.Request.Path}:{key}";
        var cached = await cache.GetStringAsync(cacheKey, ctx.RequestAborted);
        if (cached is not null)
        {
            // Stored as "<status>\n<body>" so a replay returns the original status (e.g. 201), not a blanket 200.
            var split = cached.IndexOf('\n');
            ctx.Response.StatusCode = int.Parse(cached[..split]);
            ctx.Response.ContentType = "application/json";
            await ctx.Response.WriteAsync(cached[(split + 1)..], ctx.RequestAborted);
            return;
        }

        // Capture the response body so we can cache it; restore the real stream even if next throws.
        var original = ctx.Response.Body;
        using var capture = new MemoryStream();
        ctx.Response.Body = capture;
        try
        {
            await next(ctx);
        }
        finally
        {
            ctx.Response.Body = original;
        }

        capture.Seek(0, SeekOrigin.Begin);
        using var reader = new StreamReader(capture, leaveOpen: true);
        var body = await reader.ReadToEndAsync();
        capture.Seek(0, SeekOrigin.Begin);
        await capture.CopyToAsync(original, ctx.RequestAborted);

        if (ctx.Response.StatusCode < 400)
            await cache.SetStringAsync(cacheKey, $"{ctx.Response.StatusCode}\n{body}",
                new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24) },
                ctx.RequestAborted);
    }
}

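The middleware dedupes on (path, Idempotency-Key); in a multi-tenant API, include the caller's identity in the key as well.
Successful responses are cached for 24 hours; duplicate retries within that window return the same response.
It does not stop two concurrent first attempts with the same key from both executing; a production version reserves the key atomically (for example a set-if-absent write in Redis) before calling next.
This is the prerequisite for safely retrying POSTs.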
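The client side of the contract matters as much as the middleware: generate the key once per logical operation and send the same key on every retry. A minimal sketch, assuming a hypothetical `FlakyHandler` test double that fails the first two sends and records the keys it saw; the retry loop is hand-rolled to keep the example free of Polly, and backoff is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical test double: returns 503 twice, then 201, recording each Idempotency-Key.
public sealed class FlakyHandler : HttpMessageHandler
{
    public List<string> KeysSeen { get; } = new();

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        KeysSeen.Add(string.Join(",", request.Headers.GetValues("Idempotency-Key")));
        var status = KeysSeen.Count <= 2 ? HttpStatusCode.ServiceUnavailable : HttpStatusCode.Created;
        return Task.FromResult(new HttpResponseMessage(status));
    }
}

public static class IdempotentClient
{
    public static async Task<HttpResponseMessage> PostWithRetriesAsync(
        HttpClient client, string url, string json, int maxAttempts, CancellationToken ct = default)
    {
        // One key per logical operation, created OUTSIDE the retry loop.
        var idempotencyKey = Guid.NewGuid().ToString();
        for (int attempt = 1; ; attempt++)
        {
            // An HttpRequestMessage cannot be sent twice, so build a fresh one per attempt...
            using var request = new HttpRequestMessage(HttpMethod.Post, url)
            {
                Content = new StringContent(json, System.Text.Encoding.UTF8, "application/json")
            };
            // ...but always with the same key.
            request.Headers.Add("Idempotency-Key", idempotencyKey);
            var response = await client.SendAsync(request, ct);
            if ((int)response.StatusCode < 500 || attempt == maxAttempts) return response;
            response.Dispose(); // transient failure: discard and retry
        }
    }
}
```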

Example 3 — Hedging on a search endpoint

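var pipeline = new ResiliencePipelineBuilder<SearchResult>()
    .AddTimeout(TimeSpan.FromSeconds(2)) // overall: added first, so it wraps every hedged attempt
    .AddHedging(new HedgingStrategyOptions<SearchResult>
    {
        MaxHedgedAttempts = 2,
        Delay = TimeSpan.FromMilliseconds(300),
        ActionGenerator = args => () => args.Callback(args.ActionContext) // same as the default
    })
    .Build();

app.MapGet("/api/search", async (string q, IIndexClient idx, CancellationToken ct) =>
{
    return await pipeline.ExecuteAsync(async token => await idx.SearchAsync(q, token), ct);
});

After 300 ms without a result, a hedged attempt is issued in parallel, and after another 300 ms a second one (MaxHedgedAttempts counts hedges in addition to the original call); the first acceptable result wins and the others are cancelled.
The 2-second timeout is added before the hedging strategy, so it is the outer layer and bounds the whole race.
For idempotent reads only.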

Example 4 — Breaker state to App Insights

public static class BreakerMetrics
{
    public static readonly Meter Meter = new("polly-breaker", "1.0");
    public static readonly Counter<long> Opened = Meter.CreateCounter<long>("breaker.opened");
    public static readonly Counter<long> Closed = Meter.CreateCounter<long>("breaker.closed");
}
 
.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
    /* thresholds */
    OnOpened = args =>
    {
        BreakerMetrics.Opened.Add(1, new KeyValuePair<string, object?>("downstream", "pricing"));
        return default;
    },
    OnClosed = args =>
    {
        BreakerMetrics.Closed.Add(1, new KeyValuePair<string, object?>("downstream", "pricing"));
        return default;
    }
})

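Emits a custom metric on every breaker transition. The meter only reaches App Insights if your telemetry pipeline subscribes to it (for example .AddMeter("polly-breaker") on the OpenTelemetry metrics builder).
Alert on breaker.opened > 0 for a downstream to learn about outages before users do.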

Hands-on exercises

Flaky-server pipeline. Stand up a test server that returns 503 for 50% of calls. Wrap a client with timeout + retry + circuit breaker. Run 1000 requests through and verify each pattern triggers.
- You are done when you can observe in logs: retries succeeding, the breaker opening, the breaker closing again.
Jitter demonstration. Run 100 clients concurrently against a downstream that fails for 5 seconds then recovers. Compare median recovery time with and without UseJitter = true.
- You are done when the no-jitter case shows a "thundering herd" pattern and the jitter case does not.
Idempotent POST. Implement Example 2's middleware on a POST endpoint that creates a row. Retry the POST three times with the same Idempotency-Key. Confirm only one row exists.
- You are done when the row count after three retries is 1.
Fallback chain. Add an outer Fallback layer that returns a stale cached response when the breaker is Open. Force the breaker open. Confirm callers see the cached response with a "served-from-fallback" header.
- You are done when no request returns 5xx during the breaker-open period.
Hedging tuning. Add hedging at 300ms to a read endpoint. Inject a 1-second pause on 20% of calls. Measure p99 before and after.
- You are done when p99 measurably drops and downstream load measurably rises.
Wire with HttpClientFactory. Migrate any hand-rolled pipeline.Execute(...) calls to AddResilienceHandler and consume via typed HttpClient.
- You are done when no business code directly references the pipeline.

Self-check questions

Give the four-cell transient × retriable matrix and an example of each.
Why is jitter essential, and what is the failure mode without it?
Explain the three circuit-breaker states and the transition triggers.
Why is MinimumThroughput important on a circuit breaker?
What is the difference between an overall timeout and a per-attempt timeout?
When is hedging worth it and when is it harmful?
Why must fallback be the outermost layer of the pipeline?
Explain the difference between bulkhead and rate limiter.
Walk through the canonical outer-to-inner Polly pipeline ordering.
Why is BrokenCircuitException insufficient to diagnose an incident on its own?
What is an idempotency key, and why is it the prerequisite to safely retrying POSTs?
When does retrying make an outage worse instead of better?

High-signal resources

Official docs

Polly v8 documentation — the resilience pipeline model and every strategy.
Microsoft.Extensions.Http.Resilience — AddResilienceHandler integration.
.NET 7 rate limiting — the primitives behind Polly's rate limiter.

Books or courses

Release It! (2nd ed) — Michael T. Nygard. The book on stability patterns and anti-patterns.
Site Reliability Engineering (Google) — the chapters on cascading failures and overload.

Practitioner posts

AWS Builders' Library — timeouts, retries, and backoff with jitter — the canonical jitter writeup.
Marc Brooker — fail open vs fail closed — practitioner posts on resilience patterns.
.NET team — resilient HTTP apps — release notes and migration guides for Polly v8 + HTTP resilience.

Weekly milestones

Day 1. Read the Polly v8 docs + the AWS jitter post. Wire Example 1. Self-check questions 1–3.
Day 2. Jitter demonstration (exercise 2) + circuit breaker tuning. Self-check questions 4 + 12.
Day 3. Timeouts (overall + per-attempt). Self-check questions 5 + 9.
Day 4-5. Fallback + idempotency-key middleware (exercises 3 + 4). Self-check questions 7 + 10–11.
Day 6-7. Hedging (exercise 5) + AddResilienceHandler migration (exercise 6). Self-check questions 6 + 8.

How it shows up in the capstone

Every outbound HTTP client in the capstone API (Kusto, Graph) is wrapped with AddResilienceHandler: 30-second overall timeout, 3 retries with jittered exponential backoff on 5xx/408/429, circuit breaker at 50% failure ratio over 10+ calls in 30 seconds, 5-second per-attempt timeout. SQL via EF Core does not go through HttpClient, so it needs its own equivalent, such as EF Core's retrying execution strategy. Critical reads (the dashboard's main chart query) have an outer Fallback that returns the last cached result with an X-Served-From-Cache header.

Write endpoints require an Idempotency-Key header and are deduped through the middleware from Example 2. The search endpoint has hedging at 300ms because its p99 is the user-visible chrome of the entire app. Breaker state transitions emit custom metrics that fire alerts when a downstream goes south — almost always before the user-facing error rate does.

Previous chapter → Ch 15 — Observability Next chapter → Ch 17 — Caching