MS Stack Ch 17 — Caching
IMemoryCache, IDistributedCache, HybridCache (.NET 9), Redis, cache-aside, write-through, stampede protection, invalidation, TTL strategies. The discipline that makes 10× perf gains routine.
Chapter 17 of From Novice to Fluent on the Modern Microsoft Web Stack — a 22-chapter self-study plan.
Why this chapter
Caching is the single largest performance lever in a typical web application. A well-placed cache turns a 200ms database query into a 0.2ms in-process lookup, sheds 90% of read load from your origin, and lets one instance serve work that previously required ten. The same cache, mis-managed, produces stale dashboards for hours after a write, IDOR bugs where users see each other's data, cache stampedes that take down recovering downstreams, and "we cleared the cache and now everything is broken" production stories. Caching is power, and like all power it is regrettable when handed to those who do not respect it.
Shipping-level caching means: you reach for IMemoryCache for hot per-instance reference data, IDistributedCache (Redis) when the cache must be shared, and HybridCache when you want both tiers plus stampede protection in one call. Expert-level caching means: you can size cache keys to avoid cardinality blowups, you have run a chaos test to prove your app survives a Redis outage, your invalidation strategy is documented and reviewable, your TTLs map to business-defensible staleness budgets, and you have alerts on hit-rate degradation that fire before anyone notices a regression.
You finish this chapter when you can pick the right cache layer for an arbitrary workload, defend the choice in code review, and explain on demand why caching writes is a category error.
Concepts and depth
Why cache: latency reduction and load shedding
Caches do two distinct jobs that tend to get conflated:
- Latency reduction — the cached path is faster than the origin path. A 0.1ms in-process hit beats a 5ms Redis call beats a 50ms SQL call beats a 500ms Kusto call beats a 2-second external API.
- Load shedding — the origin handles fewer calls because the cache absorbs them. The cache is a shock absorber for the downstream. A 90% hit rate turns a 1000 RPS service into a 100 RPS load on the database.
Both matter, but they pull on different design knobs. Latency reduction rewards keeping the cache close (in-process); load shedding rewards keeping the cache shared (Redis). The right answer is usually both tiers stacked.
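The two jobs can be put into numbers. The following is a minimal sketch using the illustrative figures from the bullets above (1000 RPS, a 0.1ms hit, a 50ms miss); the function names `OriginRps` and `MeanLatencyMs` are made up for this example.

```csharp
using System;

// Load shedding: only the requests that miss the cache reach the origin.
decimal OriginRps(decimal totalRps, decimal hitRate) => totalRps * (1m - hitRate);

// Latency reduction: mean latency is the hit/miss mix weighted by hit rate.
decimal MeanLatencyMs(decimal hitRate, decimal hitMs, decimal missMs) =>
    hitRate * hitMs + (1m - hitRate) * missMs;

Console.WriteLine(OriginRps(1000m, 0.90m));         // origin load at a 90% hit rate
Console.WriteLine(OriginRps(1000m, 0.99m));         // origin load at a 99% hit rate
Console.WriteLine(MeanLatencyMs(0.90m, 0.1m, 50m)); // mean latency in ms
```

Moving from a 90% to a 99% hit rate cuts origin load by another factor of ten, which is why a hit-rate regression of a few points can matter more than it looks.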
Caching is also, deliberately, the lazy answer to scalability — easier to add than to redesign the data model, and that is fine when the data is naturally read-heavy. It becomes wrong when used as a substitute for a real performance fix in code or schema; cache-hiding a slow query is borrowing time at compound interest.
Cache primitives: IMemoryCache, IDistributedCache, HybridCache
.NET ships three caching abstractions that compose into a complete strategy:
- IMemoryCache: in-process. Microseconds per access; per-instance; lost on restart. No serialisation. Best for hot, small, per-instance reference data.
- IDistributedCache: shared. Backed by Redis (typically), SQL Server, or Azure Cosmos. Sub-millisecond on a LAN; pays serialisation cost. Shared across instances. Best for shared expensive query results and pseudo-session state.
- HybridCache (.NET 9+): combines IMemoryCache (L1) and IDistributedCache (L2) behind a single API, with stampede protection and tag-based invalidation built in. The new default for any non-trivial caching.
// IMemoryCache
builder.Services.AddMemoryCache();

// IDistributedCache via Redis
builder.Services.AddStackExchangeRedisCache(opts =>
{
    opts.Configuration = builder.Configuration.GetConnectionString("Redis");
    opts.InstanceName = "queries:";
});

// HybridCache: wraps both
builder.Services.AddHybridCache(opts =>
{
    opts.DefaultEntryOptions = new HybridCacheEntryOptions
    {
        Expiration = TimeSpan.FromMinutes(10),          // L2 TTL
        LocalCacheExpiration = TimeSpan.FromMinutes(2)  // L1 TTL
    };
});
The simple decision tree: if your data is small + hot + per-instance-OK, use IMemoryCache. If it must be shared, use IDistributedCache directly or — if on .NET 9+ — HybridCache. The hand-rolled L1+L2 wrappers from older codebases should migrate to HybridCache whenever feasible.
- HybridCache for shared expensive query results.
- IMemoryCache for hot per-instance reference data.
- Redis for L2 in production; SQL only as a last resort.
- Custom IDistributedCache for niche backends.
- Tag invalidation broadcast via Redis pub/sub.
- Multi-region Redis with active-active replication.
Cache-key design: encode every variable, stability, cardinality
A cache key uniquely identifies a (request, context) tuple whose answer is stable for the TTL. The discipline is straightforward in principle and a constant source of bugs in practice.
Encode every variable that affects the answer. A dashboard query that varies by tenant, user, date range and feature-flag set must have all four in the key:
var key = $"dashboard:{tenantId}:{userId}:{from:yyyyMMddTHHmm}:{to:yyyyMMddTHHmm}:flagSet:{flags.Hash()}";
Missing any of these turns the cache into a correctness bug. The canonical IDOR-meets-cache disaster is var key = "dashboard" — every user sees the first user's dashboard, forever.
Stability across deploys. A cache key that includes serialiser version, framework version or assembly hash means every deploy invalidates the entire cache. That is sometimes desirable (after a schema change) and usually not. Prefer explicit version prefixes (v17:dashboard:...) that you bump intentionally.
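A key is only stable if every part of it is stable. One easy mistake is hashing the flag set with string.GetHashCode(), which .NET randomises per process, so two instances would never agree on a key. The sketch below shows one way to build a versioned key with a deterministic flag hash; `StableFlagHash` and `DashboardKey` are hypothetical helpers for this example, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: a deterministic, order-independent hash of the enabled flags.
string StableFlagHash(IEnumerable<string> enabledFlags)
{
    var canonical = string.Join(",", enabledFlags.OrderBy(f => f, StringComparer.Ordinal));
    var digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
    return Convert.ToHexString(digest, 0, 8); // first 8 bytes as 16 hex characters
}

// Hypothetical helper: explicit version prefix plus every dimension that affects the answer.
string DashboardKey(int version, Guid tenantId, Guid userId,
                    DateTime from, DateTime to, IEnumerable<string> flags) =>
    $"v{version}:dashboard:{tenantId:N}:{userId:N}:{from:yyyyMMddTHHmm}:{to:yyyyMMddTHHmm}:flags:{StableFlagHash(flags)}";

var start = new DateTime(2025, 1, 1, 0, 0, 0, DateTimeKind.Utc);
Console.WriteLine(DashboardKey(17, Guid.NewGuid(), Guid.NewGuid(), start, start.AddDays(7),
    new[] { "new-charts", "dark-mode" }));
```

Sorting the flags before hashing means the same flag set always yields the same key regardless of enumeration order; bumping `version` is the intentional way to abandon every existing key at once.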
Cardinality control. The number of distinct keys grows with the cardinality of the dimensions you encode. A (tenantId, userId) key with 100k tenants × 10k users per tenant means a billion distinct keys; your Redis bill goes vertical. Bucket high-cardinality dimensions (e.g. round timestamps to 5-minute buckets) and avoid keying on things that change every request (request id, current minute).
// Bad: full timestamp precision; every request produces a new key
var key = $"q:{userId}:{DateTime.UtcNow:O}";

// Good: bucketed; keys are stable across the 5-minute bucket window
var now = DateTime.UtcNow; // read the clock once so the parts cannot straddle a boundary
var bucket = new DateTime(now.Year, now.Month, now.Day,
    now.Hour, now.Minute / 5 * 5, 0, DateTimeKind.Utc);
var key = $"q:{userId}:{bucket:yyyyMMddTHHmm}";
TTL strategies: absolute vs sliding, why sliding can serve stale forever
IMemoryCache and IDistributedCache expose two expiration types (HybridCache offers absolute expiration only, via Expiration and LocalCacheExpiration):
- Absolute expiration — a hard ceiling: the entry expires N minutes after it is set, regardless of access.
- Sliding expiration — the timer resets on each access: an entry that keeps being read keeps living.
Sliding alone is a trap. If a popular entry is read constantly, it never expires, even if the underlying data changed an hour ago. The user-visible behaviour: a stale entry that never refreshes for some users. Always pair sliding with an absolute ceiling so even hot entries are forced to refresh:
var options = new MemoryCacheEntryOptions {
    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15), // hard ceiling
    SlidingExpiration = TimeSpan.FromMinutes(5)                 // idle timeout
};
This means: cold entries die after 5 minutes; warm entries survive at most 15 minutes regardless of access. Sliding-only is a bug; absolute-only is fine, at the cost of keeping cold entries in memory until the ceiling.
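To see why the ceiling matters, the sketch below simulates an entry that is read once a minute. It models the two rules directly rather than calling IMemoryCache; `FirstExpiryMinute` is a made-up helper for this illustration.

```csharp
using System;

// Returns the minute at which an entry read once per minute first expires,
// or null if it never expires within the simulated period.
int? FirstExpiryMinute(TimeSpan sliding, TimeSpan? absolute, int simulatedMinutes)
{
    var created = TimeSpan.Zero;
    var lastAccess = TimeSpan.Zero;
    for (var minute = 1; minute <= simulatedMinutes; minute++)
    {
        var clock = TimeSpan.FromMinutes(minute);
        var slidingExpired = clock - lastAccess >= sliding;
        var absoluteExpired = absolute is { } ceiling && clock - created >= ceiling;
        if (slidingExpired || absoluteExpired) return minute;
        lastAccess = clock; // a hit resets the sliding window
    }
    return null;
}

// Sliding only: a hot entry outlives ten hours of simulated traffic.
Console.WriteLine(FirstExpiryMinute(TimeSpan.FromMinutes(5), null, 600)?.ToString() ?? "never");
// Sliding plus a 15-minute ceiling: forced refresh at minute 15.
Console.WriteLine(FirstExpiryMinute(TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(15), 600)?.ToString() ?? "never");
```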
A rough TTL guide that you should re-derive from your own business contracts:
- Static reference data (countries, currencies): 24 hours.
- User profile / org settings: 5–15 minutes.
- Search results and dashboards: 1–5 minutes.
- Real-time data: do not cache, or 5–30 seconds with stale-while-revalidate.
- Auth tokens: match token TTL minus 5 minutes.
Pick a number for a reason; the number is a staleness contract with users.
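The auth-token rule in the list above ("token TTL minus 5 minutes") is simple arithmetic, but the edge case of a token that is already close to expiry is worth handling explicitly. A minimal sketch, with `TokenCacheTtl` as an invented helper name:

```csharp
using System;

// Cache lifetime for a token: its remaining lifetime minus a safety margin, never negative.
TimeSpan TokenCacheTtl(DateTimeOffset expiresAt, DateTimeOffset current, TimeSpan safetyMargin)
{
    var ttl = expiresAt - current - safetyMargin;
    return ttl > TimeSpan.Zero ? ttl : TimeSpan.Zero; // zero means "do not cache; fetch a new token"
}

var issuedAt = DateTimeOffset.UtcNow;
Console.WriteLine(TokenCacheTtl(issuedAt.AddMinutes(60), issuedAt, TimeSpan.FromMinutes(5))); // 00:55:00
Console.WriteLine(TokenCacheTtl(issuedAt.AddMinutes(3), issuedAt, TimeSpan.FromMinutes(5)));  // 00:00:00
```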
Eviction: size limits, memory pressure, SetSize
IMemoryCache is bounded by what you give it. If you do not set a SizeLimit, the cache is unbounded: expired entries are cleaned up, but nothing is evicted because memory is tight, so live entries accumulate until the process runs out of memory. The discipline:
builder.Services.AddMemoryCache(opts =>
{
    opts.SizeLimit = 100_000; // arbitrary "units"; you define the unit
});

// Every entry must declare its Size
cache.Set(key, value, new MemoryCacheEntryOptions
{
    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5),
    Size = 1
});
SizeLimit is in units you choose. Common choices: 1 unit per entry (simple count), bytes (measure each entry's serialised size), or a custom weight (e.g. bytes × access frequency). Whatever you pick, every entry must set Size: once a SizeLimit is configured, adding an entry without a Size throws an InvalidOperationException.
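When adding an entry would exceed the cap, IMemoryCache does not store the new entry and triggers a background compaction that removes expired entries first, then evicts by priority and, within a priority, least recently used. There is no fairness guarantee across tenants; a noisy tenant can monopolise the cache. For multi-tenant fairness, partition by tenant (separate caches per tenant) or implement a custom eviction policy.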
IDistributedCache / Redis eviction is configured at the Redis instance level (maxmemory-policy set to allkeys-lru or volatile-lru for most use cases). Choose allkeys-lru for general caching and volatile-lru if you also use Redis for non-TTL'd data (which you usually should not).
Cache invalidation strategies: TTL, explicit, version prefix bumping
The four invalidation approaches you should know:
- TTL only: entries expire on a clock. Simple; eventual freshness; acceptable when staleness windows match business need. Use as the baseline.
- Explicit on write: cache.Remove(key) (or cache.RemoveAsync(key) / cache.RemoveByTagAsync(tag) for HybridCache) every time a write touches the underlying data. Tighter consistency; couples write paths to the cache. Coordinate across instances with Redis pub/sub or rely on L2's single source of truth.
- Tag-based invalidation: HybridCache lets you attach tags when you set an entry (tags: ["tenant:42", "dashboard:5"]) and invalidate all entries with a tag (cache.RemoveByTagAsync("tenant:42")). The canonical pattern for "one write invalidates many derived caches".
- Version-prefix bumping: every key starts with a version (v17:dashboard:...). On a write, the writer bumps a stored version (v17 → v18); subsequent reads miss and repopulate. Old keys age out via TTL. Atomic as long as the bump itself is an atomic increment (for example Redis INCR); no eviction required; works across instances without pub/sub, provided readers fetch the version from the shared store.
// Tag-based with HybridCache (tags are a parameter, not an entry option)
await cache.GetOrCreateAsync($"user:{id}",
    factory: async ct => await repo.GetAsync(id, ct),
    tags: new[] { $"user:{id}", "users:all" });

await cache.RemoveByTagAsync($"user:{id}"); // on write

// Version-prefix bumping (own bookkeeping)
var version = await cache.GetOrCreateAsync("v:dashboard", _ => ValueTask.FromResult("0"));
var key = $"v{version}:dashboard:{tenantId}";
// ... read or populate ...

// On any write (read-modify-write is not atomic; prefer an atomic increment such as Redis INCR):
await cache.SetAsync("v:dashboard", (int.Parse(version) + 1).ToString());
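The mechanics of version-prefix bumping are easier to see with the cache taken out of the picture. The sketch below uses in-memory stand-ins (a ConcurrentDictionary for the cache and a local counter for the stored version), both assumptions made for illustration; in production the counter would live in the shared store and be bumped with an atomic increment.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

var cacheStore = new ConcurrentDictionary<string, string>(); // stand-in for the real cache
long dashboardVersion = 0;                                    // stand-in for the stored version

string CurrentKey(int tenantId) =>
    $"v{Interlocked.Read(ref dashboardVersion)}:dashboard:{tenantId}";

string Read(int tenantId, Func<string> loadFromOrigin) =>
    cacheStore.GetOrAdd(CurrentKey(tenantId), _ => loadFromOrigin());

// Atomic bump: concurrent writers can never both produce the same version.
void OnWrite() => Interlocked.Increment(ref dashboardVersion);

var first = Read(42, () => "payload A");  // miss, populates v0:dashboard:42
OnWrite();                                // data changed: every v0 key is now unreachable
var second = Read(42, () => "payload B"); // miss under v1, repopulates
Console.WriteLine($"{first} -> {second}");
```

Nothing is deleted on write; the old v0 entry simply stops being asked for and would age out through its TTL in a real cache.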
- TTL by default.
- Explicit RemoveByTag on write paths.
- Document each cache's freshness contract.
- Version-prefix bumping for high-throughput strict consistency.
- Pub/sub fan-out for L1 invalidation across instances.
- Per-tenant invalidation channels.
Stampede protection: single-flight, probabilistic early refresh
A cache stampede happens when an entry expires under load: many concurrent callers all miss, all call the origin, all populate the cache. The origin sees a burst of duplicate work; on a marginal downstream this can be the difference between "recovers in 1 second" and "stays down".
The fixes:
- Single-flight (LazyCache, HybridCache): the cache library tracks in-flight factories per key. When 100 callers miss simultaneously, the first triggers the factory; the rest await its result. HybridCache does this natively; LazyCache provides it on top of IMemoryCache.
- SemaphoreSlim per key: the manual equivalent: a per-key SemaphoreSlim guards the factory. Easy to get slightly wrong (leaking semaphores, blocking the wrong threads). Prefer the library.
- Probabilistic early refresh: refresh the entry before it expires with a probability proportional to how close it is to expiry. Each individual caller has a small chance of being the one to refresh; the stampede is spread over the lead-up to expiry rather than concentrated at the moment.
// HybridCache: single-flight built in (GetOrCreateAsync returns ValueTask<T>)
public ValueTask<User?> GetUser(Guid id, CancellationToken ct) =>
    cache.GetOrCreateAsync($"user:{id}",
        factory: async tok => await repo.GetAsync(id, tok),
        cancellationToken: ct);

// Manual SemaphoreSlim-per-key (when you cannot use HybridCache)
private static readonly ConcurrentDictionary<string, SemaphoreSlim> _gates = new();

public async Task<T> GetOrCreateSingleFlightAsync<T>(string key, Func<Task<T>> factory)
{
    if (_cache.TryGetValue<T>(key, out var hit)) return hit!;
    var gate = _gates.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync();
    try
    {
        // Re-check: another caller may have populated the entry while we waited.
        if (_cache.TryGetValue<T>(key, out hit)) return hit!;
        var value = await factory();
        _cache.Set(key, value, TimeSpan.FromMinutes(5));
        return value;
    }
    finally { gate.Release(); }
}
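Probabilistic early refresh has no snippet above, so here is a minimal sketch of the decision it describes. It assumes a linear ramp as the simplest reading of "proportional to how close it is to expiry"; the helper names and the 30-second window are invented for the example, and the random roll is passed in so the decision stays testable.

```csharp
using System;

// Probability that this caller should refresh early: 0 outside the refresh window,
// rising linearly to 1 at the moment of expiry.
double EarlyRefreshProbability(TimeSpan remaining, TimeSpan refreshWindow)
{
    if (remaining <= TimeSpan.Zero) return 1.0;  // already expired
    if (remaining >= refreshWindow) return 0.0;  // still far from expiry
    return 1.0 - remaining / refreshWindow;      // TimeSpan / TimeSpan yields a double
}

// roll is a uniform sample in [0, 1).
bool ShouldRefreshEarly(TimeSpan remaining, TimeSpan refreshWindow, double roll) =>
    roll < EarlyRefreshProbability(remaining, refreshWindow);

var window = TimeSpan.FromSeconds(30);
Console.WriteLine(EarlyRefreshProbability(TimeSpan.FromSeconds(45), window)); // outside the window
Console.WriteLine(EarlyRefreshProbability(TimeSpan.FromSeconds(15), window)); // halfway through it
Console.WriteLine(ShouldRefreshEarly(TimeSpan.FromSeconds(15), window, Random.Shared.NextDouble()));
```

A caller that rolls "refresh" recomputes the value and resets the entry while everyone else keeps reading the still-valid copy. Under heavy traffic several callers may roll "refresh" at once, so this is usually combined with single-flight rather than used instead of it.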
Cache warmup: BackgroundService, PeriodicTimer, canonical request sets
For caches that take a noticeable time to populate (a 2-second Kusto query that you want to keep hot), pre-populating on a schedule is cheaper than absorbing every cold miss during user traffic. The pattern:
public class DashboardWarmup(HybridCache cache, IDashboardService svc, ILogger<DashboardWarmup> log)
    : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromMinutes(1));
        do
        {
            var requests = await svc.GetCanonicalRequestsAsync(stoppingToken);
            await Parallel.ForEachAsync(requests,
                new ParallelOptions { MaxDegreeOfParallelism = 4, CancellationToken = stoppingToken },
                async (req, ct) =>
                {
                    try
                    {
                        await cache.GetOrCreateAsync(req.Key,
                            factory: async tok => await svc.ComputeAsync(req, tok),
                            cancellationToken: ct);
                    }
                    // Let cancellation propagate so shutdown is not logged as a warmup failure.
                    catch (Exception ex) when (ex is not OperationCanceledException)
                    {
                        log.LogWarning(ex, "Warmup failed for {Key}", req.Key);
                    }
                });
        } while (await timer.WaitForNextTickAsync(stoppingToken));
    }
}
- A canonical request set is the small list of queries you know users will hit (the top dashboards, the default views, the popular reports).
- PeriodicTimer is the right primitive for "wake up every N seconds": it does not drift, it respects cancellation, it is cheap.
- MaxDegreeOfParallelism controls how aggressively warmup hits the origin; tune so warmup never starves real traffic.
Warmup also runs at startup so the first user request hits a warm cache. Gate it behind the readiness probe so the slot does not take traffic until warmup completes.
Where caching lives in the stack
The same data can be cached at any of five layers. Combining them safely is the architecture choice:
- Browser: Cache-Control, ETag, If-None-Match. The cheapest layer; zero server work for hits.
- CDN / edge (Front Door, Cloudflare): public, idempotent GETs cached at the geographic POP. Typically a few milliseconds from the user, with no round trip to your region.
- App-level distributed (Redis): shared across app instances; sub-millisecond LAN; the cache you reach for first.
- App-level in-process (IMemoryCache): microseconds per access; per-instance.
- Database-side caching: the buffer pool and plan cache in SQL Server, the integrated cache in Cosmos DB. Last-mile; you do not control it directly.
A combined stack might look like:
Browser → CDN (5 min Cache-Control) → API → HybridCache L1 (2 min) → HybridCache L2 (10 min) → Kusto
The TTLs cascade: keep each outer layer's TTL no longer than the TTL of the app-level cache behind it (here the 5-minute CDN TTL against the 10-minute L2 TTL), and remember that layers your write path cannot reach never see your invalidations. Common bug: CDN caches for 1 hour, app cache invalidates on write; users keep getting the CDN-served stale response for the rest of the hour.
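A rough way to reason about a stacked configuration is to add up the TTLs of the layers a write cannot invalidate. The sketch below does that arithmetic under a simplifying assumption: each layer starts its own TTL clock when it fills from the layer beneath it. `WorstCaseStaleness` and the tuple shape are invented for the example, and treating L1 as invalidated assumes every instance's local cache is actually reached.

```csharp
using System;
using System.Linq;

// Worst-case staleness after a write: layers the write path invalidates contribute
// nothing; layers it cannot reach contribute their full TTL.
TimeSpan WorstCaseStaleness((string Name, TimeSpan Ttl, bool InvalidatedOnWrite)[] layers) =>
    layers.Where(l => !l.InvalidatedOnWrite)
          .Aggregate(TimeSpan.Zero, (sum, l) => sum + l.Ttl);

var stack = new[]
{
    (Name: "CDN", Ttl: TimeSpan.FromMinutes(5),  InvalidatedOnWrite: false),
    (Name: "L1",  Ttl: TimeSpan.FromMinutes(2),  InvalidatedOnWrite: true),
    (Name: "L2",  Ttl: TimeSpan.FromMinutes(10), InvalidatedOnWrite: true),
};
Console.WriteLine(WorstCaseStaleness(stack)); // 00:05:00
```

With a one-hour CDN TTL in place of the five minutes, the same calculation returns a full hour, which is the bug described above.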
Anti-patterns
A short list of caching mistakes that bite forever:
- Caching mutable references: caching a List<T> that one caller mutates while another reads it. Cache immutable snapshots (records, frozen collections) only.
- Caching failures: caching a 500 response or an exception. The first failure becomes a permanent failure for the TTL. Cache only successful responses.
- Unscoped keys leaking data across users/tenants: the IDOR-cache bug. Tenant + user always in the key for any user-specific data.
- Caching writes: caches are for reads. Writes go to the origin, then explicitly invalidate (or update) the cache. "Write-through" means write to cache and to origin atomically, not "write only to cache".
- No TTL: eternally cached entries become stuck-state bugs that survive across deploys and feature flags.
- Cache without metrics: you have no idea what your hit rate is; you have no alert when it drops. Emit cache.hits / cache.misses and alert on hit-rate degradation.
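Several of these rules fit into one small read/write pair. The sketch below shows cache-aside reads with invalidate-on-write, plain hit/miss counters, and a read path that cannot cache a failure. The two dictionaries are in-memory stand-ins for the origin and the cache, chosen for illustration, and the code is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

var origin = new Dictionary<string, string> { ["user:1"] = "Ada" }; // stand-in for the database
var cache = new Dictionary<string, string>();                       // stand-in for the cache
long hits = 0, misses = 0;                                          // would feed cache.hits / cache.misses

// Cache-aside read: try the cache, fall back to the origin, populate on success only.
string Read(string key)
{
    if (cache.TryGetValue(key, out var cached)) { hits++; return cached; }
    misses++;
    var value = origin[key]; // throws if the lookup fails, so a failure is never stored
    cache[key] = value;
    return value;
}

// Write path: the origin is the source of truth; the cached entry is dropped afterwards.
void Write(string key, string value)
{
    origin[key] = value;
    cache.Remove(key);
}

Read("user:1");                            // miss, populates
Read("user:1");                            // hit
Write("user:1", "Grace");                  // invalidates
Console.WriteLine(Read("user:1"));         // prints "Grace": miss, repopulated from the origin
Console.WriteLine($"hits={hits} misses={misses}"); // prints "hits=1 misses=2"
```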
Worked examples
Example 1 — HybridCache for a dashboard query
public class DashboardService(HybridCache cache, IKustoClient kusto)
{
    // HybridCache returns ValueTask<T>; tags are a separate parameter, not an entry option.
    public ValueTask<DashboardPayload> GetAsync(Guid tenantId, Guid dashboardId, CancellationToken ct)
    {
        var key = $"v3:dashboard:{tenantId}:{dashboardId}";
        return cache.GetOrCreateAsync(
            key,
            factory: async tok => await kusto.RunDashboardAsync(tenantId, dashboardId, tok),
            options: new HybridCacheEntryOptions
            {
                Expiration = TimeSpan.FromMinutes(5),
                LocalCacheExpiration = TimeSpan.FromMinutes(1)
            },
            tags: new[] { $"tenant:{tenantId}", $"dashboard:{dashboardId}" },
            cancellationToken: ct);
    }

    public ValueTask InvalidateDashboardAsync(Guid dashboardId) =>
        cache.RemoveByTagAsync($"dashboard:{dashboardId}");
}
- Versioned key prefix (v3:): bump intentionally on schema change.
- Tags tenant:{tenantId} and dashboard:{dashboardId} allow targeted invalidation.
- LocalCacheExpiration keeps the L1 honest (1 minute); L2 holds for 5 minutes.
Example 2 — Per-key single-flight without HybridCache
public class CacheService(IMemoryCache mc, ILogger<CacheService> log)
{
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> _gates = new();

    public async Task<T> GetOrCreateAsync<T>(string key, TimeSpan ttl, Func<Task<T>> factory)
    {
        if (mc.TryGetValue<T>(key, out var hit)) return hit!;
        var gate = _gates.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            if (mc.TryGetValue<T>(key, out hit)) return hit!;
            var value = await factory();
            mc.Set(key, value, new MemoryCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = ttl,
                Size = 1
            });
            return value;
        }
        finally
        {
            gate.Release();
            // Optional: prune cold semaphores periodically to avoid dictionary growth.
        }
    }
}
- One semaphore per key; the first caller computes, the rest wait.
- Size = 1 is mandatory if the IMemoryCache has a SizeLimit.
- The semaphore dictionary grows; in production, prune cold entries on a timer.
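One way to sidestep the growing semaphore dictionary is a different single-flight shape: track the in-flight work itself as a Lazy<Task<T>> and remove it when it completes. This is an alternative technique, not what the example above does, and it only deduplicates concurrent factory calls, so the cache lookup and the cache write still belong around it. `SingleFlightAsync` is a name invented for this sketch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Deduplicates concurrent factory calls per key. The entry is removed once the work
// completes, so the dictionary only ever holds keys that are currently in flight.
async Task<T> SingleFlightAsync<T>(
    ConcurrentDictionary<string, Lazy<Task<T>>> pending, string key, Func<Task<T>> factory)
{
    var lazy = pending.GetOrAdd(key, _ => new Lazy<Task<T>>(factory));
    try
    {
        return await lazy.Value; // every concurrent caller awaits the same task
    }
    finally
    {
        // Remove only our own entry; a newer in-flight entry for the key is left alone.
        pending.TryRemove(new KeyValuePair<string, Lazy<Task<T>>>(key, lazy));
    }
}

var inFlight = new ConcurrentDictionary<string, Lazy<Task<int>>>();
var factoryCalls = 0;
var results = await Task.WhenAll(Enumerable.Range(0, 100).Select(_ =>
    SingleFlightAsync(inFlight, "user:1", async () =>
    {
        Interlocked.Increment(ref factoryCalls);
        await Task.Delay(200); // simulated slow origin
        return 42;
    })));
Console.WriteLine($"callers={results.Length}, factory calls={factoryCalls}, in flight={inFlight.Count}");
```

Because a faulted task is removed like any other, a failed factory call is retried by the next caller rather than being remembered.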
Example 3 — Cache warmup BackgroundService
builder.Services.AddHostedService<DashboardWarmup>();

public class DashboardWarmup(
    HybridCache cache,
    IDashboardCatalog catalog,
    IKustoClient kusto,
    ILogger<DashboardWarmup> log) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromMinutes(1));
        do
        {
            var canonical = await catalog.GetHotDashboardsAsync(top: 50, ct);
            await Parallel.ForEachAsync(canonical,
                new ParallelOptions { MaxDegreeOfParallelism = 4, CancellationToken = ct },
                async (d, tok) =>
                {
                    var key = $"v3:dashboard:{d.TenantId}:{d.Id}";
                    try
                    {
                        await cache.GetOrCreateAsync(key,
                            factory: async t => await kusto.RunDashboardAsync(d.TenantId, d.Id, t),
                            options: new() { Expiration = TimeSpan.FromMinutes(5) },
                            // Same tags as the read path, so tag invalidation also drops warmed entries.
                            tags: new[] { $"tenant:{d.TenantId}", $"dashboard:{d.Id}" },
                            cancellationToken: tok);
                    }
                    catch (Exception ex) when (ex is not OperationCanceledException)
                    {
                        log.LogWarning(ex, "Warmup failed for {Key}", key);
                    }
                });
        } while (await timer.WaitForNextTickAsync(ct));
    }
}
- The catalog returns the top-N "hot" dashboards — the small canonical set worth pre-computing.
- Parallel warmup with bounded concurrency keeps origin load predictable.
- Swallows non-cancellation exceptions so one bad warm-up does not kill the loop.
Example 4 — CDN-friendly responses with Cache-Control and ETag
app.MapGet("/api/countries", async (ICountryRepo repo, HttpContext ctx) =>
{
    var countries = await repo.GetAllAsync();
    var etag = $"\"{HashCountries(countries):x}\"";
    // Set validators first: a 304 should carry the same ETag and Cache-Control as the 200.
    ctx.Response.Headers.ETag = etag;
    ctx.Response.Headers.CacheControl = "public, max-age=300, stale-while-revalidate=600";
    if (ctx.Request.Headers.IfNoneMatch.ToString() == etag)
        return Results.StatusCode(304);
    return Results.Ok(countries);
});
- Cache-Control: public lets CDN and browser cache.
- max-age=300: fresh for 5 minutes.
- stale-while-revalidate=600: for the next 10 minutes after expiry, the CDN can serve the stale value while it re-fetches in the background.
- ETag + If-None-Match → 304 Not Modified saves bandwidth even on cache miss.
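The endpoint above leaves HashCountries undefined and compares If-None-Match with a plain string equality. The sketch below shows one possible shape for both pieces using only the base class library. `ComputeETag` and `IfNoneMatchSatisfied` are hypothetical helpers, the 8-byte truncation is an arbitrary choice, and splitting the header on commas is a simplification that assumes entity tags contain no commas.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;

// Strong ETag: quoted lowercase hex of a SHA-256 over the serialised payload (truncated).
string ComputeETag<T>(T payload)
{
    var bytes = JsonSerializer.SerializeToUtf8Bytes(payload);
    return $"\"{Convert.ToHexString(SHA256.HashData(bytes), 0, 8).ToLowerInvariant()}\"";
}

// If-None-Match may carry "*", a comma-separated list, and weak validators (W/"...").
// Weak comparison is used, which is what If-None-Match calls for.
bool IfNoneMatchSatisfied(string? ifNoneMatch, string currentETag)
{
    if (string.IsNullOrWhiteSpace(ifNoneMatch)) return false;
    if (ifNoneMatch.Trim() == "*") return true;
    return ifNoneMatch.Split(',')
        .Select(tag => tag.Trim())
        .Select(tag => tag.StartsWith("W/", StringComparison.Ordinal) ? tag[2..] : tag)
        .Any(tag => tag == currentETag);
}

var etag = ComputeETag(new[] { "DE", "FR", "NZ" });
Console.WriteLine(IfNoneMatchSatisfied($"W/{etag}, \"other\"", etag)); // True
Console.WriteLine(IfNoneMatchSatisfied("\"stale\"", etag));            // False
```

Hashing the serialised payload ties the ETag to the exact bytes sent, so any change in serialiser settings also changes the tag.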
Hands-on exercises
- In-process speedup. Wrap a 100ms-slow endpoint in IMemoryCache with a 5-minute TTL. Confirm 99% of requests are sub-millisecond.
  - You are done when the latency histogram shows the bimodal hit/miss distribution.
- Add a distributed tier. Run Redis locally (docker run -p 6379:6379 redis), add AddStackExchangeRedisCache, and confirm the cache survives a process restart.
  - You are done when restarting the app preserves cached values.
- Migrate to HybridCache. Replace the manual L1+L2 implementation with HybridCache. Confirm L1 hits are sub-millisecond and L2 hits add a small serialiser cost.
  - You are done when both hit paths are observable in custom metrics.
- Stampede demonstration. Force 100 concurrent callers to miss an empty cache. Count factory invocations with and without HybridCache. Confirm only one invocation with HybridCache.
  - You are done when the metric chart proves single-flight.
- Tag-based invalidation. Cache user data with tags: [$"user:{id}"]. On user update, call RemoveByTagAsync($"user:{id}"). Confirm all derived caches drop.
  - You are done when post-update reads repopulate.
- Warmup at startup. Add the DashboardWarmup BackgroundService from Example 3. Gate the readiness probe on warmup completion. Confirm a deploy never serves a cold request to a user.
  - You are done when the first request after a fresh deploy hits a warm cache.
Self-check questions
- What is the difference between absolute and sliding expiration, and why is sliding-only a bug?
- What is a cache stampede, and what are three ways to prevent one?
- Why must every cache entry have a Size set if MemoryCache.SizeLimit is configured?
- Compare TTL-only invalidation, explicit invalidation, tag-based invalidation, and version-prefix bumping. Give one use case for each.
- Why is "caching failures" an outage amplifier?
- Explain the IDOR-cache bug and the one-line fix.
- What's the difference between cache-aside and write-through?
- When does in-process caching beat Redis, and vice versa?
- Why do CDN and app-level TTLs need to cascade?
- What is stale-while-revalidate good for?
- What primitives does HybridCache give you that hand-rolled L1+L2 does not?
- How would you alert on a cache hit-rate regression?
High-signal resources
Official docs
- HybridCache in ASP.NET Core — the new default.
- In-memory caching (IMemoryCache).
- Distributed caching (IDistributedCache).
- Response caching middleware.
- StackExchange.Redis docs.
Books or courses
- Designing Data-Intensive Applications — Martin Kleppmann. The replication and caching chapters.
- Database Internals — Alex Petrov. Useful for reasoning about why caches help and hurt.
Practitioner posts
- Microsoft eShop sample — production-grade caching patterns in a reference app.
- Marc Brooker — caches, stampedes and load shedding — practitioner deep-dives.
- High-scale load balancing — Google SRE book — caching in the broader load-management context.
Weekly milestones
- Day 1. Read the HybridCache docs. Do exercise 1. Self-check questions 1–3.
- Day 2. Add Redis (exercise 2). Self-check questions 8 + 9.
- Day 3. Migrate to HybridCache (exercise 3) + stampede demo (exercise 4). Self-check questions 2 + 11.
- Day 4-5. Tag-based invalidation (exercise 5). Self-check questions 4 + 6.
- Day 6-7. Warmup + CDN headers (exercise 6 + Example 4). Self-check questions 5 + 7 + 10 + 12.
How it shows up in the capstone
The capstone uses HybridCache with L1 in-process and L2 on Azure Cache for Redis. Hot reference data (chart catalogue, user profile) caches for 15 minutes with 2-minute L1. Dashboard payloads cache for 5 minutes per (tenant, dashboard) and use tag-based invalidation on dashboard updates. A DashboardWarmup BackgroundService pre-populates the top 50 hot dashboards every minute and gates the readiness probe at startup.
Public GET endpoints emit Cache-Control: public, max-age=300, stale-while-revalidate=600 so Front Door can serve hot reads from the edge. Every cache key includes the tenant id and, where applicable, the user id, to avoid the IDOR-cache bug. Custom metrics cache.hits, cache.misses, cache.factory_calls feed a hit-rate alert that fires when hit rate drops below 80% for 15 minutes — the canonical early-warning that a deploy broke a key or busted a TTL.
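The alert rule in the paragraph above ("below 80% for 15 minutes") can be sketched as a predicate over per-minute samples. This is an illustration of the rule's logic only; in practice it would be expressed in whatever your monitoring system's query language is. `HitRateAlert` and the sample shape are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One sample per minute: (hits, misses). The alert fires only when every one of the
// last windowMinutes samples is below the threshold, so a single bad minute does not page.
bool HitRateAlert(IReadOnlyList<(long Hits, long Misses)> perMinute,
                  double threshold = 0.80, int windowMinutes = 15)
{
    if (perMinute.Count < windowMinutes) return false; // not enough data yet
    return perMinute.Skip(perMinute.Count - windowMinutes).All(s =>
    {
        var total = s.Hits + s.Misses;
        return total > 0 && (double)s.Hits / total < threshold; // idle minutes do not count as bad
    });
}

var healthy = Enumerable.Repeat((Hits: 900L, Misses: 100L), 20).ToList();
var degraded = healthy.Concat(Enumerable.Repeat((Hits: 700L, Misses: 300L), 15)).ToList();
Console.WriteLine(HitRateAlert(healthy));  // False
Console.WriteLine(HitRateAlert(degraded)); // True
```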