
MS Stack Ch 22 — Pro skills (the bits no one writes down)

Code review craft, on-call composure, incident writing, technical proposals, mentoring, learning velocity, knowing when to rewrite. The chapter that turns a strong engineer into a senior one.

Chapter 22 of From Novice to Fluent on the Modern Microsoft Web Stack — a 22-chapter self-study plan.

Why this chapter

The previous twenty-one chapters were mechanics — the runtime, the framework, the pipeline, the cloud. This one is the part that actually decides whether a strong engineer becomes a senior one. None of it is on a certification path; all of it shows up in the promotion packet, the code review comments other engineers ask for, and the postmortem document the team reads twice.

The skill set is portable across stacks and companies. Reading a large codebase you did not write. Debugging with intent rather than print statements. Writing a pull request that someone enjoys reviewing. Documenting a decision so the team in 2028 understands why a 2026 you chose what you chose. Each is a craft; each takes deliberate practice; none has a tutorial that finishes the job for you.

The shipping-grade version is "I get my code in, I respond to reviews, I unblock myself". The senior version is "I land changes that compose with what's there, I shape the team's reviews to be productive, I leave the codebase better-understood for the next person". The gap is taste, calibration, and a small set of habits.

You finish this chapter when these habits are on autopilot: opening a new repo and finding the composition root first, framing a bug as a hypothesis before writing the fix, breaking a 600-line change into PRs reviewers can hold in their heads, writing an ADR before merging the choice it captures.

Reading code

Composition roots, follow the wiring outward, build the map fast.

Debugging

Hypothesis-driven, narrow the suspect set, the failure mode tells you where to look.

Stack traces

Async state machines, missing frames, what to keep and what to skip.

Performance

Measure first, big-O on hot paths, allocation awareness, the cost of premature optimisation.

PR craft

Small changes, clear intent, the review you would want to receive.

Decisions on paper

ADRs, design docs, the conversation the doc has with future-you.

Concepts and depth

Reading large codebases

You will spend more of your career reading code than writing it. The skill is not "read every file"; it is "find the right file fast and build a mental map from it outward".

Every codebase has a composition root: the place where the application is wired together. In ASP.NET Core, it is Program.cs: services are registered, middleware is added, endpoints are mapped, the host runs. In a React app, it is main.tsx or App.tsx: providers wrap, the router is configured, the top-level component renders. In a CLI tool, it is the Main method or, in a Node script, the process.argv[2] dispatch. Start there. Trace the wiring outward. Within thirty minutes you should be able to name the major modules, the data flow direction, and the external services the app talks to.
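
As a minimal sketch of what a composition root looks like in a small CLI tool: the command names, the Clock interface, and the run function below are all hypothetical, invented for illustration. The point is that one function constructs and wires everything, so reading it tells you what the app is made of.

```typescript
// Hypothetical dependency: a clock, so commands do not reach for globals.
interface Clock {
  now(): Date;
}

// Hypothetical command contract: takes arguments, returns the text to print.
type Command = (args: string[]) => string;

// The composition root: the one place where dependencies are wired into commands.
function composeCommands(clock: Clock): Map<string, Command> {
  const commands = new Map<string, Command>();
  commands.set("greet", (args) => `Hello, ${args[0] ?? "world"}!`);
  commands.set("year", () => String(clock.now().getUTCFullYear()));
  return commands;
}

// The dispatch: pick a command by name, as a CLI would from process.argv[2].
function run(argv: string[], clock: Clock): string {
  const [name, ...rest] = argv;
  const command = name === undefined ? undefined : composeCommands(clock).get(name);
  if (command === undefined) {
    return `Unknown command: ${name ?? "(none)"}`;
  }
  return command(rest);
}

// A real Node entry point would call run(process.argv.slice(2), systemClock).
const fixedClock: Clock = { now: () => new Date(Date.UTC(2026, 0, 1)) };
console.log(run(["greet", "Ada"], fixedClock)); // Hello, Ada!
```

Passing the clock in rather than calling new Date() inside a command is the same idea as service registration in Program.cs, at a much smaller scale.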

The next move is to read a single endpoint or screen end to end. Pick the simplest one. Follow the request from controller to service to repository, or from component to hook to API call. You learn the codebase's conventions (naming, layering, error handling, logging) by repetition; one end-to-end trace teaches more than browsing twenty files.

Read the tests. Tests are documentation that compiles. The integration test for an endpoint shows you the expected request and response shapes, the auth requirements, the database state. The unit test for a service shows you the contracts of its collaborators. When a part of the codebase confuses you, find its test and read it; the test is the simplest form of the code's intent.

Ignore the noise. A codebase has accumulated cruft — abandoned experiments, dead branches, the old config layer that was supposed to be deleted three years ago. New readers waste time treating cruft as if it were load-bearing. Ask the team or check git log -- file.cs to find out whether a file has been touched in the last year. If not, it is probably not critical to your task.

Good enough to ship

• Find the composition root within minutes of opening a repo
• Trace one endpoint or screen end to end
• Use tests as the canonical contract documentation

Expert tier

• Write the team's "first day" onboarding map for the repo
• Distinguish load-bearing code from cruft using git history
• Internalise the conventions enough to write code that looks native

Debugging with intent

The default debugging mode is "add print statements until the bug reveals itself". This works for small bugs in code you wrote yesterday. It does not work for production incidents, race conditions, or any bug whose root cause is two layers below the symptom.

The intentional mode is hypothesis-driven. State, in one sentence, what you think is happening: "I believe the cache is returning a stale value because the invalidation event arrives after the next read". Now design an experiment that proves or disproves the hypothesis: add a log at the invalidation point, replay the failing scenario, observe whether the invalidation precedes the read. The result either confirms (proceed to fix) or refutes (form a new hypothesis with new evidence).
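
The experiment for the stale-cache hypothesis can be sketched as a check over logged events. The CacheEvent shape and the findStaleReads helper are assumptions made for this illustration, not part of any real logging library; the idea is only that the experiment should produce evidence a function (or a person) can evaluate unambiguously.

```typescript
// Hypothetical log record emitted at the points the hypothesis cares about.
interface CacheEvent {
  kind: "write" | "invalidate" | "read";
  key: string;
  atMs: number; // timestamp in milliseconds
}

// Returns the reads that landed after a write but before the matching invalidation.
// A non-empty result supports the hypothesis; an empty one refutes it for this replay.
function findStaleReads(events: CacheEvent[]): CacheEvent[] {
  const ordered = events.slice().sort((a, b) => a.atMs - b.atMs);
  const dirtyKeys = new Set<string>(); // keys written but not yet invalidated
  const staleReads: CacheEvent[] = [];
  for (const event of ordered) {
    if (event.kind === "write") {
      dirtyKeys.add(event.key);
    } else if (event.kind === "invalidate") {
      dirtyKeys.delete(event.key);
    } else if (dirtyKeys.has(event.key)) {
      staleReads.push(event);
    }
  }
  return staleReads;
}

const replay: CacheEvent[] = [
  { kind: "write", key: "order:42", atMs: 100 },
  { kind: "read", key: "order:42", atMs: 105 },
  { kind: "invalidate", key: "order:42", atMs: 110 },
  { kind: "read", key: "order:42", atMs: 120 },
];
console.log(JSON.stringify(findStaleReads(replay).map((e) => e.atMs))); // [105]
```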

The technique that scales is narrow the suspect set. You have a bug; the change that caused it must live somewhere in the codebase or its inputs. Eliminate large regions: does it reproduce with stub inputs? (No → the bug depends on the real inputs or how they are parsed.) Does it reproduce with the DB stubbed? (No → the bug is in DB interaction.) Does it reproduce on the previous build? (No → the bug landed in the recent diff.) Each answer cuts the search space, ideally by about half. Within an hour you have the suspect down to a few hundred lines.
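
The halving idea can be sketched as a binary search over an ordered history. This is the same assumption git bisect makes (everything before the first bad commit is good, everything from it onward is bad); the firstBadIndex helper and the commit labels are hypothetical, and isBad stands in for "run the reproducer against this build".

```typescript
// Finds the index of the first "bad" item in a history ordered oldest to newest,
// assuming all items before it are good and all items from it onward are bad.
function firstBadIndex<T>(history: T[], isBad: (item: T) => boolean): number {
  let low = 0; // candidates live in the half-open range [low, high)
  let high = history.length;
  while (low < high) {
    const mid = Math.floor((low + high) / 2);
    const candidate = history[mid] as T;
    if (isBad(candidate)) {
      high = mid; // the first bad item is at mid or earlier
    } else {
      low = mid + 1; // the first bad item is after mid
    }
  }
  return low < history.length ? low : -1; // -1 means nothing in the history is bad
}

const commits = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"];
let checks = 0;
const firstBad = firstBadIndex(commits, (commit) => {
  checks++; // each call represents one build-and-reproduce cycle
  return Number(commit.slice(1)) >= 6;
});
console.log(commits[firstBad], checks); // c6 3
```

Eight commits need three reproducer runs rather than eight; the gap widens quickly as the history grows.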

Read the actual error. Stack traces, exception messages, log lines — they say more than developers usually read. A NullReferenceException on line 47 of OrderService.cs is not "something is null"; it is "the specific reference at line 47 was null when this method ran with these inputs". Trace back: which variable, where was it assigned, what could make it null at that point.

Reproduce reliably before fixing. A bug you cannot reproduce is a bug you cannot prove you fixed. Spend the time on a reliable reproducer — a unit test, a script, a saved request — before changing code. The fix verification is then trivial: run the reproducer, observe pass.

The hardest debugging muscle is stopping when you have an answer that fits. Confirmation bias makes you stop too early. The discipline is to ask "what would I observe if my hypothesis were wrong?" and look for that contrary evidence before declaring the bug solved.

Good enough to ship

• State the hypothesis before changing code
• Reproduce reliably before fixing
• Read the full stack trace, not just the top line

Expert tier

• Binary-search the suspect set systematically
• Use git bisect when the bug is "started failing at some commit"
• Routinely seek contrary evidence before declaring the bug solved

Reading stack traces (and async state machines)

A synchronous stack trace is a stack: each frame is a method that was called by the frame below it. Read top to bottom; the top is where the exception was thrown, the bottom is the entry point. Skip framework frames (Microsoft.AspNetCore.*, System.*) unless they are the bug. Your code's frames are the signal; skim them in order to reconstruct the call path.

Async stack traces are harder because the compiler translates async/await into a state-machine class. A frame may read MoveNext() rather than the method you wrote, and the frames between two of your methods may be awaiter or continuation plumbing rather than the caller. Modern .NET (including .NET 6+) resolves most of these frames back to your async method names, but you will still encounter raw traces from .NET Framework and older runtimes.

The pattern to recognise: an async method Foo compiles to a nested state-machine type named like <Foo>d__3, whose MoveNext runs the method body and its continuations. (Names like <>c__DisplayClass3_0 are closure classes for lambdas, not state machines.) When you see MoveNext in a trace, read the name inside the angle brackets to find the actual method.

System.NullReferenceException: Object reference not set to an instance of an object.
   at MyApp.Services.OrderService.<GetAsync>d__7.MoveNext() in OrderService.cs:line 42
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AwaitUnsafeOnCompleted...
   at MyApp.Controllers.OrdersController.<GetById>d__5.MoveNext() in OrdersController.cs:line 23

Translation: OrderService.GetAsync (line 42) threw NRE; it was awaited by OrdersController.GetById (line 23). The intermediate AwaitUnsafeOnCompleted is plumbing — skip it.
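
The translation step is mechanical enough to sketch as a small helper, for example in a log-viewing script. The demangleFrame function is hypothetical, and the pattern only covers the common <Method>d__N.MoveNext() shape seen in the trace above; generic methods and other compiler-generated names are out of scope here.

```typescript
// Matches a state-machine frame such as
//   at MyApp.Services.OrderService.<GetAsync>d__7.MoveNext() in OrderService.cs:line 42
// capturing the containing type, the original method name, and any trailing location.
const asyncFrame = /^\s*(?:at\s+)?(.+)\.<([^>]+)>d__\d+\.MoveNext\(\)(.*)$/;

// Returns "Type.Method" plus the location for a state-machine frame,
// or the input unchanged when the frame does not match the pattern.
function demangleFrame(frame: string): string {
  const match = asyncFrame.exec(frame);
  if (match === null) {
    return frame;
  }
  return `${match[1]}.${match[2]}${match[3]}`;
}

console.log(demangleFrame("MyApp.Services.OrderService.<GetAsync>d__7.MoveNext()"));
// MyApp.Services.OrderService.GetAsync
```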

Missing frames happen when the JIT inlines a method or optimises away a tail call, or when exceptions cross task boundaries. Look for inner exceptions (ex.InnerException): AggregateException, and sometimes TaskCanceledException, wrap the real one. ex.StackTrace shows one stack; ex.InnerException.StackTrace shows the other.
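
The habit of walking to the innermost exception carries over to the frontend, where an error's cause plays the role of InnerException. The sketch below uses a hypothetical WrappedError shape rather than the built-in Error type so that it stays self-contained; it assumes the chain has no cycles.

```typescript
// Minimal shape for a wrapped error; `cause` plays the role of InnerException.
interface WrappedError {
  name: string;
  message: string;
  cause?: WrappedError;
}

// Follows the cause chain to the innermost error, which is usually the real failure.
function rootCause(error: WrappedError): WrappedError {
  let current = error;
  while (current.cause !== undefined) {
    current = current.cause;
  }
  return current;
}

// Lists every wrapper from outermost to innermost, for a one-line summary in logs.
function describeChain(error: WrappedError): string {
  const names: string[] = [];
  for (let e: WrappedError | undefined = error; e !== undefined; e = e.cause) {
    names.push(e.name);
  }
  return names.join(" -> ");
}

const wrapped: WrappedError = {
  name: "AggregateException",
  message: "One or more errors occurred.",
  cause: { name: "NullReferenceException", message: "Object reference not set to an instance of an object." },
};
console.log(describeChain(wrapped)); // AggregateException -> NullReferenceException
```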

When all else fails, search the exception message verbatim. Most third-party libraries log exceptions whose messages are unique strings; a search will find prior reports, GitHub issues, and Stack Overflow answers. Read three results before forming a hypothesis.

Good enough to ship

• Read the full trace top to bottom; skip framework frames
• Find the inner exception when there is one
• Translate MoveNext back to the async method it belongs to

Expert tier

• Decode AggregateException trees from concurrent code
• Use dotnet-dump + dotnet-stack for hung processes
• Read core dumps with WinDbg / lldb for native and runtime-level issues

Performance basics: measure first, big-O on hot paths

The most common performance mistake is optimising before measuring. The second most common is treating every code path as hot.

Measure first. Run a profiler or benchmark harness (dotnet-trace for traces of a running process, BenchmarkDotNet for micro-benchmarks, browser DevTools for frontend) before changing anything. The first measurement tells you which functions actually dominate runtime. The intuitions of "this should be slow" and "this should be fast" are wrong most of the time.

Once you know the hot paths, big-O matters. An O(n²) algorithm on a 10-item list is fine; on a 100,000-item list it is a tarpit. Hash-based lookups (HashSet, Dictionary) replace nested loops; ordered iteration with merge-style algorithms replaces repeated containment checks. The Computer Science you learned in 2014 is the same Computer Science that runs production traffic in 2026.
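
A minimal sketch of the nested-loop-to-hash-lookup swap, using hypothetical function names and data. Both functions return the same result; only the cost profile differs.

```typescript
// O(n * m): for every order, scan the whole blocked list.
function blockedOrdersQuadratic(orderCustomerIds: number[], blockedIds: number[]): number[] {
  return orderCustomerIds.filter((id) => blockedIds.indexOf(id) !== -1);
}

// O(n + m): build a Set once, then each membership check is O(1) on average.
function blockedOrdersHashed(orderCustomerIds: number[], blockedIds: number[]): number[] {
  const blocked = new Set(blockedIds);
  return orderCustomerIds.filter((id) => blocked.has(id));
}

const orderCustomerIds = [1, 2, 3, 2, 5];
const blockedIds = [2, 5, 9];
console.log(JSON.stringify(blockedOrdersQuadratic(orderCustomerIds, blockedIds))); // [2,2,5]
console.log(JSON.stringify(blockedOrdersHashed(orderCustomerIds, blockedIds))); // [2,2,5]
```

On a five-item list the quadratic version is perfectly fine and arguably clearer; the hashed version earns its place only once profiling shows the list is large and the path is hot.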

Allocation awareness matters in hot paths in .NET because garbage collection is not free. Every new of a reference type is a heap allocation the GC must later trace and reclaim; large object heap allocations (objects of roughly 85,000 bytes or more) are expensive; LINQ chains in tight loops can allocate per iteration. The fix is not "never allocate"; it is "know where your hot path allocates and reduce it where it matters". Span<T> and Memory<T> let you slice without allocating; pooled arrays (ArrayPool<T>) reuse buffers. These are heavy guns; use them when profiling shows allocation pressure.
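
Span<T> has no direct TypeScript equivalent, but typed arrays show the same copy-versus-view distinction, which may help build the intuition. In this sketch, slice copies the bytes into a new buffer, while subarray creates a lightweight view over the same buffer (a small wrapper object is still allocated, but no bytes are copied).

```typescript
const bytes = new Uint8Array([10, 20, 30, 40, 50]);

// slice copies elements 1..3 into a new backing buffer: one allocation per call.
const copiedSlice = bytes.slice(1, 4);

// subarray returns a view over the same backing buffer: no bytes are copied.
const sharedView = bytes.subarray(1, 4);

bytes[2] = 99; // mutate the original after taking both slices

console.log(copiedSlice[1]); // 30 (the copy is unaffected)
console.log(sharedView[1]); // 99 (the view sees the change)
console.log(sharedView.buffer === bytes.buffer); // true
```

The shared view is cheaper but couples the slice to the original's lifetime and mutations, which is the same trade-off Span<T> asks you to reason about.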

The cost of premature optimisation is paid in readability. Code optimised for a path that is not hot is code the next reader has to decode for no gain. Default to clarity; optimise the proven hot paths only.

The frontend has its own version: measure with Lighthouse and the browser's Performance tab. Bundle size, time-to-interactive, layout shifts. The intuitions (useMemo everywhere) are usually wrong; the measurement (one component re-renders 200 times per second) is the signal.

Good enough to ship

• Profile before optimising
• Replace nested loops with hash-based lookups when n grows
• Avoid LINQ in hot loops on the .NET side

Expert tier

• Use BenchmarkDotNet for micro-benchmarks; results inform decisions
• Span<T> / pooling on the hottest paths
• Frontend: monitor Core Web Vitals in production

Writing PR-sized changes

A 1,000-line PR has a near-zero chance of getting a thorough review. The reviewer skims, hopes for the best, and approves; bugs that a smaller diff would have caught ship. The discipline is to break work into PRs the reviewer can hold in their head.

The rule of thumb: 300 lines of meaningful diff or less per PR (auto-generated files don't count). If the work is bigger, split. The splits are usually structural — refactor first, then add the feature; introduce the new interface first, then the implementations; move first, then modify. Each split is a separate, mergeable PR with its own intent.
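
What counts as "meaningful diff" can be made concrete with a small check, for instance in a pre-review script. Everything here is an assumption for illustration: the ChangedFile shape, the list of generated-file patterns, and the helper names; a real team would tune the patterns to its own repo.

```typescript
// Hypothetical per-file summary, e.g. parsed from a diff stat.
interface ChangedFile {
  path: string;
  added: number;
  deleted: number;
}

// Hypothetical patterns for files a reviewer does not read line by line.
const generatedPatterns: RegExp[] = [/\.g\.cs$/, /\.Designer\.cs$/, /(^|\/)package-lock\.json$/];

// Sums added and deleted lines, ignoring files that match a generated pattern.
function meaningfulDiffLines(files: ChangedFile[]): number {
  return files
    .filter((file) => !generatedPatterns.some((pattern) => pattern.test(file.path)))
    .reduce((total, file) => total + file.added + file.deleted, 0);
}

// Applies the rule of thumb: split when the meaningful diff exceeds the limit.
function shouldSplit(files: ChangedFile[], limit: number = 300): boolean {
  return meaningfulDiffLines(files) > limit;
}

const draftPr: ChangedFile[] = [
  { path: "src/Services/OrderService.cs", added: 120, deleted: 40 },
  { path: "src/Controllers/OrdersController.cs", added: 60, deleted: 10 },
  { path: "web/package-lock.json", added: 900, deleted: 850 },
  { path: "src/Resources/Orders.Designer.cs", added: 400, deleted: 0 },
];
console.log(meaningfulDiffLines(draftPr)); // 230
console.log(shouldSplit(draftPr)); // false
```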

The PR description is the second discipline. Write why before what. The diff shows what changed; the description must explain why this change, why now, what the tradeoffs were, and how the reviewer should approach the review. A description that says "Adds support for X" is wasted bytes. One that says "Customers in EU need feature X by date Y; we considered approaches A, B, C; chose B because of latency; this PR implements B, follow-up PRs add monitoring and migrate existing data" gives the reviewer the context to evaluate the design.

Self-review before requesting. Open the PR, read your own diff, leave inline comments on the parts you anticipate questions about. Reviewers value this enormously — it signals you have thought about your work and that their time will be used efficiently.

Reviewing others. The reciprocal skill. Read the PR description first; understand the why. Run the code mentally; identify the parts that interact with the rest of the system. Comment on what matters (correctness, maintainability, design) and not on what does not (formatting handled by linter, personal style). Approve when you would be comfortable having the code in production; do not rubber-stamp.

Good enough to ship

• PRs under 300 lines of meaningful diff
• Description explains why, not just what
• Self-review before requesting

Expert tier

• Coach the team toward PR-sized changes
• Reviews catch design issues, not just bugs
• Approve with substantive feedback, not rubber-stamping

Documenting decisions: ADRs and design docs

Code shows what; comments show why this line; decision documents show why this approach. The form that has won is the Architecture Decision Record (ADR) — a short markdown file capturing one decision, its context, the alternatives considered, and the consequences.

# ADR-007: Use Cosmos DB for the analytics aggregates store

## Status
Accepted

## Context
The analytics service must store rollup aggregates queried by dashboard pages
at p95 latency under 100ms across three regions. Volume estimate is 10M items,
growing to 100M.

## Decision
Use Cosmos DB with the SQL API, partitioned by tenant id, with multi-region writes.

## Alternatives considered
- PostgreSQL with logical replication: rejected; multi-region writes are not
  first-class.
- Cosmos DB with the Mongo API: rejected; less mature on partition-key changes.
- Redis as primary store: rejected; durability gaps for our compliance regime.

## Consequences
- We accept higher per-operation cost vs Postgres in exchange for global writes.
- We must invest in partition-key design review before each new query.
- Local dev uses the Cosmos emulator; CI uses a containerised instance.

The ADR is short, scannable, and dated by its filename or front matter. Future-you (or a new teammate) opens it and understands within five minutes why the system looks the way it does. Without it, the choice gets re-litigated every year by someone who does not know the alternatives were considered.
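
Because the format is so regular, a team could lint it. The sketch below is a hypothetical check, not part of any ADR tooling: it assumes the five section headings used in the example above and reports which ones a draft is missing.

```typescript
// The section headings used by the example ADR above.
const requiredSections = ["Status", "Context", "Decision", "Alternatives considered", "Consequences"];

// Returns the required "## " headings that are missing from an ADR's markdown.
function missingAdrSections(markdown: string): string[] {
  const headings = markdown
    .split(/\r?\n/)
    .filter((line) => line.indexOf("## ") === 0)
    .map((line) => line.slice(3).trim().toLowerCase());
  return requiredSections.filter((section) => headings.indexOf(section.toLowerCase()) === -1);
}

const draftAdr = [
  "# ADR-015: A hypothetical draft",
  "",
  "## Status",
  "Proposed",
  "",
  "## Context",
  "Some context.",
  "",
  "## Decision",
  "Some decision.",
].join("\n");
console.log(JSON.stringify(missingAdrSections(draftAdr)));
// ["Alternatives considered","Consequences"]
```

The two sections a rushed draft tends to skip are the ones that stop the decision being re-litigated, which is why they are worth checking for.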

Design docs are the heavier sibling — used for larger changes (new service, major refactor, architecture shift). They cover problem, goals, non-goals, proposed approach, alternatives, rollout plan, risks, open questions. They are reviewed before the code; they prevent the "build it then realise it doesn't fit" tax.

The discipline that pays off: write the document before merging the decision. After-the-fact docs are weaker because the writer has already committed and is justifying rather than evaluating. A doc written during the decision phase forces the engineer to articulate the alternatives, which often surfaces the better choice.

Good enough to ship

• ADR per non-trivial decision; lives in the repo
• Short, scannable, dated
• Alternatives explicitly listed

Expert tier

• Design docs for major changes; reviewed before code
• ADRs amended when superseded; status field updated
• Team has a culture of "show me the ADR" in review

Worked examples

Reading a new repo, the first 60 minutes

Step 1: find the composition root (Program.cs, main.tsx) and name the major modules and external services.
Step 2: trace one simple endpoint or screen end to end.
Step 3: read the test that covers that endpoint or screen.
Step 4: build and run the repo locally.
Step 5: write the one-page map.

What to notice:

• An hour of intentional reading gets you further than three hours of grepping.
• The composition root + one trace + one test is the minimum viable map.
• The local build proves your environment matches the repo's assumptions.

A hypothesis-driven debug session

Bug: customers report "save failed" intermittently on the orders page.
 
Hypothesis 1: race between client-side validation and server-side save.
  Experiment: add log on server save entry, replay with delayed client.
  Result: server logs show the save was never called. Refuted.
 
Hypothesis 2: client throws before sending. JavaScript error.
  Experiment: open browser DevTools, reproduce.
  Result: console shows "TypeError: cannot read property 'id' of undefined"
  in submit handler when newly-added line items have no id yet. Confirmed.
 
Fix: handle the no-id case in submit. Add a test for the empty-id path.
Verify: reproducer (saved network HAR) no longer fails.

What to notice:

• The first hypothesis was wrong, but the experiment ruled it out cleanly.
• The bug message ("save failed") was not what the bug was (client-side throw).
• The test for the empty-id path is the durable artifact; future-you cannot regress.
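
The session does not show the submit handler itself, so the following is only one shape the fix could take. The LineItem and SavePayload types and the buildSavePayload function are hypothetical; the point is that the type makes the missing id explicit, so the compiler forces the handler to deal with it.

```typescript
// Hypothetical shape: an item added in the UI has no id until the server assigns one.
interface LineItem {
  id?: string;
  sku: string;
  quantity: number;
}

// Hypothetical payload: existing items are updated by id, new ones are created by sku.
interface SavePayload {
  update: { id: string; quantity: number }[];
  create: { sku: string; quantity: number }[];
}

// Builds the save payload without assuming every item already has an id.
function buildSavePayload(items: LineItem[]): SavePayload {
  const payload: SavePayload = { update: [], create: [] };
  for (const item of items) {
    if (item.id !== undefined) {
      payload.update.push({ id: item.id, quantity: item.quantity });
    } else {
      payload.create.push({ sku: item.sku, quantity: item.quantity });
    }
  }
  return payload;
}

const cart: LineItem[] = [
  { id: "li-1", sku: "WIDGET", quantity: 2 },
  { sku: "GADGET", quantity: 1 }, // newly added, no id yet
];
console.log(JSON.stringify(buildSavePayload(cart)));
// {"update":[{"id":"li-1","quantity":2}],"create":[{"sku":"GADGET","quantity":1}]}
```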

A well-formed PR description

## Why
 
The /orders endpoint returns the order owner's email in plain text. Our
security review (ADR-012) requires that PII never appear in API responses unless
the requesting principal owns the resource.
 
## What
 
This PR replaces `email` in the OrderResponse with `email_masked` for non-owner
callers; owners continue to receive the full email. The change is gated by a
new `OrderOwnerHandler` that uses resource-based authorisation.
 
## How to review
 
- Start with `OrderOwnerHandler.cs`: the authz logic.
- Then `OrdersController.GetById`: the masking decision.
- Then `OrderOwnerHandlerTests.cs` + `OrdersControllerTests.cs`: contract tests
  cover both cases.
 
## Trade-offs
 
- Masking is partial (`a***@b.com`), not full removal. Product wants users
  to see they have orders even if not the owner; full removal would prevent that.
- Performance: one extra DB read per request to fetch the order owner.
  Acceptable for the volume (~10 rps); revisit if traffic grows 10x.
 
## Out of scope
 
- Audit logging of masked-vs-unmasked responses. Tracked in #4271.
- Migration of historical logs. Tracked in #4272.

What to notice:

• The reviewer knows why before they read the diff.
• A reading order is suggested; respects the reviewer's time.
• Trade-offs are surfaced, not buried.
• Out-of-scope items get tickets; nothing falls through cracks.
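
The masking decision the PR describes can be sketched in a few lines. The real change lives in C# handlers that the description only names, so the maskEmail and contactForCaller functions here are hypothetical stand-ins that show the contract: owners get email, everyone else gets email_masked in the a***@b.com form.

```typescript
// Masks the local part of an address, keeping only its first character.
function maskEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at <= 0) {
    return "***"; // no local part or no "@": reveal nothing
  }
  return `${email.charAt(0)}***${email.slice(at)}`;
}

// The response carries either the full email or the masked one, never both.
type OrderContact = { email: string } | { email_masked: string };

// Owners see the full address; every other caller sees the masked form.
function contactForCaller(ownerId: string, callerId: string, email: string): OrderContact {
  return callerId === ownerId ? { email } : { email_masked: maskEmail(email) };
}

console.log(JSON.stringify(contactForCaller("u-1", "u-1", "alice@b.com"))); // {"email":"alice@b.com"}
console.log(JSON.stringify(contactForCaller("u-1", "u-2", "alice@b.com"))); // {"email_masked":"a***@b.com"}
```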

An ADR for a small but durable choice

# ADR-014: Use System.Text.Json over Newtonsoft.Json in new code
 
## Status
Accepted, 2026-04-12
 
## Context
The codebase has historically used Newtonsoft.Json. Newer code is being added
to a service that is sensitive to allocations in the request path. System.Text.Json
offers significantly lower allocations and is the default in ASP.NET Core.
 
## Decision
New code uses System.Text.Json. Existing Newtonsoft code is left alone; we
migrate file-by-file when those files are otherwise modified.
 
## Alternatives considered
- Migrate everything at once: rejected; high churn, low isolated value.
- Keep Newtonsoft: rejected; forgoes ASP.NET Core's serializer optimisations.
 
## Consequences
- Engineers must know both libraries for the migration period.
- Polymorphic deserialisation requires different attributes (System.Text.Json
  is stricter by default — which is a security win, ref ADR-009).
- The team's linting rules add a warning when Newtonsoft is imported in new files.

What to notice:

• Short, decisive, alternatives stated.
• References other ADRs (security win); decisions compose.
• Consequences include the cost (knowing both libraries during migration).

Hands-on exercises

Goal: Read a new (to you) open-source repo in 60 minutes. Steps: (1) Pick a repo. (2) Open it, find the composition root, trace one feature end to end, read one test, build it locally. (3) Write a one-page map of the repo. You're done when you can explain to a friend how the repo is structured without referring back.
Goal: Practice hypothesis-driven debugging on a known bug. Steps: (1) Find a recent bug in your team's tracker. (2) Before reading the fix, state three hypotheses about the root cause. (3) Read the fix; rate which hypothesis was closest. (4) Note what would have led you to the correct hypothesis faster. You're done when you can articulate the technique that would have shortened the investigation.
Goal: Translate an async stack trace. Steps: (1) Capture a real async stack trace from logs. (2) For each frame, identify whether it's your code or framework. (3) For your-code frames, identify the actual method (MoveNext → enclosing class). (4) Reconstruct the call path in plain English. You're done when you can read async traces as fluently as sync traces.
Goal: Profile and optimise one hot path. Steps: (1) Use BenchmarkDotNet or dotnet-trace to measure a candidate hot path. (2) Identify the dominant cost (allocation, CPU, IO). (3) Make one targeted optimisation. (4) Re-measure. (5) Decide whether the gain was worth the readability cost. You're done when you have data-driven evidence of the change's effect.
Goal: Split a 1,000-line PR into three. Steps: (1) Find a large draft PR (yours or a teammate's). (2) Identify the logical splits (refactor first, then feature; introduce interface, then implementations). (3) Open three smaller PRs in dependency order. (4) Compare the review experience. You're done when each PR is under 300 meaningful lines and reviewed in under an hour.
Goal: Write an ADR for a recent decision. Steps: (1) Pick a non-trivial choice your team made in the last month. (2) Write it up using the five-section format (status, context, decision, alternatives, consequences). (3) Open a PR adding it to docs/adr/. (4) Request review. You're done when the ADR is merged and a teammate has said it surfaced something they did not know.

Self-check questions

1. What is the composition root of an ASP.NET Core app? Of a React app?
2. Describe hypothesis-driven debugging in three sentences.
3. What is the first thing you do when handed a new repo you've never seen?
4. Why are tests good documentation, and what kind of test is best?
5. Translate MyApp.Services.OrderService.<GetAsync>d__7.MoveNext() back to the method it belongs to.
6. Why is "measure first" the rule for performance work?
7. Name three .NET tools for performance measurement and when to use each.
8. What's the upper bound on a reviewable PR size, and why?
9. What does an ADR contain that a comment in code cannot?
10. When do you write a design doc instead of an ADR?
11. What's the difference between rubber-stamping a review and approving one?
12. Why write the decision document before merging the decision?

High-signal resources

Official docs

The Twelve-Factor App — the structural conventions modern web apps share.
.NET diagnostics tools — dotnet-trace, dotnet-dump, dotnet-counters.
BenchmarkDotNet.
ADR template repo — Markdown ADR (MADR) format.

Books or courses

The Pragmatic Programmer, by Hunt & Thomas. First published in 1999 and revised for its 20th anniversary in 2019; still the canon.
A Philosophy of Software Design — John Ousterhout. Sharp on complexity and the cost of cleverness.
Designing Data-Intensive Applications — Martin Kleppmann. The system-design counterpart.

Practitioner posts

Will Larson's writings on engineering leadership — promotion, code review, decision docs.
Hillel Wayne on debugging — formal methods meets practical bug hunting.
Dan Luu on incidents and debugging — long, technical, full of receipts.
Joel on Software archives — older but the senior-skills posts age well.

Weekly milestones

Day 1: Read a new open-source repo in 60 minutes; write the one-page map. Answer self-check 1, 3, 4.
Day 2: Practice hypothesis-driven debugging on a recent bug; document the lesson. Answer self-check 2.
Day 3: Translate three real async stack traces. Profile one hot path with BenchmarkDotNet. Answer self-check 5, 6, 7.
Day 4-5: Split one large PR into three smaller ones. Self-review and write the description for each. Answer self-check 8, 11.
Day 6-7: Write one ADR for a recent decision; merge it. Read one design doc your team has produced; note what made it useful or not. Answer self-check 9, 10, 12.

How it shows up in the capstone

The capstone repo demonstrates these habits as artifacts. The docs/adr/ directory has decision records for every non-obvious choice (auth library, frontend framework, deployment topology). The README's "first 30 minutes" section is the onboarding map — it names the composition root, points at the first endpoint to read, and lists the integration test that documents the contract.

The PR history is small commits with substantive descriptions; the design docs in docs/design/ describe the few changes large enough to warrant one. The bug tracker links cite ADRs and design docs; the team's review comments reference them too.

When you find yourself opening a new project you have never seen and being productive within a day — finding the composition root, framing a hypothesis for the first bug, writing a PR description a reviewer thanks you for — the chapter has done its work. The mechanics from chapters 1-21 take you to the bar; this chapter is the floor of senior.
