data engineeringintermediate 32m read

MS Stack Ch 12 — Kusto / KQL

Kusto Query Language: pipeline syntax, summarize, joins, time-binning, materialized views, KQL injection defence. The language powering Azure Data Explorer, App Insights, Log Analytics, and most Microsoft telemetry.

Chapter 12 of From Novice to Fluent on the Modern Microsoft Web Stack — a 22-chapter self-study plan.

Why this chapter

KQL — Kusto Query Language — is the lingua franca of Microsoft telemetry. Azure Data Explorer, Application Insights, Log Analytics, Microsoft Defender, Sentinel, Azure Resource Graph, the Intune device inventory, and increasingly the internal logging stack at most Microsoft product groups all speak KQL. If you can read KQL fluently, you can answer "what is my service doing right now" in any of those surfaces; if you cannot, you are a passenger on whoever can.

Shipping-level KQL means writing queries that return the right answer in a reasonable time. Expert-level KQL means writing queries that survive a 100× traffic increase without falling over, defending the cluster against accidental DoS by your own dashboards, and knowing precisely when to push work into a materialized view versus a query-time summarize. The same eight operators carry you the whole way; the depth is in how you order them, how you join, and how you reason about the storage layer underneath.

You finish this chapter when you can pick up an unfamiliar App Insights workspace, write a query that pinpoints the worst-performing operation in the last hour, and explain to a teammate why your join order is small | join (large) on Key rather than the reverse.

Read pipeline-style

Trace any KQL query top-down through its `|` operators and predict each intermediate shape.

Aggregate confidently

Reach for `summarize`, `bin()`, percentiles and `arg_max` without looking them up.

Join without truncation

Filter before joining, put the small table on the left, choose the right `kind`.

Defend the cluster

Use parameterised queries, project early, avoid `*` scans on huge tables.

Pre-aggregate

Know when a materialized view or a saved function should replace an expensive ad-hoc query.

Inject-proof

Never concatenate user input into KQL; bind via `declare query_parameters` or SDK parameters.

Concepts and depth

The Kusto storage model

Kusto is a columnar, append-only, time-series-optimised analytical store. Every table is partitioned by an ingestion timestamp and physically organised into compressed column extents. Reads scan only the columns you reference and only the extents whose time range overlaps your filter. This is why "filter the time column first" is not a stylistic preference — it is the single optimisation that the engine fundamentally relies on.

Because storage is append-only, there is no UPDATE or DELETE in normal operation. New facts are appended; older facts are aged out by retention policy. "Latest state per entity" is reconstructed at query time with arg_max(TimeGenerated, *) by EntityId over the append log. If you come from OLTP-land, the shift is mental: stop thinking "rows that mutate" and start thinking "events that accumulate".

The engine evaluates a query as a streaming pipeline of operators. Each operator reads the table from the previous step, applies its transformation, and emits a new table downstream. There is no global query planner that re-orders your operators the way a SQL optimiser would: KQL trusts you to put where early, project early, and summarize as soon as the cardinality drops. Writing "good" KQL is largely writing it in cardinality-shrinking order.

Good enough to ship

• Treat the table as immutable; never look for UPDATE.
• Always filter TimeGenerated (or your time column) first.
• Use arg_max when you need "current state".

Expert tier

• Tune retention + caching policies per table.
• Design row-store-vs-column-store partition keys for your access pattern.
• Reason about extent merge cadence and its impact on query latency.

Pipeline syntax: the pipe is a tabular contract

Every KQL operator takes a table in and returns a table out. The | is not "and then run this command" — it is "feed the table on the left into the operator on the right". That single property is what makes KQL composable: any prefix of a query is itself a valid query that you can run for debugging.

Requests
| where TimeGenerated > ago(1h)
| where ResultCode startswith "5"
| summarize Count = count() by bin(TimeGenerated, 1m), Operation
| order by TimeGenerated desc

Read top-down: take the Requests table, keep only the last hour, keep only 5xx responses, aggregate to 1-minute buckets per operation, sort newest first. If anything looks wrong, comment out the trailing operators and inspect the intermediate result — the contract holds at every line.

The corollary is that the first three lines of every production-grade KQL query are almost always Table | where TimeGenerated > ago(...) | where <some_high_selectivity_predicate>. The cluster scans a fraction of the data and everything downstream is small.

Good enough to ship

• Always lead with a time filter.
• Comment out trailing operators to debug intermediate shapes.
• Project away columns you do not need before downstream operators.

Expert tier

• Reason about cardinality at each pipeline stage.
• Reorder safe-equivalent operators for fewer bytes scanned.
• Use .show queries to confirm CPU and memory per stage.

Core operators: where, project, project-away, extend, summarize, order by, top, take, distinct, count, getschema

These ten operators cover 95% of all KQL you will ever write. Internalise them and the rest of the language is sugar.

where <predicate> — row filter. Predicates are case-insensitive by default for strings; use =~ / !~ to be explicit.
project Col1, Col2 = Expr — select and rename columns; everything else is dropped.
project-away SecretField, OtherField — drop a few columns and keep the rest.
project-rename NewName = OldName — rename without losing other columns.
extend NewCol = Expr — add a computed column without dropping anything.
summarize Agg1 = ..., Agg2 = ... by GroupCol1, GroupCol2 — group + aggregate.
order by Col [asc|desc], Col2 [asc|desc] — sort.
top N by Col desc — fused "order by + take N"; faster than separate operators.
take N — first N rows with no ordering guarantee; useful only for debugging.
distinct Col1, Col2 — deduplicate on the listed columns.
count — single-row scalar; cheap when filtered, expensive when not.
getschema — dump the table's column names + types; first thing to run on any unfamiliar table.

// Quick reconnaissance on an unknown table
MyTable | getschema
 
// Selective sampling — never run `MyTable | take 10` blindly on a 1TB table
MyTable
| where TimeGenerated > ago(1h)
| take 10

Good enough to ship

• Write any combination of these ten without looking them up.
• project early to drop heavy string columns.
• top over order by + take.

Expert tier

• Know the column-store cost difference between project and extend.
• Use getschema as a habit before any unfamiliar table.
• Avoid distinct on high-cardinality columns; reach for dcount if approximate is fine.

Joins: kinds, key matching, and the small-on-large rule

KQL joins are tabular operators just like everything else. The form is LeftTable | join kind=<kind> (RightTable) on Key1, Key2. The supported kind values, in the order you actually reach for them:

inner — rows present in both tables. The default and the most common.
leftouter — every left row; right columns become null when no match. Used when you need to keep all events from the "primary" table and decorate with optional context.
leftsemi — left rows that have a match on the right; only left columns survive. Faster than inner when you do not need right's columns.
leftanti — left rows that do not have a match on the right. The canonical "find orphans" tool.
rightouter / rightsemi / rightanti — symmetric variants; rarely used because you can flip the operand order.
fullouter — every row from either side.

// Decorate request rows with the exception details when one was thrown
requests
| where timestamp > ago(1h)
| join kind=leftouter (
    exceptions
    | where timestamp > ago(1h)
    | project operation_Id, ExceptionType = type, ExceptionMessage = outerMessage
) on operation_Id

The single most important performance rule is small on the left, large on the right. The engine streams the left side and looks up the right side in a hash index built from the right's join key. If you put a billion-row table on the left and a thousand-row table on the right, the engine still builds the right's hash, then streams the billion rows through it — which is fine — but the default 500,000 row limit on the left side will silently truncate your result. Filter both sides first, put the small one on the left, and your numbers will match reality.

// ❌ Wrong: billion-row Requests on the left, no time filter; result silently truncated.
Requests
| join (
    InterestingOperations
) on Operation
 
// ✅ Right: time-bound both sides, small list on the left.
let interesting =
    InterestingOperations | where IsActive | project Operation;
interesting
| join kind=inner (
    Requests | where TimeGenerated > ago(1h) | project TimeGenerated, Operation, DurationMs
) on Operation

Good enough to ship

• Filter both sides before joining.
• Put the smaller side on the left.
• Default to kind=inner; reach for leftouter when you need to keep all events.

Expert tier

• Choose hint.strategy=shuffle vs broadcast based on join-key cardinality.
• Use leftsemi and leftanti for set-membership patterns.
• Know when union beats join (vertical stack vs lookup).

Set operators: union and union withsource

union T1, T2, ... stacks tables vertically. Columns that exist in any operand are unioned by name; missing columns are filled with null. This is how you write the canonical "end-to-end trace" query that walks across requests, dependencies, traces, and exceptions for a single correlation ID.

let opId = "00000000-0000-0000-0000-000000000000";
union requests, dependencies, traces, exceptions
| where operation_Id == opId
| order by timestamp asc

union withsource = SourceTable adds a SourceTable column that names which operand each row came from — invaluable when the same column name has different semantics across tables.

union withsource = SourceTable requests, dependencies, exceptions
| where operation_Id == "..."
| summarize Count = count() by SourceTable, name

Good enough to ship

• Use union for end-to-end traces across telemetry tables.
• Reach for withsource whenever column names overlap.
• Filter every operand before union — the engine fans out the predicate but it is clearer to do it yourself.

Expert tier

• Use union kind=outer for permissive column unification.
• Use union isfuzzy=true to tolerate missing tables across workspaces.
• Split a slow join into union + summarize when the join key is high-cardinality.

Variables: let and materialize()

let binds a name to either a scalar, a tabular expression, or a function. It is the closest thing KQL has to a programming construct, and the single biggest readability lever in any query longer than ten lines.

let LookbackHours = 24;
let HighLatencyMs = 5000;
let ErrorOperations =
    requests
    | where timestamp > ago(LookbackHours * 1h)
    | where success == false
    | summarize Errors = count() by name
    | top 10 by Errors desc
    | project name;
requests
| where timestamp > ago(LookbackHours * 1h)
| where name in (ErrorOperations)
| where duration > HighLatencyMs
| project timestamp, name, duration, operation_Id
| order by timestamp desc

A tabular let is re-evaluated every time it is referenced. If the same expensive sub-query is used in multiple places, wrap it in materialize():

let HotOps = materialize(
    requests
    | where timestamp > ago(1h)
    | summarize Hits = count() by name
    | where Hits > 1000
);
HotOps | top 10 by Hits desc;
requests | where name in ((HotOps | project name)) | summarize avg(duration) by name

materialize() evaluates the sub-query once, stores the result in memory for the rest of the query, and serves all subsequent references from that cached result. It is memory-bounded (the cluster will fail the query rather than spill), so use it on results you know are small.

Good enough to ship

• let every parameter at the top of the query.
• Name your sub-queries; queries longer than 20 lines need section names.
• Use materialize() for sub-queries referenced more than once.

Expert tier

• Quantify memory before materialize(); favour hint.materialized=true selectively.
• Parameterise dashboard tiles with let so reviewers can tweak one place.
• Build persisted functions out of named let blocks.

Aggregation functions

The aggregation vocabulary is small and well-known; depth lies in knowing which one to reach for.

count() — row count in a group.
countif(predicate) — conditional count; avoids the where + count + join pattern.
sum(Col), avg(Col), min(Col), max(Col) — the arithmetic four.
dcount(Col) — approximate distinct count (HyperLogLog under the hood, ~1.6% error). Cheap on billions of rows.
dcount(Col, accuracy) where accuracy is 0–4 trades memory for precision; dcount(Col, 4) is exact but expensive.
percentile(Col, 95) / percentiles(Col, 50, 95, 99) — distribution shape. Always include p99; averages lie.
make_set(Col) / make_list(Col) — collect group values into an array (deduped vs not). Bounded by default to 1M elements.
arg_max(SortCol, *) / arg_min(SortCol, *) — the row in each group with the max/min of SortCol, dragging the rest of the columns along. The canonical "latest state per entity" pattern.

// Latency distribution per operation
requests
| where timestamp > ago(1h)
| summarize
    Total = count(),
    Errors = countif(success == false),
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99)
  by name
| extend ErrorRate = todouble(Errors) / Total
| order by Total desc

// Latest known status per device, from an append-only event log
DeviceEvents
| summarize arg_max(TimeGenerated, *) by DeviceId
| project DeviceId, LastSeen = TimeGenerated, Status, FirmwareVersion

Good enough to ship

• Always include p95 + p99 alongside averages.
• Use countif over where + count + join.
• arg_max(Time, *) for "latest state" without thinking.

Expert tier

• Tune dcount(Col, accuracy) based on the memory budget.
• Know make_set upper bound and how to bypass it with hll/hll_merge.
• Combine arg_max with summarize hint.shufflekey for billion-row latest-state pivots.

Time-binning, date arithmetic, and time-window queries

bin(TimeGenerated, 1h) rounds the timestamp down to the nearest hour. Combined with summarize ... by bin(TimeGenerated, ...), it is the entire mechanism for building time-series tiles. The bin width is the resolution of your chart; pick deliberately: 1m for incident response, 5m for live dashboards, 1h for the last day, 1d for trend lines.

requests
| where timestamp > ago(7d)
| summarize Rps = count() / 300.0 by bin(timestamp, 5m), name
| render timechart

Date arithmetic uses signed timespan literals: ago(1h), ago(7d), now() - 30d. Comparisons work directly: where timestamp > ago(1h) and timestamp < ago(5m) carves out a window. The datetime() literal is the rare absolute reference: where timestamp > datetime(2026-01-01).

For sliding-window analytics — "rolling 95th percentile over the last 5 minutes, sampled every minute" — combine bin with series_decompose or make-series + series_stats:

requests
| where timestamp > ago(1d)
| make-series p95 = percentile(duration, 95) default = 0
  on timestamp step 1m by name
| render timechart

make-series produces one row per group with parallel arrays of timestamps and values, which is the right shape for any operator beginning with series_.

Good enough to ship

• Reach for bin() whenever a chart is needed.
• Match bin width to the chart's intended resolution.
• Use ago() for relative windows.

Expert tier

• Use make-series + series_decompose_anomalies for outlier detection.
• Pick bin widths that align to retention and caching boundaries.
• Stitch overlapping windows with series_iir for smoothing.

Pivots and unpivots: evaluate pivot

evaluate pivot(Col, agg) turns long-format rows into wide-format columns. Useful for cross-tabulating two dimensions, e.g. operations × regions:

requests
| where timestamp > ago(1d)
| summarize Total = count() by name, cloud_RoleName
| evaluate pivot(cloud_RoleName, sum(Total))

The result has one row per name and one column per distinct cloud_RoleName, with the summed counts in the cells. The inverse — wide-to-long — is evaluate narrow() (or a manual union over project per column).

Pivots are display-time conveniences; do them at the end of the pipeline after filtering and summarising, never before. The column set is computed at query time, so wide-format results can have unbounded width if the pivot column is high-cardinality.

Performance rules

Five rules carry you through 99% of "this query is slow" situations:

Filter on the indexed time column first. Every Kusto table is partitioned by time; where timestamp > ago(1h) is the single biggest cost reduction.
Project early. Drop heavy string columns (especially customDimensions, customMeasurements) before they cross a join or a summarize.
Summarise early. Aggregate to a small intermediate before joining or rendering. A summarize over a billion rows is cheaper than carrying a billion rows into the next operator.
Avoid * on huge tables. project * and arg_max(Time, *) pull every column; pick the columns you need.
Use top instead of order by + take. The engine has a fused operator that does not need to fully sort.

The diagnostic loop is: run the query with set query_results_progressive_enabled = true;, observe time and memory in .show queries, identify the most expensive operator, push a filter or a projection earlier, repeat.

Query injection and how to defend

KQL accepts dynamic input via SDK parameters and via the in-language declare query_parameters syntax. Never concatenate user input into a query string.

// ❌ INJECTION — never concatenate untrusted input
var q = $"requests | where name == '{userInput}'";
// userInput = "x' or 1==1 | take 1000000 //"

The attacker payload above turns the predicate into name == 'x' or 1==1, returns one million rows, and probably hangs your application thread waiting for the response. With slightly more imagination they can union other tables and exfiltrate data they should not see.

The defence is to declare parameters and bind values:

const string query = """
declare query_parameters(op:string, lookback:timespan);
requests
| where timestamp > ago(lookback)
| where name == op
| project timestamp, name, duration, success
""";
var props = new ClientRequestProperties();
props.SetParameter("op", userInput);
props.SetParameter("lookback", TimeSpan.FromHours(1));
using var reader = await client.ExecuteQueryAsync(database, query, props);

Other defences that compose with parameter binding:

Allow-list columns in project: never let the caller name the columns to return.
Escape string literals if you must build a string yourself: string.Replace("'", "''") is the minimum, but always prefer parameters.
Typed date literals: datetime('2026-01-01') rather than string concat.
Set query-level limits: set query_take_max_records = 10000; at the top of the query string forces a hard ceiling.

Good enough to ship

• Never concatenate user input.
• Use declare query_parameters or SDK parameters.
• Allow-list any column names the caller can choose.

Expert tier

• Set query_take_max_records and query_results_cache_max_age defaults.
• Use persisted functions instead of accepting raw KQL fragments.
• Layer Entra-ID role-based row filters via update policies.

Materialised views and persisted functions

A materialised view is a pre-aggregated table that the cluster keeps up to date as new data lands. They are the right tool for "I need the same 24-hour aggregation every five minutes from a dashboard" — instead of summarising the firehose 288 times a day, you summarise once at ingestion and query the small result.

.create materialized-view RequestsHourly on table requests
{
    requests
    | summarize Count = count(), p95 = percentile(duration, 95)
        by name, bin(timestamp, 1h)
}

Dashboards then query RequestsHourly directly. The cluster maintains the view incrementally; query latency is constant.

Persisted functions are saved KQL expressions that take parameters and return a table. They are the closest thing to "stored procedures" in KQL and the right unit of code reuse across teams.

.create-or-alter function with (folder = "dashboards")
    GetErrorRate(lookback:timespan = 1h, op:string = "*")
{
    requests
    | where timestamp > ago(lookback)
    | where op == "*" or name == op
    | summarize Total = count(), Errors = countif(success == false) by name
    | extend ErrorRate = todouble(Errors) / Total
}

Callers invoke them like any operator: GetErrorRate(1h, "Checkout") | top 10 by ErrorRate desc. Centralising business logic in functions stops every dashboard from inventing its own slightly-different definition of "error rate".

Good enough to ship

• Reach for a materialised view when the same expensive aggregation runs more than hourly.
• Wrap canonical metric definitions in persisted functions.
• Version the functions; treat them as code.

Expert tier

• Use update policies to denormalise at ingestion time.
• Choose between materialised views and aggregation tables based on query patterns.
• Build a function-versioning convention for cross-team consumers.

Worked examples

Example 1 — Production triage query

When pager fires, this is the first query you run:

let lookback = 1h;
let bucket = 1m;
let recent =
    requests
    | where timestamp > ago(lookback)
    | project timestamp, name, duration, success, resultCode, operation_Id, cloud_RoleName;
recent
| summarize
    Total = count(),
    Errors = countif(success == false),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99)
  by bin(timestamp, bucket), cloud_RoleName
| extend ErrorRate = todouble(Errors) / Total
| order by timestamp desc

lookback and bucket at the top let you sweep window/resolution without editing the body.
recent is materialize-eligible if you reuse it; here we use it once so leave it lazy.
cloud_RoleName groups by service; widen or narrow as needed.
ErrorRate uses todouble to force a float divide; integer divide returns 0 for Errors < Total.
order by timestamp desc puts the freshest bucket at the top of the rendered table.

Example 2 — End-to-end trace reconstruction

Given a single correlation ID, walk across every telemetry table:

let opId = "abcd1234-5678-90ab-cdef-1234567890ab";
union withsource = SourceTable
    (requests | project timestamp, name, duration, success, resultCode, operation_Id),
    (dependencies | project timestamp, name, target, duration, success, operation_Id),
    (exceptions | project timestamp, type, outerMessage, operation_Id),
    (traces | project timestamp, severityLevel, message, operation_Id)
| where operation_Id == opId
| order by timestamp asc
| project timestamp, SourceTable, Name = coalesce(name, type, message), Detail = coalesce(target, outerMessage, "")

Pre-projecting each operand keeps union from carrying unused columns.
withsource records which table each row came from.
coalesce picks the first non-null of several columns — useful when the "interesting" field has a different name per table.
The whole pipeline is bounded by operation_Id equality, which the engine pushes down into the index lookup on each operand.

Example 3 — Latest state of a fleet from an event log

A device fleet emits events; you need "current firmware version per device":

let interestingFirmware = dynamic(["1.2.3", "1.2.4"]);
DeviceEvents
| where TimeGenerated > ago(7d)
| summarize arg_max(TimeGenerated, *) by DeviceId
| where FirmwareVersion in (interestingFirmware)
| project DeviceId, LastSeen = TimeGenerated, FirmwareVersion, Status, Region
| summarize Devices = count() by Region, FirmwareVersion
| evaluate pivot(FirmwareVersion, sum(Devices))

arg_max(TimeGenerated, *) collapses each device's event stream to a single "latest" row.
The in (...) clause filters after the collapse — putting it before would not change correctness but would prevent the canonical "latest first, then filter" reading.
evaluate pivot turns the long-format (Region, FirmwareVersion, Devices) into a wide table indexed by firmware columns.

Example 4 — Parameterised KQL from .NET with Managed Identity

Production .NET clients invoke ADX with parameter binding and a managed identity. Below is a minimal but production-ready scaffolding:

using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Data.Net.Client;
 
const string cluster = "https://mycluster.kusto.windows.net";
const string database = "mydb";
 
var kcsb = new KustoConnectionStringBuilder(cluster)
    .WithAadSystemManagedIdentity();
using var client = KustoClientFactory.CreateCslQueryProvider(kcsb);
 
const string query = """
declare query_parameters(op:string, lookback:timespan = 1h);
requests
| where timestamp > ago(lookback)
| where name == op
| summarize Total = count(), Errors = countif(success == false)
""";
 
var props = new ClientRequestProperties();
props.ClientRequestId = $"web;{Activity.Current?.TraceId ?? Guid.NewGuid().ToString("N")}";
props.SetOption("query_take_max_records", 10000);
props.SetParameter("op", operationName);
props.SetParameter("lookback", TimeSpan.FromHours(1));
 
using var reader = await client.ExecuteQueryAsync(database, query, props);
while (reader.Read())
{
    var total = reader.GetInt64(0);
    var errors = reader.GetInt64(1);
    // ...
}

WithAadSystemManagedIdentity makes ADX trust the App Service's managed identity; no secrets in config.
ClientRequestId derived from the W3C trace ID correlates a slow query in .show queries to your application's request.
query_take_max_records is a guardrail against a parameter that accidentally returns a huge result.
Parameters are bound by name, never concatenated.

Hands-on exercises

Reconnaissance. Pick any App Insights workspace you have access to. Run requests | getschema, dependencies | getschema, exceptions | getschema, traces | getschema. Write down the columns you have not seen before.
- You are done when you can recite the four core App Insights tables and the canonical correlation columns (operation_Id, operation_ParentId).
Triage query. Adapt Example 1 above to your own service. Set the bucket to 1 minute, lookback to one hour, group by cloud_RoleName and name.
- You are done when the query returns in under three seconds and you can explain each operator to a teammate.
Join discipline. Build a query that joins requests to exceptions on operation_Id for the last 24h. Run it without time filters on either side and observe the truncation warning; then add filters to both sides and confirm row count parity.
- You are done when the post-filter result has stable row counts across re-runs.
Latest-state pivot. Pick an append-only table (e.g. AuditLogs or a custom event table). Use arg_max(TimeGenerated, *) by EntityId to get latest state, then evaluate pivot over a categorical column.
- You are done when the pivoted result has one row per entity and one column per category, with stable totals.
Persisted function. Wrap your triage query in a .create-or-alter function with lookback:timespan and bucket:timespan parameters. Invoke it from a notebook.
- You are done when other teammates can call Triage(1h, 1m) without copy-pasting the body.
Injection drill. Build a contrived ASP.NET endpoint that takes ?op=... and concatenates it into a KQL string. Hit it with ?op=x' or 1==1 //. Then rewrite using declare query_parameters.
- You are done when the parameterised version refuses the payload by treating it as a literal.

Self-check questions

Explain the Kusto pipeline contract in one sentence and give an example of comment-out debugging.
What's the default row limit on a KQL join, and which side does it apply to?
When would you reach for leftsemi over inner?
Why is arg_max(Time, *) the canonical "latest state per entity" pattern, and what is the cost trade-off vs maintaining a state table?
What is the difference between dcount(Col) and dcount(Col, 4)?
Why is take N dangerous to base a dashboard on, and what should you use instead?
Explain why materialize() can either fix or break a query.
Give two reasons to use let for query parameters beyond readability.
Walk through the steps to KQL-inject a vulnerable query and the corresponding defence.
When would you use a materialised view versus a persisted function?
What's the difference between union and join? When does union make more sense?
Why should you project heavy string columns away before they cross a summarize?

High-signal resources

Official docs

Kusto Query Language overview — the language reference; tutorial + best practices are the first two reads.
Azure Monitor log queries — copy-pasteable examples for App Insights and Log Analytics.
Kusto query best practices — the canonical performance checklist.
Materialized views — when and how.
Parameterised queries from SDK — the .NET client docs.

Books or courses

The Definitive Guide to KQL (Mark Morowczynski, Rod Trent, Matthew Zorich) — the most complete book-length treatment.
Pluralsight Kusto Query Language Fundamentals — solid four-hour intro if you prefer video.

Practitioner posts

Rod Trent's "Must Learn KQL" series — bite-sized weekly drills.
Kusto Detective Agency — gamified KQL puzzles; the fastest way to internalise the operators.
Azure Data Explorer team blog — release notes and performance deep-dives.

Weekly milestones

Day 1. Read the language overview + best practices. Run getschema on every table in a workspace you have access to. Self-check questions 1–3.
Day 2. Work through the eight core operators on real telemetry. Build the triage query (exercise 2). Self-check questions 4–6.
Day 3. Joins and union. Do exercise 3 and confirm the truncation behaviour. Self-check question 7.
Day 4-5. Aggregations, arg_max, pivots. Do exercises 4 and 5. Self-check questions 8–10.
Day 6-7. SDK from .NET, parameterised queries, injection drill. Do exercise 6 and wire a persisted function into a dashboard tile. Self-check questions 11–12.

How it shows up in the capstone

The capstone — a real-time analytics dashboard — uses Azure Data Explorer as the analytic backend. Every chart tile maps to a parameterised KQL persisted function: GetRpsByRegion(from:datetime, to:datetime, region:string), GetErrorRateByOperation(lookback:timespan), and so on. The API in chapter 5 invokes the cluster via the .NET SDK using the App Service's managed identity from chapter 14, with a ClientRequestId derived from the W3C trace ID introduced in chapter 15.

Heavy aggregations live in materialised views (RequestsHourly, ErrorRateByOperation5m); cheap ad-hoc queries run on the raw tables. Every dashboard parameter — operation name, region, lookback window — flows in as a bound parameter, never via string concatenation. The KQL injection drill from exercise 6 is part of the security review at the end of the project.

Previous chapter → Ch 11 — Highcharts Next chapter → Ch 13 — Azure App Service