MS Stack Ch 12 — Kusto / KQL
Kusto Query Language: pipeline syntax, summarize, joins, time-binning, materialized views, KQL injection defence. The language powering Azure Data Explorer, App Insights, Log Analytics, and most Microsoft telemetry.
Chapter 12 of From Novice to Fluent on the Modern Microsoft Web Stack — a 22-chapter self-study plan.
Why this chapter
KQL — Kusto Query Language — is the lingua franca of Microsoft telemetry. Azure Data Explorer, Application Insights, Log Analytics, Microsoft Defender, Sentinel, Azure Resource Graph, the Intune device inventory, and increasingly the internal logging stack at most Microsoft product groups all speak KQL. If you can read KQL fluently, you can answer "what is my service doing right now" in any of those surfaces; if you cannot, you are a passenger on whoever can.
Shipping-level KQL means writing queries that return the right answer in a reasonable time. Expert-level KQL means writing queries that survive a 100× traffic increase without falling over, defending the cluster against accidental DoS by your own dashboards, and knowing precisely when to push work into a materialized view versus a query-time summarize. The same eight operators carry you the whole way; the depth is in how you order them, how you join, and how you reason about the storage layer underneath.
You finish this chapter when you can pick up an unfamiliar App Insights workspace, write a query that pinpoints the worst-performing operation in the last hour, and explain to a teammate why your join order is small | join (large) on Key rather than the reverse.
Concepts and depth
The Kusto storage model
Kusto is a columnar, append-only, time-series-optimised analytical store. Every table is partitioned by an ingestion timestamp and physically organised into compressed column extents. Reads scan only the columns you reference and only the extents whose time range overlaps your filter. This is why "filter the time column first" is not a stylistic preference — it is the single optimisation that the engine fundamentally relies on.
Because storage is append-only, there is no UPDATE or DELETE in normal operation. New facts are appended; older facts are aged out by retention policy. "Latest state per entity" is reconstructed at query time with arg_max(TimeGenerated, *) by EntityId over the append log. If you come from OLTP-land, the shift is mental: stop thinking "rows that mutate" and start thinking "events that accumulate".
The engine evaluates a query as a streaming pipeline of operators. Each operator reads the table from the previous step, applies its transformation, and emits a new table downstream. There is no global query planner that re-orders your operators the way a SQL optimiser would: KQL trusts you to put where early, project early, and summarize as soon as the cardinality drops. Writing "good" KQL is largely writing it in cardinality-shrinking order.
- • Treat the table as immutable; never look for
UPDATE. - • Always filter
TimeGenerated(or your time column) first. - • Use
arg_maxwhen you need "current state".
- • Tune retention + caching policies per table.
- • Design row-store-vs-column-store partition keys for your access pattern.
- • Reason about extent merge cadence and its impact on query latency.
Pipeline syntax: the pipe is a tabular contract
Every KQL operator takes a table in and returns a table out. The | is not "and then run this command" — it is "feed the table on the left into the operator on the right". That single property is what makes KQL composable: any prefix of a query is itself a valid query that you can run for debugging.
Requests
| where TimeGenerated > ago(1h)
| where ResultCode startswith "5"
| summarize Count = count() by bin(TimeGenerated, 1m), Operation
| order by TimeGenerated desc
Read top-down: take the Requests table, keep only the last hour, keep only 5xx responses, aggregate to 1-minute buckets per operation, sort newest first. If anything looks wrong, comment out the trailing operators and inspect the intermediate result — the contract holds at every line.
The corollary is that the first three lines of every production-grade KQL query are almost always Table | where TimeGenerated > ago(...) | where <some_high_selectivity_predicate>. The cluster scans a fraction of the data and everything downstream is small.
- • Always lead with a time filter.
- • Comment out trailing operators to debug intermediate shapes.
- • Project away columns you do not need before downstream operators.
- • Reason about cardinality at each pipeline stage.
- • Reorder safe-equivalent operators for fewer bytes scanned.
- • Use
.show queriesto confirm CPU and memory per stage.
Core operators: where, project, project-away, extend, summarize, order by, top, take, distinct, count, getschema
These ten operators cover 95% of all KQL you will ever write. Internalise them and the rest of the language is sugar.
where <predicate>— row filter. Predicates are case-insensitive by default for strings; use=~/!~to be explicit.project Col1, Col2 = Expr— select and rename columns; everything else is dropped.project-away SecretField, OtherField— drop a few columns and keep the rest.project-rename NewName = OldName— rename without losing other columns.extend NewCol = Expr— add a computed column without dropping anything.summarize Agg1 = ..., Agg2 = ... by GroupCol1, GroupCol2— group + aggregate.order by Col [asc|desc], Col2 [asc|desc]— sort.top N by Col desc— fused "order by + take N"; faster than separate operators.take N— first N rows with no ordering guarantee; useful only for debugging.distinct Col1, Col2— deduplicate on the listed columns.count— single-row scalar; cheap when filtered, expensive when not.getschema— dump the table's column names + types; first thing to run on any unfamiliar table.
// Quick reconnaissance on an unknown table
MyTable | getschema
// Selective sampling — never run `MyTable | take 10` blindly on a 1TB table
MyTable
| where TimeGenerated > ago(1h)
| take 10
- • Write any combination of these ten without looking them up.
- •
projectearly to drop heavy string columns. - •
topoverorder by + take.
- • Know the column-store cost difference between
projectandextend. - • Use
getschemaas a habit before any unfamiliar table. - • Avoid
distincton high-cardinality columns; reach fordcountif approximate is fine.
Joins: kinds, key matching, and the small-on-large rule
KQL joins are tabular operators just like everything else. The form is LeftTable | join kind=<kind> (RightTable) on Key1, Key2. The supported kind values, in the order you actually reach for them:
inner— rows present in both tables. The default and the most common.leftouter— every left row; right columns becomenullwhen no match. Used when you need to keep all events from the "primary" table and decorate with optional context.leftsemi— left rows that have a match on the right; only left columns survive. Faster thaninnerwhen you do not need right's columns.leftanti— left rows that do not have a match on the right. The canonical "find orphans" tool.rightouter/rightsemi/rightanti— symmetric variants; rarely used because you can flip the operand order.fullouter— every row from either side.
// Decorate request rows with the exception details when one was thrown
requests
| where timestamp > ago(1h)
| join kind=leftouter (
exceptions
| where timestamp > ago(1h)
| project operation_Id, ExceptionType = type, ExceptionMessage = outerMessage
) on operation_Id
The single most important performance rule is small on the left, large on the right. The engine streams the left side and looks up the right side in a hash index built from the right's join key. If you put a billion-row table on the left and a thousand-row table on the right, the engine still builds the right's hash, then streams the billion rows through it — which is fine — but the default 500,000 row limit on the left side will silently truncate your result. Filter both sides first, put the small one on the left, and your numbers will match reality.
// ❌ Wrong: billion-row Requests on the left, no time filter; result silently truncated.
Requests
| join (
InterestingOperations
) on Operation
// ✅ Right: time-bound both sides, small list on the left.
let interesting =
InterestingOperations | where IsActive | project Operation;
interesting
| join kind=inner (
Requests | where TimeGenerated > ago(1h) | project TimeGenerated, Operation, DurationMs
) on Operation
- • Filter both sides before joining.
- • Put the smaller side on the left.
- • Default to
kind=inner; reach forleftouterwhen you need to keep all events.
- • Choose
hint.strategy=shufflevsbroadcastbased on join-key cardinality. - • Use
leftsemiandleftantifor set-membership patterns. - • Know when
unionbeatsjoin(vertical stack vs lookup).
Set operators: union and union withsource
union T1, T2, ... stacks tables vertically. Columns that exist in any operand are unioned by name; missing columns are filled with null. This is how you write the canonical "end-to-end trace" query that walks across requests, dependencies, traces, and exceptions for a single correlation ID.
let opId = "00000000-0000-0000-0000-000000000000";
union requests, dependencies, traces, exceptions
| where operation_Id == opId
| order by timestamp asc
union withsource = SourceTable adds a SourceTable column that names which operand each row came from — invaluable when the same column name has different semantics across tables.
union withsource = SourceTable requests, dependencies, exceptions
| where operation_Id == "..."
| summarize Count = count() by SourceTable, name
- • Use
unionfor end-to-end traces across telemetry tables. - • Reach for
withsourcewhenever column names overlap. - • Filter every operand before
union— the engine fans out the predicate but it is clearer to do it yourself.
- • Use
union kind=outerfor permissive column unification. - • Use
union isfuzzy=trueto tolerate missing tables across workspaces. - • Split a slow
joinintounion + summarizewhen the join key is high-cardinality.
Variables: let and materialize()
let binds a name to either a scalar, a tabular expression, or a function. It is the closest thing KQL has to a programming construct, and the single biggest readability lever in any query longer than ten lines.
let LookbackHours = 24;
let HighLatencyMs = 5000;
let ErrorOperations =
requests
| where timestamp > ago(LookbackHours * 1h)
| where success == false
| summarize Errors = count() by name
| top 10 by Errors desc
| project name;
requests
| where timestamp > ago(LookbackHours * 1h)
| where name in (ErrorOperations)
| where duration > HighLatencyMs
| project timestamp, name, duration, operation_Id
| order by timestamp desc
A tabular let is re-evaluated every time it is referenced. If the same expensive sub-query is used in multiple places, wrap it in materialize():
let HotOps = materialize(
requests
| where timestamp > ago(1h)
| summarize Hits = count() by name
| where Hits > 1000
);
HotOps | top 10 by Hits desc;
requests | where name in ((HotOps | project name)) | summarize avg(duration) by name
materialize() evaluates the sub-query once, stores the result in memory for the rest of the query, and serves all subsequent references from that cached result. It is memory-bounded (the cluster will fail the query rather than spill), so use it on results you know are small.
- •
letevery parameter at the top of the query. - • Name your sub-queries; queries longer than 20 lines need section names.
- • Use
materialize()for sub-queries referenced more than once.
- • Quantify memory before
materialize(); favourhint.materialized=trueselectively. - • Parameterise dashboard tiles with
letso reviewers can tweak one place. - • Build persisted functions out of named
letblocks.
Aggregation functions
The aggregation vocabulary is small and well-known; depth lies in knowing which one to reach for.
count()— row count in a group.countif(predicate)— conditional count; avoids thewhere + count + joinpattern.sum(Col),avg(Col),min(Col),max(Col)— the arithmetic four.dcount(Col)— approximate distinct count (HyperLogLog under the hood, ~1.6% error). Cheap on billions of rows.dcount(Col, accuracy)whereaccuracyis 0–4 trades memory for precision;dcount(Col, 4)is exact but expensive.percentile(Col, 95)/percentiles(Col, 50, 95, 99)— distribution shape. Always include p99; averages lie.make_set(Col)/make_list(Col)— collect group values into an array (deduped vs not). Bounded by default to 1M elements.arg_max(SortCol, *)/arg_min(SortCol, *)— the row in each group with the max/min ofSortCol, dragging the rest of the columns along. The canonical "latest state per entity" pattern.
// Latency distribution per operation
requests
| where timestamp > ago(1h)
| summarize
Total = count(),
Errors = countif(success == false),
p50 = percentile(duration, 50),
p95 = percentile(duration, 95),
p99 = percentile(duration, 99)
by name
| extend ErrorRate = todouble(Errors) / Total
| order by Total desc
// Latest known status per device, from an append-only event log
DeviceEvents
| summarize arg_max(TimeGenerated, *) by DeviceId
| project DeviceId, LastSeen = TimeGenerated, Status, FirmwareVersion
- • Always include p95 + p99 alongside averages.
- • Use
countifoverwhere + count + join. - •
arg_max(Time, *)for "latest state" without thinking.
- • Tune
dcount(Col, accuracy)based on the memory budget. - • Know
make_setupper bound and how to bypass it withhll/hll_merge. - • Combine
arg_maxwithsummarize hint.shufflekeyfor billion-row latest-state pivots.
Time-binning, date arithmetic, and time-window queries
bin(TimeGenerated, 1h) rounds the timestamp down to the nearest hour. Combined with summarize ... by bin(TimeGenerated, ...), it is the entire mechanism for building time-series tiles. The bin width is the resolution of your chart; pick deliberately: 1m for incident response, 5m for live dashboards, 1h for the last day, 1d for trend lines.
requests
| where timestamp > ago(7d)
| summarize Rps = count() / 300.0 by bin(timestamp, 5m), name
| render timechart
Date arithmetic uses signed timespan literals: ago(1h), ago(7d), now() - 30d. Comparisons work directly: where timestamp > ago(1h) and timestamp < ago(5m) carves out a window. The datetime() literal is the rare absolute reference: where timestamp > datetime(2026-01-01).
For sliding-window analytics — "rolling 95th percentile over the last 5 minutes, sampled every minute" — combine bin with series_decompose or make-series + series_stats:
requests
| where timestamp > ago(1d)
| make-series p95 = percentile(duration, 95) default = 0
on timestamp step 1m by name
| render timechart
make-series produces one row per group with parallel arrays of timestamps and values, which is the right shape for any operator beginning with series_.
- • Reach for
bin()whenever a chart is needed. - • Match bin width to the chart's intended resolution.
- • Use
ago()for relative windows.
- • Use
make-series+series_decompose_anomaliesfor outlier detection. - • Pick bin widths that align to retention and caching boundaries.
- • Stitch overlapping windows with
series_iirfor smoothing.
Pivots and unpivots: evaluate pivot
evaluate pivot(Col, agg) turns long-format rows into wide-format columns. Useful for cross-tabulating two dimensions, e.g. operations × regions:
requests
| where timestamp > ago(1d)
| summarize Total = count() by name, cloud_RoleName
| evaluate pivot(cloud_RoleName, sum(Total))
The result has one row per name and one column per distinct cloud_RoleName, with the summed counts in the cells. The inverse — wide-to-long — is evaluate narrow() (or a manual union over project per column).
Pivots are display-time conveniences; do them at the end of the pipeline after filtering and summarising, never before. The column set is computed at query time, so wide-format results can have unbounded width if the pivot column is high-cardinality.
Performance rules
Five rules carry you through 99% of "this query is slow" situations:
- Filter on the indexed time column first. Every Kusto table is partitioned by time;
where timestamp > ago(1h)is the single biggest cost reduction. - Project early. Drop heavy string columns (especially
customDimensions,customMeasurements) before they cross a join or asummarize. - Summarise early. Aggregate to a small intermediate before joining or rendering. A
summarizeover a billion rows is cheaper than carrying a billion rows into the next operator. - Avoid
*on huge tables.project *andarg_max(Time, *)pull every column; pick the columns you need. - Use
topinstead oforder by + take. The engine has a fused operator that does not need to fully sort.
The diagnostic loop is: run the query with set query_results_progressive_enabled = true;, observe time and memory in .show queries, identify the most expensive operator, push a filter or a projection earlier, repeat.
Query injection and how to defend
KQL accepts dynamic input via SDK parameters and via the in-language declare query_parameters syntax. Never concatenate user input into a query string.
// ❌ INJECTION — never concatenate untrusted input
var q = $"requests | where name == '{userInput}'";
// userInput = "x' or 1==1 | take 1000000 //"
The attacker payload above turns the predicate into name == 'x' or 1==1, returns one million rows, and probably hangs your application thread waiting for the response. With slightly more imagination they can union other tables and exfiltrate data they should not see.
The defence is to declare parameters and bind values:
const string query = """
declare query_parameters(op:string, lookback:timespan);
requests
| where timestamp > ago(lookback)
| where name == op
| project timestamp, name, duration, success
""";
var props = new ClientRequestProperties();
props.SetParameter("op", userInput);
props.SetParameter("lookback", TimeSpan.FromHours(1));
using var reader = await client.ExecuteQueryAsync(database, query, props);
Other defences that compose with parameter binding:
- Allow-list columns in
project: never let the caller name the columns to return. - Escape string literals if you must build a string yourself:
string.Replace("'", "''")is the minimum, but always prefer parameters. - Typed date literals:
datetime('2026-01-01')rather than string concat. - Set query-level limits:
set query_take_max_records = 10000;at the top of the query string forces a hard ceiling.
- • Never concatenate user input.
- • Use
declare query_parametersor SDK parameters. - • Allow-list any column names the caller can choose.
- • Set
query_take_max_recordsandquery_results_cache_max_agedefaults. - • Use persisted functions instead of accepting raw KQL fragments.
- • Layer Entra-ID role-based row filters via update policies.
Materialised views and persisted functions
A materialised view is a pre-aggregated table that the cluster keeps up to date as new data lands. They are the right tool for "I need the same 24-hour aggregation every five minutes from a dashboard" — instead of summarising the firehose 288 times a day, you summarise once at ingestion and query the small result.
.create materialized-view RequestsHourly on table requests
{
requests
| summarize Count = count(), p95 = percentile(duration, 95)
by name, bin(timestamp, 1h)
}
Dashboards then query RequestsHourly directly. The cluster maintains the view incrementally; query latency is constant.
Persisted functions are saved KQL expressions that take parameters and return a table. They are the closest thing to "stored procedures" in KQL and the right unit of code reuse across teams.
.create-or-alter function with (folder = "dashboards")
GetErrorRate(lookback:timespan = 1h, op:string = "*")
{
requests
| where timestamp > ago(lookback)
| where op == "*" or name == op
| summarize Total = count(), Errors = countif(success == false) by name
| extend ErrorRate = todouble(Errors) / Total
}
Callers invoke them like any operator: GetErrorRate(1h, "Checkout") | top 10 by ErrorRate desc. Centralising business logic in functions stops every dashboard from inventing its own slightly-different definition of "error rate".
- • Reach for a materialised view when the same expensive aggregation runs more than hourly.
- • Wrap canonical metric definitions in persisted functions.
- • Version the functions; treat them as code.
- • Use update policies to denormalise at ingestion time.
- • Choose between materialised views and aggregation tables based on query patterns.
- • Build a function-versioning convention for cross-team consumers.
Worked examples
Example 1 — Production triage query
When pager fires, this is the first query you run:
let lookback = 1h;
let bucket = 1m;
let recent =
requests
| where timestamp > ago(lookback)
| project timestamp, name, duration, success, resultCode, operation_Id, cloud_RoleName;
recent
| summarize
Total = count(),
Errors = countif(success == false),
p95 = percentile(duration, 95),
p99 = percentile(duration, 99)
by bin(timestamp, bucket), cloud_RoleName
| extend ErrorRate = todouble(Errors) / Total
| order by timestamp desc
lookbackandbucketat the top let you sweep window/resolution without editing the body.recentismaterialize-eligible if you reuse it; here we use it once so leave it lazy.cloud_RoleNamegroups by service; widen or narrow as needed.ErrorRateusestodoubleto force a float divide; integer divide returns 0 forErrors < Total.order by timestamp descputs the freshest bucket at the top of the rendered table.
Example 2 — End-to-end trace reconstruction
Given a single correlation ID, walk across every telemetry table:
let opId = "abcd1234-5678-90ab-cdef-1234567890ab";
union withsource = SourceTable
(requests | project timestamp, name, duration, success, resultCode, operation_Id),
(dependencies | project timestamp, name, target, duration, success, operation_Id),
(exceptions | project timestamp, type, outerMessage, operation_Id),
(traces | project timestamp, severityLevel, message, operation_Id)
| where operation_Id == opId
| order by timestamp asc
| project timestamp, SourceTable, Name = coalesce(name, type, message), Detail = coalesce(target, outerMessage, "")
- Pre-projecting each operand keeps
unionfrom carrying unused columns. withsourcerecords which table each row came from.coalescepicks the first non-null of several columns — useful when the "interesting" field has a different name per table.- The whole pipeline is bounded by
operation_Idequality, which the engine pushes down into the index lookup on each operand.
Example 3 — Latest state of a fleet from an event log
A device fleet emits events; you need "current firmware version per device":
let interestingFirmware = dynamic(["1.2.3", "1.2.4"]);
DeviceEvents
| where TimeGenerated > ago(7d)
| summarize arg_max(TimeGenerated, *) by DeviceId
| where FirmwareVersion in (interestingFirmware)
| project DeviceId, LastSeen = TimeGenerated, FirmwareVersion, Status, Region
| summarize Devices = count() by Region, FirmwareVersion
| evaluate pivot(FirmwareVersion, sum(Devices))
arg_max(TimeGenerated, *)collapses each device's event stream to a single "latest" row.- The
in (...)clause filters after the collapse — putting it before would not change correctness but would prevent the canonical "latest first, then filter" reading. evaluate pivotturns the long-format(Region, FirmwareVersion, Devices)into a wide table indexed by firmware columns.
Example 4 — Parameterised KQL from .NET with Managed Identity
Production .NET clients invoke ADX with parameter binding and a managed identity. Below is a minimal but production-ready scaffolding:
using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Data.Net.Client;
const string cluster = "https://mycluster.kusto.windows.net";
const string database = "mydb";
var kcsb = new KustoConnectionStringBuilder(cluster)
.WithAadSystemManagedIdentity();
using var client = KustoClientFactory.CreateCslQueryProvider(kcsb);
const string query = """
declare query_parameters(op:string, lookback:timespan = 1h);
requests
| where timestamp > ago(lookback)
| where name == op
| summarize Total = count(), Errors = countif(success == false)
""";
var props = new ClientRequestProperties();
props.ClientRequestId = $"web;{Activity.Current?.TraceId ?? Guid.NewGuid().ToString("N")}";
props.SetOption("query_take_max_records", 10000);
props.SetParameter("op", operationName);
props.SetParameter("lookback", TimeSpan.FromHours(1));
using var reader = await client.ExecuteQueryAsync(database, query, props);
while (reader.Read())
{
var total = reader.GetInt64(0);
var errors = reader.GetInt64(1);
// ...
}
WithAadSystemManagedIdentitymakes ADX trust the App Service's managed identity; no secrets in config.ClientRequestIdderived from the W3C trace ID correlates a slow query in.show queriesto your application's request.query_take_max_recordsis a guardrail against a parameter that accidentally returns a huge result.- Parameters are bound by name, never concatenated.
Hands-on exercises
-
Reconnaissance. Pick any App Insights workspace you have access to. Run
requests | getschema,dependencies | getschema,exceptions | getschema,traces | getschema. Write down the columns you have not seen before.- You are done when you can recite the four core App Insights tables and the canonical correlation columns (
operation_Id,operation_ParentId).
- You are done when you can recite the four core App Insights tables and the canonical correlation columns (
-
Triage query. Adapt Example 1 above to your own service. Set the bucket to 1 minute, lookback to one hour, group by
cloud_RoleNameandname.- You are done when the query returns in under three seconds and you can explain each operator to a teammate.
-
Join discipline. Build a query that joins
requeststoexceptionsonoperation_Idfor the last 24h. Run it without time filters on either side and observe the truncation warning; then add filters to both sides and confirm row count parity.- You are done when the post-filter result has stable row counts across re-runs.
-
Latest-state pivot. Pick an append-only table (e.g.
AuditLogsor a custom event table). Usearg_max(TimeGenerated, *) by EntityIdto get latest state, thenevaluate pivotover a categorical column.- You are done when the pivoted result has one row per entity and one column per category, with stable totals.
-
Persisted function. Wrap your triage query in a
.create-or-alter functionwithlookback:timespanandbucket:timespanparameters. Invoke it from a notebook.- You are done when other teammates can call
Triage(1h, 1m)without copy-pasting the body.
- You are done when other teammates can call
-
Injection drill. Build a contrived ASP.NET endpoint that takes
?op=...and concatenates it into a KQL string. Hit it with?op=x' or 1==1 //. Then rewrite usingdeclare query_parameters.- You are done when the parameterised version refuses the payload by treating it as a literal.
Self-check questions
- Explain the Kusto pipeline contract in one sentence and give an example of comment-out debugging.
- What's the default row limit on a KQL join, and which side does it apply to?
- When would you reach for
leftsemioverinner? - Why is
arg_max(Time, *)the canonical "latest state per entity" pattern, and what is the cost trade-off vs maintaining a state table? - What is the difference between
dcount(Col)anddcount(Col, 4)? - Why is
take Ndangerous to base a dashboard on, and what should you use instead? - Explain why
materialize()can either fix or break a query. - Give two reasons to use
letfor query parameters beyond readability. - Walk through the steps to KQL-inject a vulnerable query and the corresponding defence.
- When would you use a materialised view versus a persisted function?
- What's the difference between
unionandjoin? When doesunionmake more sense? - Why should you
projectheavy string columns away before they cross asummarize?
High-signal resources
Official docs
- Kusto Query Language overview — the language reference; tutorial + best practices are the first two reads.
- Azure Monitor log queries — copy-pasteable examples for App Insights and Log Analytics.
- Kusto query best practices — the canonical performance checklist.
- Materialized views — when and how.
- Parameterised queries from SDK — the .NET client docs.
Books or courses
- The Definitive Guide to KQL (Mark Morowczynski, Rod Trent, Matthew Zorich) — the most complete book-length treatment.
- Pluralsight Kusto Query Language Fundamentals — solid four-hour intro if you prefer video.
Practitioner posts
- Rod Trent's "Must Learn KQL" series — bite-sized weekly drills.
- Kusto Detective Agency — gamified KQL puzzles; the fastest way to internalise the operators.
- Azure Data Explorer team blog — release notes and performance deep-dives.
Weekly milestones
- Day 1. Read the language overview + best practices. Run
getschemaon every table in a workspace you have access to. Self-check questions 1–3. - Day 2. Work through the eight core operators on real telemetry. Build the triage query (exercise 2). Self-check questions 4–6.
- Day 3. Joins and
union. Do exercise 3 and confirm the truncation behaviour. Self-check question 7. - Day 4-5. Aggregations,
arg_max, pivots. Do exercises 4 and 5. Self-check questions 8–10. - Day 6-7. SDK from .NET, parameterised queries, injection drill. Do exercise 6 and wire a persisted function into a dashboard tile. Self-check questions 11–12.
How it shows up in the capstone
The capstone — a real-time analytics dashboard — uses Azure Data Explorer as the analytic backend. Every chart tile maps to a parameterised KQL persisted function: GetRpsByRegion(from:datetime, to:datetime, region:string), GetErrorRateByOperation(lookback:timespan), and so on. The API in chapter 5 invokes the cluster via the .NET SDK using the App Service's managed identity from chapter 14, with a ClientRequestId derived from the W3C trace ID introduced in chapter 15.
Heavy aggregations live in materialised views (RequestsHourly, ErrorRateByOperation5m); cheap ad-hoc queries run on the raw tables. Every dashboard parameter — operation name, region, lookback window — flows in as a bound parameter, never via string concatenation. The KQL injection drill from exercise 6 is part of the security review at the end of the project.
Previous chapter → Ch 11 — Highcharts Next chapter → Ch 13 — Azure App Service