· Topic
Data Engineering
Spark, Kafka, lakehouses and petabyte pipelines. Build data that engineers can trust.
Roadmap
stage 1
Foundations
stage 2
Core patterns
stage 3
Advanced
stage 4
In production
Articles
- intermediate 32m
MS Stack Ch 12 — Kusto / KQL
Kusto Query Language: pipeline syntax, summarize, joins, time-binning, materialized views, KQL injection defence. The language powering Azure Data Explorer, App Insights, Log Analytics, and most Microsoft telemetry.
- intermediate 12m
Observability for Data Pipelines — SLAs, SLOs, Freshness, Data Tests
Session 46 of the 48-session learning series.
- intermediate 12m
Change Data Capture — Debezium, Outbox Pattern, Snapshot+Stream
Session 40 of the 48-session learning series.
- intermediate 12m
Petabyte Cost Optimisation — Compression, Partitioning, Z-Order, File Sizing
Session 37 of the 48-session learning series.
- intermediate 12m
Data Governance — Lineage, Quality, Catalogs, Contracts, Observability
Session 32 of the 48-session learning series.
- intermediate 12m
Data Modelling — Dimensional, Data Vault, OBT for the Lakehouse
Session 28 of the 48-session learning series.
- intermediate 12m
Streaming with Flink/Spark — Watermarks, Windows, State
Session 23 of the 48-session learning series.
- intermediate 12m
Lakehouse — Delta Lake, Iceberg, Hudi, ACID on Object Storage
Session 19 of the 48-session learning series.
- intermediate 12m
Kafka Part 2 — Replication, ISR, Consumer Groups, Exactly-Once
Session 15 of the 48-session learning series.
- intermediate 12m
Kafka Part 1 — Brokers, Topics, Partitions, Producers
Session 10 of the 48-session learning series.
- intermediate 12m
Spark Part 2 — Shuffles, Catalyst, AQE, Tuning
Session 7 of the 48-session learning series.
- intermediate 12m
Spark Part 1 — Driver, Executors, RDDs, Lazy Evaluation
Session 2 of the 48-session learning series.
- beginner 5m
Kafka 101 for ML engineers
Topics, partitions, consumer groups — the parts of Kafka that actually matter when you put ML features behind it.
- advanced 16m
Completing the Azure Fabric Ignite Edition Challenges
The Microsoft Learn Challenge offers several challenges; this article works through a few that are especially useful to complete for building knowledge of Microsoft Fabric.
- intermediate 15m
Exploring different services in GCP
An exploration and documentation of the different services offered in Google Cloud Platform (GCP).
- intermediate 11m
Exploring Azure Data Explorer and Best Practices
A self-contained deep dive on Azure Data Explorer (ADX/Kusto): architecture, the KQL language from zero to advanced, ingestion patterns, performance/cost levers, and operational best practices.