KQL (Kusto Query Language) Complete Guide: Syntax, Operators, Functions, Joins, Time Series, Anomaly Detection, and Real-World Query Patterns for Fabric Real-Time Intelligence
SQL queries tables with SELECT…FROM…WHERE. PySpark transforms DataFrames with filter, select, groupBy. KQL queries time-series and log data with a pipe-based syntax where data flows left to right through transformations — like Unix commands.
KQL is the query language for Fabric Eventhouse, KQL Databases, Azure Data Explorer, and Azure Monitor (Log Analytics). If you work with streaming data, IoT sensors, application logs, or real-time analytics in Fabric, KQL is your primary query tool.
Think of SQL as writing a formal letter — you structure it with clauses (SELECT, FROM, WHERE, GROUP BY) in a specific grammar. KQL is like giving verbal instructions — “take the sensor data, filter where temperature is above 100, group by device, show me the average.” Each instruction pipes into the next. The pipe-based approach is more intuitive for exploratory data analysis where you build queries step by step.
Table of Contents
- KQL vs SQL: The Fundamental Difference
- The Pipe Syntax
- Your First KQL Query
- Filtering Data
- where Operator
- Comparison Operators
- String Operators (contains, startswith, matches regex)
- Selecting and Renaming Columns
- project, project-away, project-rename
- extend (Add Calculated Columns)
- Sorting and Limiting
- order by, top, take
- Aggregation with summarize
- count, sum, avg, min, max
- dcount (Distinct Count)
- percentile and percentiles
- make_list, make_set
- arg_max, arg_min
- Time-Based Aggregation
- bin() for Time Buckets
- Tumbling Windows with bin()
- Moving Averages and Trends
- ago() for Relative Time
- Joins in KQL
- inner join, leftouter, rightouter, fullouter
- lookup (Optimized Dimension Join)
- union (Combine Tables)
- String Functions
- Date and Time Functions
- Numeric and Math Functions
- Dynamic (JSON) Functions
- Let Statements (Variables and Subqueries)
- Render (Inline Visualization)
- Advanced: make-series and Time Series
- Advanced: Anomaly Detection
- Advanced: Materialized Views
- Advanced: Stored Functions
- Real-World Query Patterns
- Pattern 1: Last Known Value per Device
- Pattern 2: Session Detection
- Pattern 3: Error Rate Over Time
- Pattern 4: Top N per Category
- Pattern 5: Funnel Analysis
- Pattern 6: Spike Detection
- Pattern 7: Compare This Week vs Last Week
- Pattern 8: Distributed Tracing (Follow a Request)
- KQL in Fabric: Where to Write and Run
- KQL vs SQL Complete Comparison Table
- Common Mistakes
- Interview Questions
- Wrapping Up
KQL vs SQL: The Fundamental Difference
SQL (clause-based — you declare the structure):
SELECT device_id, AVG(temperature) AS avg_temp
FROM sensor_readings
WHERE timestamp > '2026-06-01'
GROUP BY device_id
HAVING AVG(temperature) > 80
ORDER BY avg_temp DESC
KQL (pipe-based — data flows through transformations):
sensor_readings
| where timestamp > datetime(2026-06-01)
| summarize avg_temp = avg(temperature) by device_id
| where avg_temp > 80
| order by avg_temp desc
The key difference: SQL is declarative — you describe the RESULT you want. KQL is procedural — you describe the STEPS to get there. KQL reads top-to-bottom, each line transforming the data from the previous line.
The Pipe Syntax
table_name
| operator1
| operator2
| operator3
Data flows: table → operator1 → operator2 → operator3 → result
Each pipe (|) takes the output of the previous step as input.
You build queries incrementally: run line 1, check output, add line 2, check, add line 3...
Your First KQL Query
// Select all from a table (like SELECT * FROM)
sensor_readings
// Filter rows
sensor_readings
| where temperature > 100
// Filter + select columns
sensor_readings
| where temperature > 100
| project device_id, temperature, timestamp
// Filter + select + sort
sensor_readings
| where temperature > 100
| project device_id, temperature, timestamp
| order by temperature desc
// Filter + aggregate
sensor_readings
| where timestamp > ago(1h)
| summarize avg_temp = avg(temperature) by device_id
| order by avg_temp desc
Filtering Data
where Operator
// Basic comparison
sensor_readings
| where temperature > 100
// Multiple conditions (AND is implicit with multiple where, or use 'and')
sensor_readings
| where temperature > 100
| where device_id == "sensor-042"
// Same as:
sensor_readings
| where temperature > 100 and device_id == "sensor-042"
// OR condition
sensor_readings
| where temperature > 100 or humidity > 90
// NOT
sensor_readings
| where not(status == "inactive")
// NULL check
sensor_readings
| where isnotnull(temperature)
| where isempty(error_message) // Empty string or null
| where isnotempty(error_message) // Has a value
Comparison Operators
== Equal (case-sensitive for strings)
!= Not equal
> Greater than
< Less than
>= Greater or equal
<= Less or equal
between Range: | where temperature between (50 .. 100)
!between Not in range
in In a set: | where status in ("active", "pending")
!in Not in a set
String Operators
// Case-sensitive (default)
| where name == "Naveen" // Exact match
| where name contains "ave" // Substring anywhere
| where name startswith "Nav" // Starts with
| where name endswith "een" // Ends with
| where name matches regex "^Nav.*n$" // Regex match
// Case-insensitive versions (add ~)
| where name =~ "naveen" // Case-insensitive equals
| where name contains_cs "ave" // Explicit case-sensitive
| where name !contains "error" // Does NOT contain
| where name has "sensor" // Word-level match (faster than contains)
| where name has_any ("error", "warning", "critical") // Has any of these words
// has vs contains:
// has = word boundary match ("sensor" matches "sensor-042" but not "biosensor")
// contains = substring match ("sensor" matches both "sensor-042" and "biosensor")
// has is FASTER because it uses the term index
Performance tip: Always prefer has over contains when possible. has uses the inverted text index (fast), while contains scans every string (slow).
Selecting and Renaming Columns
project (Select Columns)
// Select specific columns (like SQL SELECT)
sensor_readings
| project device_id, temperature, timestamp
// Select and rename
sensor_readings
| project DeviceID = device_id, Temp = temperature, Time = timestamp
// Remove specific columns (keep everything else)
sensor_readings
| project-away internal_id, debug_flag, raw_payload
// Rename without changing selection
sensor_readings
| project-rename DeviceID = device_id, Temp = temperature
// Reorder columns (first listed columns appear first)
sensor_readings
| project timestamp, device_id, temperature, humidity
extend (Add Calculated Columns)
// Add new columns (keeps all existing columns)
sensor_readings
| extend temp_fahrenheit = temperature * 9.0 / 5.0 + 32.0
| extend is_critical = temperature > 100
| extend reading_age_hours = datetime_diff("hour", now(), timestamp)
// Conditional column (like CASE WHEN in SQL)
sensor_readings
| extend alert_level = case(
temperature > 150, "CRITICAL",
temperature > 100, "WARNING",
temperature > 80, "ELEVATED",
"NORMAL")
// Overwrite an existing column
sensor_readings
| extend device_id = toupper(device_id)
Sorting and Limiting
// Sort (order by)
sensor_readings
| order by temperature desc // Descending
| order by timestamp asc // Ascending (default)
| order by device_id asc, temperature desc // Multi-column sort
// Top N (sort + limit in one step)
sensor_readings
| top 10 by temperature desc // Top 10 hottest readings
// Take (random N rows — no ordering, faster than top)
sensor_readings
| take 100 // Any 100 rows (for quick exploration)
// Distinct values
sensor_readings
| distinct device_id // Unique device IDs
| distinct device_id, zone // Unique combinations
Aggregation with summarize
Core Aggregation Functions
// count, sum, avg, min, max
sensor_readings
| where timestamp > ago(24h)
| summarize
total_readings = count(),
avg_temp = avg(temperature),
max_temp = max(temperature),
min_temp = min(temperature),
total_energy = sum(power_consumption)
by device_id
// Without 'by' — aggregates entire table
sensor_readings
| where timestamp > ago(1h)
| summarize total = count(), avg_temp = avg(temperature)
dcount (Distinct Count)
// Count distinct values (approximate, very fast on billions of rows)
web_events
| where timestamp > ago(24h)
| summarize
unique_users = dcount(user_id),
unique_pages = dcount(page_url),
total_events = count()
// dcount with accuracy parameter (0-4, higher = more accurate but slower)
| summarize unique_users = dcount(user_id, 4) // Maximum accuracy
percentile and percentiles
// P50 (median), P95, P99 — essential for latency analysis
api_logs
| summarize
p50_latency = percentile(response_ms, 50),
p95_latency = percentile(response_ms, 95),
p99_latency = percentile(response_ms, 99)
by service_name
// Multiple percentiles at once
| summarize percentiles(response_ms, 50, 90, 95, 99) by service_name
make_list, make_set
// Collect values into arrays
orders
| summarize
products_ordered = make_list(product_name), // All values (with duplicates)
unique_products = make_set(product_name), // Unique values only
order_count = count()
by customer_id
arg_max, arg_min
// Get the row with the max/min value — extremely useful!
// "Latest reading per device" (most common pattern)
sensor_readings
| summarize arg_max(timestamp, *) by device_id
// Returns the entire row where timestamp is maximum, per device_id
// "First order per customer"
orders
| summarize arg_min(order_date, *) by customer_id
Time-Based Aggregation
bin() for Time Buckets
// 5-minute buckets
sensor_readings
| where timestamp > ago(24h)
| summarize avg_temp = avg(temperature)
by bin(timestamp, 5m) // 5-minute tumbling windows
| order by timestamp asc
// 1-hour buckets
sensor_readings
| summarize avg_temp = avg(temperature), count = count()
by bin(timestamp, 1h), device_id
// Daily buckets
sensor_readings
| summarize daily_avg = avg(temperature)
by bin(timestamp, 1d)
// Common bin sizes: 1m, 5m, 15m, 30m, 1h, 6h, 12h, 1d, 7d
ago() for Relative Time
// Relative time from now
| where timestamp > ago(1h) // Last 1 hour
| where timestamp > ago(30m) // Last 30 minutes
| where timestamp > ago(7d) // Last 7 days
| where timestamp > ago(90d) // Last 90 days
// Combine with datetime
| where timestamp > ago(24h) and timestamp < ago(1h) // 24h ago to 1h ago
| where timestamp > startofday(now()) // Since midnight today
| where timestamp > startofmonth(now()) // Since start of month
| where timestamp > startofweek(now()) // Since start of week
Joins in KQL
// Inner join
sensor_readings
| join kind=inner (device_metadata) on device_id
// Left outer join (keep all left rows)
sensor_readings
| join kind=leftouter (device_metadata) on device_id
// Right outer join
sensor_readings
| join kind=rightouter (device_metadata) on device_id
// Full outer join
sensor_readings
| join kind=fullouter (device_metadata) on device_id
// Anti join (rows in left NOT in right)
sensor_readings
| join kind=leftanti (known_devices) on device_id
// Returns readings from UNKNOWN devices
// Semi join (rows in left that HAVE a match in right, but no right columns)
sensor_readings
| join kind=leftsemi (active_devices) on device_id
// Returns readings from active devices only (no device_metadata columns added)
// Join on different column names
orders
| join kind=inner (customers | project cust_id, cust_name) on $left.customer_id == $right.cust_id
lookup (Optimized Dimension Join)
// lookup is an optimized join for small dimension tables
sensor_readings
| lookup kind=leftouter device_metadata on device_id
// Same as left outer join but faster for small right-side tables
// Use lookup for dimension tables (thousands of rows)
// Use join for large table-to-table joins
union (Combine Tables)
// Combine rows from multiple tables (like SQL UNION ALL)
sensor_readings_zone_a
| union sensor_readings_zone_b
| union sensor_readings_zone_c
// Union with specific columns
sensor_readings_zone_a
| union sensor_readings_zone_b
| project device_id, temperature, timestamp, zone = "combined"
String Functions
// Case conversion
| extend lower_name = tolower(name)
| extend upper_name = toupper(name)
// Trimming
| extend trimmed = trim(" ", raw_name)
| extend trim_start = trim_start(" ", name)
| extend trim_end = trim_end(" ", name)
// Substring
| extend first_5 = substring(name, 0, 5) // Characters 0-4
| extend from_pos_3 = substring(name, 3) // From position 3 to end
// Length
| extend name_length = strlen(name)
// Replace
| extend cleaned = replace_string(phone, "-", "")
| extend fixed = replace_regex(text, @"\d+", "NUM") // Regex replace
// Split and extract
| extend parts = split(csv_field, ",") // Split into array
| extend first_part = split(csv_field, ",")[0] // First element
| extend domain = split(email, "@")[1] // "gmail.com"
// Extract with regex
| extend ip = extract(@"(\d+\.\d+\.\d+\.\d+)", 1, log_message)
| extend error_code = extract(@"ERROR-(\d+)", 1, message)
// Concatenation
| extend full_name = strcat(first_name, " ", last_name)
| extend url = strcat("https://api.example.com/device/", device_id)
// Contains check (returns bool)
| extend has_error = log_message contains "ERROR"
| extend is_critical = log_message has_any ("CRITICAL", "FATAL", "PANIC")
// URL and base64
| extend decoded = base64_decode_tostring(encoded_field)
| extend encoded = base64_encode_tostring(raw_field)
| extend parsed_url = parse_url("https://example.com/path?key=value")
Date and Time Functions
// Current time
| extend now = now()
| extend utc_now = now() // Always UTC in KQL
// Constructing dates
| extend specific_date = datetime(2026-06-05)
| extend specific_datetime = datetime(2026-06-05 14:30:00)
// Extracting parts
| extend year = datetime_part("year", timestamp)
| extend month = datetime_part("month", timestamp)
| extend day = datetime_part("day", timestamp)
| extend hour = datetime_part("hour", timestamp)
| extend minute = datetime_part("minute", timestamp)
| extend day_of_week = dayofweek(timestamp) // Returns timespan
| extend day_of_year = dayofyear(timestamp)
| extend week_of_year = week_of_year(timestamp)
// Start/end of periods
| extend start_of_day = startofday(timestamp)
| extend end_of_day = endofday(timestamp)
| extend start_of_month = startofmonth(timestamp)
| extend end_of_month = endofmonth(timestamp)
| extend start_of_year = startofyear(timestamp)
| extend start_of_week = startofweek(timestamp)
// Date arithmetic
| extend tomorrow = timestamp + 1d
| extend last_week = timestamp - 7d
| extend plus_3_hours = timestamp + 3h
// Date difference
| extend age_days = datetime_diff("day", now(), timestamp)
| extend age_hours = datetime_diff("hour", now(), timestamp)
| extend age_minutes = datetime_diff("minute", now(), timestamp)
// Formatting
| extend formatted = format_datetime(timestamp, "yyyy-MM-dd HH:mm:ss")
| extend date_only = format_datetime(timestamp, "yyyy-MM-dd")
| extend time_only = format_datetime(timestamp, "HH:mm")
// Relative time checks
| where timestamp > ago(1h)
| where timestamp between (ago(7d) .. now())
| where timestamp between (datetime(2026-01-01) .. datetime(2026-06-30))
Numeric and Math Functions
| extend rounded = round(temperature, 2) // Round to 2 decimals
| extend ceiling = ceiling(temperature) // Round up
| extend floor = floor(temperature) // Round down
| extend absolute = abs(change_value) // Absolute value
| extend power = pow(2, 10) // 2^10 = 1024
| extend square_root = sqrt(144) // 12
| extend log_val = log(100) // Natural log
| extend log10_val = log10(100) // Log base 10 = 2
| extend sign_val = sign(-42) // -1 (negative)
| extend min_val = min_of(a, b, c) // Minimum of values
| extend max_val = max_of(a, b, c) // Maximum of values
| extend clamped = max_of(0, min_of(100, temperature)) // Clamp 0-100
// Random
| extend random_val = rand() // 0.0 to 1.0
| extend random_int = toint(rand() * 100) // 0 to 99
Dynamic (JSON) Functions
// Parse JSON string into dynamic object
| extend parsed = parse_json(json_payload)
| extend device = tostring(parsed.device_id)
| extend temp = todouble(parsed.readings.temperature)
| extend tags = parsed.tags // Array
// Access nested fields
| extend city = tostring(parsed.address.city)
| extend first_tag = tostring(parsed.tags[0])
// Expand array to rows (like LATERAL FLATTEN)
| mv-expand tag = parsed.tags
| extend tag_str = tostring(tag)
// Create dynamic objects
| extend new_obj = pack("device", device_id, "temp", temperature)
| extend new_arr = pack_array(temperature, humidity, pressure)
// Check dynamic type
| where isnotnull(parsed.optional_field)
| extend field_type = gettype(parsed.some_field)
// Array operations
| extend arr_length = array_length(parsed.items)
| extend contains_x = array_index_of(parsed.tags, "critical") >= 0
| extend sorted = array_sort_asc(parsed.values)
| extend sliced = array_slice(parsed.items, 0, 3) // First 3 elements
Let Statements (Variables and Subqueries)
// Define variables
let threshold = 100;
let lookback = 24h;
let target_device = "sensor-042";
sensor_readings
| where timestamp > ago(lookback)
| where device_id == target_device
| where temperature > threshold
// Define a subquery
let high_temp_devices =
sensor_readings
| where timestamp > ago(1h)
| where temperature > 100
| distinct device_id;
sensor_readings
| where device_id in (high_temp_devices)
| summarize avg_temp = avg(temperature) by device_id
// Define a function
let CelsiusToFahrenheit = (c: real) { c * 9.0 / 5.0 + 32.0 };
sensor_readings
| extend temp_f = CelsiusToFahrenheit(temperature)
// Multiple lets
let start_time = ago(7d);
let end_time = now();
let min_readings = 100;
sensor_readings
| where timestamp between (start_time .. end_time)
| summarize reading_count = count(), avg_temp = avg(temperature) by device_id
| where reading_count >= min_readings
Render (Inline Visualization)
// Line chart (time series)
sensor_readings
| where timestamp > ago(24h)
| summarize avg_temp = avg(temperature) by bin(timestamp, 15m)
| render timechart
// Bar chart
sensor_readings
| summarize avg_temp = avg(temperature) by device_id
| top 10 by avg_temp
| render barchart
// Pie chart
web_events
| summarize count() by browser
| render piechart
// Column chart
sales
| summarize revenue = sum(amount) by month = format_datetime(order_date, "yyyy-MM")
| render columnchart
// Scatter plot
sensor_readings
| project temperature, humidity
| render scatterchart
// Area chart
sensor_readings
| summarize avg_temp = avg(temperature) by bin(timestamp, 1h)
| render areachart
// Stacked area chart (by category)
sensor_readings
| summarize avg_temp = avg(temperature) by bin(timestamp, 1h), zone
| render areachart
Advanced: make-series and Time Series
// Create a regular time series (fills gaps with defaults)
sensor_readings
| make-series avg_temp = avg(temperature) default = 0
on timestamp step 1h
by device_id
// Result: each device gets a regular array of hourly averages
// Missing hours are filled with 0 (the default)
// Time series with trend line
sensor_readings
| make-series avg_temp = avg(temperature)
on timestamp step 1h
by device_id
| extend (slope, intercept, r2, variance) = series_fit_line(avg_temp)
| project device_id, slope, r2
| where abs(slope) > 0.5 // Devices with significant trends
// Seasonal decomposition
sensor_readings
| make-series avg_temp = avg(temperature)
on timestamp step 1h
| extend (baseline, seasonal, trend, residual) = series_decompose(avg_temp)
// Forecast (predict future values)
sensor_readings
| make-series avg_temp = avg(temperature)
on timestamp step 1h
| extend forecast = series_decompose_forecast(avg_temp, 24) // Predict next 24 hours
Advanced: Anomaly Detection
// Automatic anomaly detection on time series
sensor_readings
| where timestamp > ago(7d)
| make-series temp = avg(temperature) on timestamp step 1h by device_id
| extend anomalies = series_decompose_anomalies(temp)
| mv-expand timestamp to typeof(datetime), temp to typeof(real), anomalies to typeof(int)
| where anomalies != 0
| project timestamp, device_id, temp, anomaly_type = iff(anomalies > 0, "spike", "dip")
// Anomaly detection returns:
// +1 = positive anomaly (unexpected spike)
// -1 = negative anomaly (unexpected dip)
// 0 = normal
// Dynamic threshold anomaly detection
sensor_readings
| make-series temp = avg(temperature) on timestamp step 1h
| extend (anomalies, score, baseline) = series_decompose_anomalies(temp, 1.5)
// 1.5 = sensitivity (lower = more anomalies detected, higher = fewer)
Advanced: Materialized Views
// Pre-computed aggregation that auto-refreshes
.create materialized-view HourlyDeviceStats on table sensor_readings
{
sensor_readings
| summarize
avg_temp = avg(temperature),
max_temp = max(temperature),
min_temp = min(temperature),
reading_count = count()
by bin(timestamp, 1h), device_id
}
// Query the materialized view (instant — no re-aggregation)
HourlyDeviceStats
| where timestamp > ago(7d)
| order by timestamp desc
// Manage materialized views
.show materialized-views // List all
.show materialized-view HourlyDeviceStats extents // Health check
.alter materialized-view HourlyDeviceStats disable // Pause
.alter materialized-view HourlyDeviceStats enable // Resume
.drop materialized-view HourlyDeviceStats // Delete
Advanced: Stored Functions
// Create a reusable function stored in the database
.create function DeviceAlerts(threshold: real, lookback: timespan) {
sensor_readings
| where timestamp > ago(lookback)
| where temperature > threshold
| summarize alert_count = count(), max_temp = max(temperature)
by device_id
| where alert_count > 5
| order by max_temp desc
}
// Call the function
DeviceAlerts(100, 24h)
// Function with default parameters
.create function ActiveDevices(lookback: timespan = 1h) {
sensor_readings
| where timestamp > ago(lookback)
| distinct device_id
| summarize active_count = count()
}
ActiveDevices() // Uses default 1h
ActiveDevices(7d) // Override to 7 days
Real-World Query Patterns
Pattern 1: Last Known Value per Device
// Most common IoT pattern: "What is each device's current state?"
sensor_readings
| summarize arg_max(timestamp, *) by device_id
| project device_id, temperature, humidity, timestamp
| order by device_id
Pattern 2: Session Detection
// Group user events into sessions (gap > 30 minutes = new session)
web_events
| order by user_id, timestamp asc
| extend prev_time = prev(timestamp), prev_user = prev(user_id)
| extend gap_minutes = datetime_diff("minute", timestamp, prev_time)
| extend new_session = iff(user_id != prev_user or gap_minutes > 30 or isnull(gap_minutes), 1, 0)
| extend session_id = row_cumsum(new_session)
| summarize
session_start = min(timestamp),
session_end = max(timestamp),
page_views = count(),
duration_minutes = datetime_diff("minute", max(timestamp), min(timestamp))
by user_id, session_id
Pattern 3: Error Rate Over Time
// Error rate as percentage, per 5-minute window
api_logs
| where timestamp > ago(24h)
| summarize
total = count(),
errors = countif(status_code >= 500),
error_rate = round(100.0 * countif(status_code >= 500) / count(), 2)
by bin(timestamp, 5m)
| render timechart
Pattern 4: Top N per Category
// Top 3 products per category by revenue
sales
| summarize revenue = sum(amount) by category, product_name
| partition by category (top 3 by revenue)
| order by category, revenue desc
Pattern 5: Funnel Analysis
// E-commerce conversion funnel
let viewed = web_events | where event = "product_view" | distinct user_id;
let carted = web_events | where event = "add_to_cart" | distinct user_id;
let checked = web_events | where event = "checkout" | distinct user_id;
let purchased = web_events | where event = "purchase" | distinct user_id;
print
step1_views = toscalar(viewed | count),
step2_carts = toscalar(carted | count),
step3_checkout = toscalar(checked | count),
step4_purchase = toscalar(purchased | count)
Pattern 6: Spike Detection
// Detect when a metric is 3x the recent average
sensor_readings
| where timestamp > ago(24h)
| summarize hourly_avg = avg(temperature) by bin(timestamp, 1h), device_id
| extend rolling_avg = avg_of(prev(hourly_avg, 1), prev(hourly_avg, 2), prev(hourly_avg, 3))
| where hourly_avg > rolling_avg * 3
| project timestamp, device_id, hourly_avg, rolling_avg, spike_ratio = hourly_avg / rolling_avg
Pattern 7: Compare This Week vs Last Week
// Side-by-side comparison
let this_week = sensor_readings | where timestamp > ago(7d)
| summarize this_avg = avg(temperature) by device_id;
let last_week = sensor_readings | where timestamp between (ago(14d) .. ago(7d))
| summarize last_avg = avg(temperature) by device_id;
this_week
| join kind=inner last_week on device_id
| extend change_pct = round(100.0 * (this_avg - last_avg) / last_avg, 1)
| order by change_pct desc
Pattern 8: Distributed Tracing
// Follow a single request across all microservices
app_logs
| where trace_id == "abc-123-def-456"
| order by timestamp asc
| project timestamp, service_name, operation, duration_ms, status
// Shows the full journey of one request through 20+ services
KQL in Fabric: Where to Write and Run
| Location | How to Access |
|---|---|
| KQL Queryset | Workspace → + New item → KQL Queryset → write and save KQL queries |
| KQL Database | Open Eventhouse → KQL Database → query editor |
| Real-Time Dashboard | Dashboard tile → write KQL as the tile query |
| Fabric Notebook | Use %%kql magic command in a notebook cell |
| Spark Notebook | spark.sql("SELECT * FROM kql_db.table") via cross-database query |
KQL vs SQL Complete Comparison Table
| SQL | KQL |
|---|---|
SELECT * |
table_name |
SELECT col1, col2 |
table \| project col1, col2 |
SELECT col1 AS alias |
table \| project alias = col1 |
SELECT *, col1*2 AS new_col |
table \| extend new_col = col1 * 2 |
WHERE col > 100 |
\| where col > 100 |
WHERE col LIKE '%text%' |
\| where col contains "text" |
WHERE col IN ('a','b') |
\| where col in ("a", "b") |
WHERE col IS NOT NULL |
\| where isnotnull(col) |
ORDER BY col DESC |
\| order by col desc |
TOP 10 / LIMIT 10 |
\| top 10 by col / \| take 10 |
GROUP BY col |
\| summarize ... by col |
COUNT(*) |
count() |
COUNT(DISTINCT col) |
dcount(col) |
AVG(col) |
avg(col) |
HAVING COUNT(*) > 5 |
\| summarize c=count() by col \| where c > 5 |
CASE WHEN...THEN...END |
case(cond1, val1, cond2, val2, default) |
CAST(col AS INT) |
toint(col) |
DATEPART(hour, dt) |
datetime_part("hour", dt) |
DATEDIFF(day, dt1, dt2) |
datetime_diff("day", dt2, dt1) |
GETDATE() / NOW() |
now() |
JOIN ON |
\| join kind=inner (table2) on key |
UNION ALL |
table1 \| union table2 |
| (no equivalent) | \| render timechart |
| (no equivalent) | \| make-series ... on timestamp step 1h |
| (no equivalent) | series_decompose_anomalies() |
Common Mistakes
-
Using
containsinstead ofhas—hasuses the term index and is much faster. Usecontainsonly when you need substring matching (not word boundaries). -
Not using
ago()for relative time filters — hardcoding dates (datetime(2026-06-01)) breaks over time. Useago(7d)for relative queries. -
Forgetting
bin()for time aggregation —summarize by timestampcreates one row per unique timestamp (millions!). Usebin(timestamp, 5m)to bucket. -
Not using
arg_maxfor “latest per group” — writing complex subqueries whensummarize arg_max(timestamp, *) by device_iddoes it in one line. -
Writing SQL in KQL —
SELECT * FROM table WHERE col > 5is SQL, not KQL. The equivalent istable | where col > 5. KQL uses pipes, not clauses. -
Not using
letfor readability — long KQL queries become unreadable withoutletstatements to define variables and subqueries. -
Scanning full tables without time filters — always add
| where timestamp > ago(...)early in the query. Without it, you scan billions of rows.
Interview Questions
Q: What is KQL and how does it differ from SQL? A: KQL (Kusto Query Language) is a pipe-based query language optimized for time-series and log data. SQL uses clause-based syntax (SELECT, FROM, WHERE, GROUP BY). KQL chains operators with pipes (|) where data flows left-to-right. KQL has built-in time series functions (make-series), anomaly detection (series_decompose_anomalies), and inline visualization (render) that SQL lacks.
Q: What is the difference between has and contains in KQL?
A: has performs word-level matching using the inverted term index — fast but matches whole words only (“sensor” matches “sensor-042” but not “biosensor”). contains performs substring matching — slower but finds the term anywhere (“sensor” matches both). Always prefer has for performance unless substring matching is required.
Q: How do you get the latest reading per device in KQL?
A: Use summarize arg_max(timestamp, *) by device_id. This returns the entire row where timestamp is maximum for each device_id. It is the most common IoT pattern in KQL — one line instead of complex subqueries.
Q: What are materialized views in KQL?
A: Pre-computed aggregations stored in the database that auto-refresh when source data changes. Created with .create materialized-view. They eliminate the need to re-aggregate billions of rows for frequently run queries. Essential for dashboards that need sub-second response on large datasets.
Wrapping Up
KQL is the language of real-time analytics in Fabric. The pipe-based syntax makes it intuitive for exploratory analysis — build queries one step at a time, check the output, add the next step. Combined with built-in time series analysis, anomaly detection, and inline visualization, KQL handles use cases that would require multiple tools in the SQL world.
Bookmark this post as your KQL reference — the function tables and query patterns cover 95% of what you will need in production.
Related posts: – Real-Time Intelligence – RTI Deep Dive – SQL Functions Reference – M Language Complete Guide
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.