KQL (Kusto Query Language) Complete Guide: Syntax, Operators, Functions, Joins, Time Series, Anomaly Detection, and Real-World Query Patterns for Fabric Real-Time Intelligence

SQL queries tables with SELECT…FROM…WHERE. PySpark transforms DataFrames with filter, select, groupBy. KQL queries time-series and log data with a pipe-based syntax where data flows left to right through transformations — like Unix commands.

KQL is the query language for Fabric Eventhouse, KQL Databases, Azure Data Explorer, and Azure Monitor (Log Analytics). If you work with streaming data, IoT sensors, application logs, or real-time analytics in Fabric, KQL is your primary query tool.

Think of SQL as writing a formal letter — you structure it with clauses (SELECT, FROM, WHERE, GROUP BY) in a specific grammar. KQL is like giving verbal instructions — “take the sensor data, filter where temperature is above 100, group by device, show me the average.” Each instruction pipes into the next. The pipe-based approach is more intuitive for exploratory data analysis where you build queries step by step.

KQL vs SQL: The Fundamental Difference
The Pipe Syntax
Your First KQL Query
Filtering Data
where Operator
Comparison Operators
String Operators (contains, startswith, matches regex)
Selecting and Renaming Columns
project, project-away, project-rename
extend (Add Calculated Columns)
Sorting and Limiting
order by, top, take
Aggregation with summarize
count, sum, avg, min, max
dcount (Distinct Count)
percentile and percentiles
make_list, make_set
arg_max, arg_min
Time-Based Aggregation
bin() for Time Buckets
Tumbling Windows with bin()
Moving Averages and Trends
ago() for Relative Time
Joins in KQL
inner join, leftouter, rightouter, fullouter
lookup (Optimized Dimension Join)
union (Combine Tables)
String Functions
Date and Time Functions
Numeric and Math Functions
Dynamic (JSON) Functions
Let Statements (Variables and Subqueries)
Render (Inline Visualization)
Advanced: make-series and Time Series
Advanced: Anomaly Detection
Advanced: Materialized Views
Advanced: Stored Functions
Real-World Query Patterns
Pattern 1: Last Known Value per Device
Pattern 2: Session Detection
Pattern 3: Error Rate Over Time
Pattern 4: Top N per Category
Pattern 5: Funnel Analysis
Pattern 6: Spike Detection
Pattern 7: Compare This Week vs Last Week
Pattern 8: Distributed Tracing (Follow a Request)
KQL in Fabric: Where to Write and Run
KQL vs SQL Complete Comparison Table
Common Mistakes
Interview Questions
Wrapping Up

KQL vs SQL: The Fundamental Difference

SQL (clause-based — you declare the structure):
  SELECT device_id, AVG(temperature) AS avg_temp
  FROM sensor_readings
  WHERE timestamp > '2026-06-01'
  GROUP BY device_id
  HAVING AVG(temperature) > 80
  ORDER BY avg_temp DESC

KQL (pipe-based — data flows through transformations):
  sensor_readings
  | where timestamp > datetime(2026-06-01)
  | summarize avg_temp = avg(temperature) by device_id
  | where avg_temp > 80
  | order by avg_temp desc

The key difference: SQL is declarative — you describe the RESULT you want. KQL is procedural — you describe the STEPS to get there. KQL reads top-to-bottom, each line transforming the data from the previous line.

The Pipe Syntax

table_name
| operator1
| operator2
| operator3

Data flows: table → operator1 → operator2 → operator3 → result

Each pipe (|) takes the output of the previous step as input.
You build queries incrementally: run line 1, check output, add line 2, check, add line 3...

Your First KQL Query

// Select all from a table (like SELECT * FROM)
sensor_readings

// Filter rows
sensor_readings
| where temperature > 100

// Filter + select columns
sensor_readings
| where temperature > 100
| project device_id, temperature, timestamp

// Filter + select + sort
sensor_readings
| where temperature > 100
| project device_id, temperature, timestamp
| order by temperature desc

// Filter + aggregate
sensor_readings
| where timestamp > ago(1h)
| summarize avg_temp = avg(temperature) by device_id
| order by avg_temp desc

Filtering Data

where Operator

// Basic comparison
sensor_readings
| where temperature > 100

// Multiple conditions (AND is implicit with multiple where, or use 'and')
sensor_readings
| where temperature > 100
| where device_id == "sensor-042"

// Same as:
sensor_readings
| where temperature > 100 and device_id == "sensor-042"

// OR condition
sensor_readings
| where temperature > 100 or humidity > 90

// NOT
sensor_readings
| where not(status == "inactive")

// NULL check
sensor_readings
| where isnotnull(temperature)
| where isempty(error_message)        // Empty string or null
| where isnotempty(error_message)     // Has a value

Comparison Operators

==        Equal (case-sensitive for strings)
!=        Not equal
>         Greater than
<         Less than
>=        Greater or equal
<=        Less or equal
between   Range: | where temperature between (50 .. 100)
!between  Not in range
in        In a set: | where status in ("active", "pending")
!in       Not in a set

String Operators (contains, startswith, matches regex)

// Case-sensitive (default)
| where name == "Naveen"                    // Exact match
| where name contains "ave"                 // Substring anywhere
| where name startswith "Nav"               // Starts with
| where name endswith "een"                 // Ends with
| where name matches regex "^Nav.*n$"       // Regex match

// Case-insensitive versions (add ~)
| where name =~ "naveen"                    // Case-insensitive equals
| where name contains_cs "ave"              // Explicit case-sensitive
| where name !contains "error"              // Does NOT contain
| where name has "sensor"                   // Word-level match (faster than contains)
| where name has_any ("error", "warning", "critical")  // Has any of these words

// has vs contains:
// has = word boundary match ("sensor" matches "sensor-042" but not "biosensor")
// contains = substring match ("sensor" matches both "sensor-042" and "biosensor")
// has is FASTER because it uses the term index

Performance tip: Always prefer has over contains when possible. has uses the inverted text index (fast), while contains scans every string (slow).

Selecting and Renaming Columns

project, project-away, project-rename

// Select specific columns (like SQL SELECT)
sensor_readings
| project device_id, temperature, timestamp

// Select and rename
sensor_readings
| project DeviceID = device_id, Temp = temperature, Time = timestamp

// Remove specific columns (keep everything else)
sensor_readings
| project-away internal_id, debug_flag, raw_payload

// Rename without changing selection
sensor_readings
| project-rename DeviceID = device_id, Temp = temperature

// Reorder columns (first listed columns appear first)
sensor_readings
| project timestamp, device_id, temperature, humidity

extend (Add Calculated Columns)

// Add new columns (keeps all existing columns)
sensor_readings
| extend temp_fahrenheit = temperature * 9.0 / 5.0 + 32.0
| extend is_critical = temperature > 100
| extend reading_age_hours = datetime_diff("hour", now(), timestamp)

// Conditional column (like CASE WHEN in SQL)
sensor_readings
| extend alert_level = case(
    temperature > 150, "CRITICAL",
    temperature > 100, "WARNING",
    temperature > 80,  "ELEVATED",
    "NORMAL")

// Overwrite an existing column
sensor_readings
| extend device_id = toupper(device_id)

Sorting and Limiting

order by, top, take

// Sort (order by)
sensor_readings
| order by temperature desc                 // Descending
| order by timestamp asc                    // Ascending (default)
| order by device_id asc, temperature desc  // Multi-column sort

// Top N (sort + limit in one step)
sensor_readings
| top 10 by temperature desc               // Top 10 hottest readings

// Take (random N rows — no ordering, faster than top)
sensor_readings
| take 100                                  // Any 100 rows (for quick exploration)

// Distinct values
sensor_readings
| distinct device_id                        // Unique device IDs
| distinct device_id, zone                  // Unique combinations

Aggregation with summarize

count, sum, avg, min, max

// count, sum, avg, min, max
sensor_readings
| where timestamp > ago(24h)
| summarize
    total_readings = count(),
    avg_temp = avg(temperature),
    max_temp = max(temperature),
    min_temp = min(temperature),
    total_energy = sum(power_consumption)
    by device_id

// Without 'by' — aggregates entire table
sensor_readings
| where timestamp > ago(1h)
| summarize total = count(), avg_temp = avg(temperature)

dcount (Distinct Count)

// Count distinct values (approximate, very fast on billions of rows)
web_events
| where timestamp > ago(24h)
| summarize
    unique_users = dcount(user_id),
    unique_pages = dcount(page_url),
    total_events = count()

// dcount with accuracy parameter (0-4, higher = more accurate but slower)
| summarize unique_users = dcount(user_id, 4)    // Maximum accuracy

percentile and percentiles

// P50 (median), P95, P99 — essential for latency analysis
api_logs
| summarize
    p50_latency = percentile(response_ms, 50),
    p95_latency = percentile(response_ms, 95),
    p99_latency = percentile(response_ms, 99)
    by service_name

// Multiple percentiles at once
| summarize percentiles(response_ms, 50, 90, 95, 99) by service_name

make_list, make_set

// Collect values into arrays
orders
| summarize
    products_ordered = make_list(product_name),       // All values (with duplicates)
    unique_products = make_set(product_name),          // Unique values only
    order_count = count()
    by customer_id

arg_max, arg_min

// Get the row with the max/min value — extremely useful!
// "Latest reading per device" (most common pattern)
sensor_readings
| summarize arg_max(timestamp, *) by device_id
// Returns the entire row where timestamp is maximum, per device_id

// "First order per customer"
orders
| summarize arg_min(order_date, *) by customer_id

Time-Based Aggregation

bin() for Time Buckets

// 5-minute buckets
sensor_readings
| where timestamp > ago(24h)
| summarize avg_temp = avg(temperature)
    by bin(timestamp, 5m)                  // 5-minute tumbling windows
| order by timestamp asc

// 1-hour buckets
sensor_readings
| summarize avg_temp = avg(temperature), count = count()
    by bin(timestamp, 1h), device_id

// Daily buckets
sensor_readings
| summarize daily_avg = avg(temperature)
    by bin(timestamp, 1d)

// Common bin sizes: 1m, 5m, 15m, 30m, 1h, 6h, 12h, 1d, 7d

Tumbling Windows with bin()

The bin() function creates non-overlapping tumbling windows — each event belongs to exactly one time bucket:

// 5-minute tumbling windows (non-overlapping buckets)
sensor_readings
| summarize avg_temp = avg(temperature), event_count = count()
    by bin(timestamp, 5m), device_id
// Each 5-minute bucket is independent — no overlap
// Event at 10:07 → belongs to 10:05-10:10 bucket only

// Tumbling windows are the most common pattern:
// Dashboard tile: "Average temperature per 5 minutes"
// Alert rule: "Count errors per 1-minute window"
// Report: "Daily revenue summary"

Moving Averages and Trends

// Moving average using prev()
sensor_readings
| order by device_id, timestamp asc
| extend moving_avg_3 = (temperature + prev(temperature, 1) + prev(temperature, 2)) / 3.0

// Using make-series for proper time series trend analysis
sensor_readings
| make-series avg_temp = avg(temperature) on timestamp step 1h by device_id
| extend (slope, intercept, r2, variance) = series_fit_line(avg_temp)
| project device_id, slope, r2
| where abs(slope) > 0.5     // Only devices with significant rising/falling trends

// Seasonal decomposition (separate trend from seasonality)
sensor_readings
| make-series avg_temp = avg(temperature) on timestamp step 1h
| extend (baseline, seasonal, trend, residual) = series_decompose(avg_temp)

ago() for Relative Time

// Relative time from now
| where timestamp > ago(1h)        // Last 1 hour
| where timestamp > ago(30m)       // Last 30 minutes
| where timestamp > ago(7d)        // Last 7 days
| where timestamp > ago(90d)       // Last 90 days

// Combine with datetime
| where timestamp > ago(24h) and timestamp < ago(1h)   // 24h ago to 1h ago
| where timestamp > startofday(now())                    // Since midnight today
| where timestamp > startofmonth(now())                  // Since start of month
| where timestamp > startofweek(now())                   // Since start of week

Joins in KQL

inner join, leftouter, rightouter, fullouter

// Inner join
sensor_readings
| join kind=inner (device_metadata) on device_id

// Left outer join (keep all left rows)
sensor_readings
| join kind=leftouter (device_metadata) on device_id

// Right outer join
sensor_readings
| join kind=rightouter (device_metadata) on device_id

// Full outer join
sensor_readings
| join kind=fullouter (device_metadata) on device_id

// Anti join (rows in left NOT in right)
sensor_readings
| join kind=leftanti (known_devices) on device_id
// Returns readings from UNKNOWN devices

// Semi join (rows in left that HAVE a match in right, but no right columns)
sensor_readings
| join kind=leftsemi (active_devices) on device_id
// Returns readings from active devices only (no device_metadata columns added)

// Join on different column names
orders
| join kind=inner (customers | project cust_id, cust_name) on $left.customer_id == $right.cust_id

lookup (Optimized Dimension Join)

// lookup is an optimized join for small dimension tables
sensor_readings
| lookup kind=leftouter device_metadata on device_id
// Same as left outer join but faster for small right-side tables
// Use lookup for dimension tables (thousands of rows)
// Use join for large table-to-table joins

union (Combine Tables)

// Combine rows from multiple tables (like SQL UNION ALL)
sensor_readings_zone_a
| union sensor_readings_zone_b
| union sensor_readings_zone_c

// Union with specific columns
sensor_readings_zone_a
| union sensor_readings_zone_b
| project device_id, temperature, timestamp, zone = "combined"

String Functions

// Case conversion
| extend lower_name = tolower(name)
| extend upper_name = toupper(name)

// Trimming
| extend trimmed = trim(" ", raw_name)
| extend trim_start = trim_start(" ", name)
| extend trim_end = trim_end(" ", name)

// Substring
| extend first_5 = substring(name, 0, 5)           // Characters 0-4
| extend from_pos_3 = substring(name, 3)             // From position 3 to end

// Length
| extend name_length = strlen(name)

// Replace
| extend cleaned = replace_string(phone, "-", "")
| extend fixed = replace_regex(text, @"\d+", "NUM")  // Regex replace

// Split and extract
| extend parts = split(csv_field, ",")               // Split into array
| extend first_part = split(csv_field, ",")[0]        // First element
| extend domain = split(email, "@")[1]                // "gmail.com"

// Extract with regex
| extend ip = extract(@"(\d+\.\d+\.\d+\.\d+)", 1, log_message)
| extend error_code = extract(@"ERROR-(\d+)", 1, message)

// Concatenation
| extend full_name = strcat(first_name, " ", last_name)
| extend url = strcat("https://api.example.com/device/", device_id)

// Contains check (returns bool)
| extend has_error = log_message contains "ERROR"
| extend is_critical = log_message has_any ("CRITICAL", "FATAL", "PANIC")

// URL and base64
| extend decoded = base64_decode_tostring(encoded_field)
| extend encoded = base64_encode_tostring(raw_field)
| extend parsed_url = parse_url("https://example.com/path?key=value")

Date and Time Functions

// Current time
| extend now = now()
| extend utc_now = now()                              // Always UTC in KQL

// Constructing dates
| extend specific_date = datetime(2026-06-05)
| extend specific_datetime = datetime(2026-06-05 14:30:00)

// Extracting parts
| extend year = datetime_part("year", timestamp)
| extend month = datetime_part("month", timestamp)
| extend day = datetime_part("day", timestamp)
| extend hour = datetime_part("hour", timestamp)
| extend minute = datetime_part("minute", timestamp)
| extend day_of_week = dayofweek(timestamp)           // Returns timespan
| extend day_of_year = dayofyear(timestamp)
| extend week_of_year = week_of_year(timestamp)

// Start/end of periods
| extend start_of_day = startofday(timestamp)
| extend end_of_day = endofday(timestamp)
| extend start_of_month = startofmonth(timestamp)
| extend end_of_month = endofmonth(timestamp)
| extend start_of_year = startofyear(timestamp)
| extend start_of_week = startofweek(timestamp)

// Date arithmetic
| extend tomorrow = timestamp + 1d
| extend last_week = timestamp - 7d
| extend plus_3_hours = timestamp + 3h

// Date difference
| extend age_days = datetime_diff("day", now(), timestamp)
| extend age_hours = datetime_diff("hour", now(), timestamp)
| extend age_minutes = datetime_diff("minute", now(), timestamp)

// Formatting
| extend formatted = format_datetime(timestamp, "yyyy-MM-dd HH:mm:ss")
| extend date_only = format_datetime(timestamp, "yyyy-MM-dd")
| extend time_only = format_datetime(timestamp, "HH:mm")

// Relative time checks
| where timestamp > ago(1h)
| where timestamp between (ago(7d) .. now())
| where timestamp between (datetime(2026-01-01) .. datetime(2026-06-30))

Numeric and Math Functions

| extend rounded = round(temperature, 2)             // Round to 2 decimals
| extend ceiling = ceiling(temperature)               // Round up
| extend floor = floor(temperature)                   // Round down
| extend absolute = abs(change_value)                 // Absolute value
| extend power = pow(2, 10)                           // 2^10 = 1024
| extend square_root = sqrt(144)                      // 12
| extend log_val = log(100)                           // Natural log
| extend log10_val = log10(100)                       // Log base 10 = 2
| extend sign_val = sign(-42)                         // -1 (negative)
| extend min_val = min_of(a, b, c)                    // Minimum of values
| extend max_val = max_of(a, b, c)                    // Maximum of values
| extend clamped = max_of(0, min_of(100, temperature)) // Clamp 0-100

// Random
| extend random_val = rand()                           // 0.0 to 1.0
| extend random_int = toint(rand() * 100)              // 0 to 99

Dynamic (JSON) Functions

// Parse JSON string into dynamic object
| extend parsed = parse_json(json_payload)
| extend device = tostring(parsed.device_id)
| extend temp = todouble(parsed.readings.temperature)
| extend tags = parsed.tags                            // Array

// Access nested fields
| extend city = tostring(parsed.address.city)
| extend first_tag = tostring(parsed.tags[0])

// Expand array to rows (like LATERAL FLATTEN)
| mv-expand tag = parsed.tags
| extend tag_str = tostring(tag)

// Create dynamic objects
| extend new_obj = pack("device", device_id, "temp", temperature)
| extend new_arr = pack_array(temperature, humidity, pressure)

// Check dynamic type
| where isnotnull(parsed.optional_field)
| extend field_type = gettype(parsed.some_field)

// Array operations
| extend arr_length = array_length(parsed.items)
| extend contains_x = array_index_of(parsed.tags, "critical") >= 0
| extend sorted = array_sort_asc(parsed.values)
| extend sliced = array_slice(parsed.items, 0, 3)     // First 3 elements

Let Statements (Variables and Subqueries)

// Define variables
let threshold = 100;
let lookback = 24h;
let target_device = "sensor-042";
sensor_readings
| where timestamp > ago(lookback)
| where device_id == target_device
| where temperature > threshold

// Define a subquery
let high_temp_devices =
    sensor_readings
    | where timestamp > ago(1h)
    | where temperature > 100
    | distinct device_id;
sensor_readings
| where device_id in (high_temp_devices)
| summarize avg_temp = avg(temperature) by device_id

// Define a function
let CelsiusToFahrenheit = (c: real) { c * 9.0 / 5.0 + 32.0 };
sensor_readings
| extend temp_f = CelsiusToFahrenheit(temperature)

// Multiple lets
let start_time = ago(7d);
let end_time = now();
let min_readings = 100;
sensor_readings
| where timestamp between (start_time .. end_time)
| summarize reading_count = count(), avg_temp = avg(temperature) by device_id
| where reading_count >= min_readings

Render (Inline Visualization)

// Line chart (time series)
sensor_readings
| where timestamp > ago(24h)
| summarize avg_temp = avg(temperature) by bin(timestamp, 15m)
| render timechart

// Bar chart
sensor_readings
| summarize avg_temp = avg(temperature) by device_id
| top 10 by avg_temp
| render barchart

// Pie chart
web_events
| summarize count() by browser
| render piechart

// Column chart
sales
| summarize revenue = sum(amount) by month = format_datetime(order_date, "yyyy-MM")
| render columnchart

// Scatter plot
sensor_readings
| project temperature, humidity
| render scatterchart

// Area chart
sensor_readings
| summarize avg_temp = avg(temperature) by bin(timestamp, 1h)
| render areachart

// Stacked area chart (by category)
sensor_readings
| summarize avg_temp = avg(temperature) by bin(timestamp, 1h), zone
| render areachart

Advanced: make-series and Time Series

// Create a regular time series (fills gaps with defaults)
sensor_readings
| make-series avg_temp = avg(temperature) default = 0
    on timestamp step 1h
    by device_id
// Result: each device gets a regular array of hourly averages
// Missing hours are filled with 0 (the default)

// Time series with trend line
sensor_readings
| make-series avg_temp = avg(temperature)
    on timestamp step 1h
    by device_id
| extend (slope, intercept, r2, variance) = series_fit_line(avg_temp)
| project device_id, slope, r2
| where abs(slope) > 0.5     // Devices with significant trends

// Seasonal decomposition
sensor_readings
| make-series avg_temp = avg(temperature)
    on timestamp step 1h
| extend (baseline, seasonal, trend, residual) = series_decompose(avg_temp)

// Forecast (predict future values)
sensor_readings
| make-series avg_temp = avg(temperature)
    on timestamp step 1h
| extend forecast = series_decompose_forecast(avg_temp, 24)  // Predict next 24 hours

Advanced: Anomaly Detection

// Automatic anomaly detection on time series
sensor_readings
| where timestamp > ago(7d)
| make-series temp = avg(temperature) on timestamp step 1h by device_id
| extend anomalies = series_decompose_anomalies(temp)
| mv-expand timestamp to typeof(datetime), temp to typeof(real), anomalies to typeof(int)
| where anomalies != 0
| project timestamp, device_id, temp, anomaly_type = iff(anomalies > 0, "spike", "dip")

// Anomaly detection returns:
// +1 = positive anomaly (unexpected spike)
// -1 = negative anomaly (unexpected dip)
//  0 = normal

// Dynamic threshold anomaly detection
sensor_readings
| make-series temp = avg(temperature) on timestamp step 1h
| extend (anomalies, score, baseline) = series_decompose_anomalies(temp, 1.5)
// 1.5 = sensitivity (lower = more anomalies detected, higher = fewer)

Advanced: Materialized Views

// Pre-computed aggregation that auto-refreshes
.create materialized-view HourlyDeviceStats on table sensor_readings
{
    sensor_readings
    | summarize
        avg_temp = avg(temperature),
        max_temp = max(temperature),
        min_temp = min(temperature),
        reading_count = count()
        by bin(timestamp, 1h), device_id
}

// Query the materialized view (instant — no re-aggregation)
HourlyDeviceStats
| where timestamp > ago(7d)
| order by timestamp desc

// Manage materialized views
.show materialized-views                              // List all
.show materialized-view HourlyDeviceStats extents     // Health check
.alter materialized-view HourlyDeviceStats disable     // Pause
.alter materialized-view HourlyDeviceStats enable      // Resume
.drop materialized-view HourlyDeviceStats              // Delete

Advanced: Stored Functions

// Create a reusable function stored in the database
.create function DeviceAlerts(threshold: real, lookback: timespan) {
    sensor_readings
    | where timestamp > ago(lookback)
    | where temperature > threshold
    | summarize alert_count = count(), max_temp = max(temperature)
        by device_id
    | where alert_count > 5
    | order by max_temp desc
}

// Call the function
DeviceAlerts(100, 24h)

// Function with default parameters
.create function ActiveDevices(lookback: timespan = 1h) {
    sensor_readings
    | where timestamp > ago(lookback)
    | distinct device_id
    | summarize active_count = count()
}

ActiveDevices()       // Uses default 1h
ActiveDevices(7d)     // Override to 7 days

Real-World Query Patterns

Pattern 1: Last Known Value per Device

// Most common IoT pattern: "What is each device's current state?"
sensor_readings
| summarize arg_max(timestamp, *) by device_id
| project device_id, temperature, humidity, timestamp
| order by device_id

Pattern 2: Session Detection

// Group user events into sessions (gap > 30 minutes = new session)
web_events
| order by user_id, timestamp asc
| extend prev_time = prev(timestamp), prev_user = prev(user_id)
| extend gap_minutes = datetime_diff("minute", timestamp, prev_time)
| extend new_session = iff(user_id != prev_user or gap_minutes > 30 or isnull(gap_minutes), 1, 0)
| extend session_id = row_cumsum(new_session)
| summarize
    session_start = min(timestamp),
    session_end = max(timestamp),
    page_views = count(),
    duration_minutes = datetime_diff("minute", max(timestamp), min(timestamp))
    by user_id, session_id

Pattern 3: Error Rate Over Time

// Error rate as percentage, per 5-minute window
api_logs
| where timestamp > ago(24h)
| summarize
    total = count(),
    errors = countif(status_code >= 500),
    error_rate = round(100.0 * countif(status_code >= 500) / count(), 2)
    by bin(timestamp, 5m)
| render timechart

Pattern 4: Top N per Category

// Top 3 products per category by revenue
sales
| summarize revenue = sum(amount) by category, product_name
| partition by category (top 3 by revenue)
| order by category, revenue desc

Pattern 5: Funnel Analysis

// E-commerce conversion funnel
let viewed = web_events | where event = "product_view" | distinct user_id;
let carted = web_events | where event = "add_to_cart" | distinct user_id;
let checked = web_events | where event = "checkout" | distinct user_id;
let purchased = web_events | where event = "purchase" | distinct user_id;
print
    step1_views = toscalar(viewed | count),
    step2_carts = toscalar(carted | count),
    step3_checkout = toscalar(checked | count),
    step4_purchase = toscalar(purchased | count)

Pattern 6: Spike Detection

// Detect when a metric is 3x the recent average
sensor_readings
| where timestamp > ago(24h)
| summarize hourly_avg = avg(temperature) by bin(timestamp, 1h), device_id
| extend rolling_avg = avg_of(prev(hourly_avg, 1), prev(hourly_avg, 2), prev(hourly_avg, 3))
| where hourly_avg > rolling_avg * 3
| project timestamp, device_id, hourly_avg, rolling_avg, spike_ratio = hourly_avg / rolling_avg

Pattern 7: Compare This Week vs Last Week

// Side-by-side comparison
let this_week = sensor_readings | where timestamp > ago(7d)
    | summarize this_avg = avg(temperature) by device_id;
let last_week = sensor_readings | where timestamp between (ago(14d) .. ago(7d))
    | summarize last_avg = avg(temperature) by device_id;
this_week
| join kind=inner last_week on device_id
| extend change_pct = round(100.0 * (this_avg - last_avg) / last_avg, 1)
| order by change_pct desc

Pattern 8: Distributed Tracing (Follow a Request)

// Follow a single request across all microservices
app_logs
| where trace_id == "abc-123-def-456"
| order by timestamp asc
| project timestamp, service_name, operation, duration_ms, status
// Shows the full journey of one request through 20+ services

KQL in Fabric: Where to Write and Run

Location	How to Access
KQL Queryset	Workspace → + New item → KQL Queryset → write and save KQL queries
KQL Database	Open Eventhouse → KQL Database → query editor
Real-Time Dashboard	Dashboard tile → write KQL as the tile query
Fabric Notebook	Use `%%kql` magic command in a notebook cell
Spark Notebook	`spark.sql("SELECT * FROM kql_db.table")` via cross-database query

KQL vs SQL Complete Comparison Table

SQL	KQL
`SELECT *`	`table_name`
`SELECT col1, col2`	`table \\| project col1, col2`
`SELECT col1 AS alias`	`table \\| project alias = col1`
`SELECT , col12 AS new_col`	`table \\| extend new_col = col1 * 2`
`WHERE col > 100`	`\\| where col > 100`
`WHERE col LIKE '%text%'`	`\\| where col contains "text"`
`WHERE col IN ('a','b')`	`\\| where col in ("a", "b")`
`WHERE col IS NOT NULL`	`\\| where isnotnull(col)`
`ORDER BY col DESC`	`\\| order by col desc`
`TOP 10` / `LIMIT 10`	`\\| top 10 by col` / `\\| take 10`
`GROUP BY col`	`\\| summarize ... by col`
`COUNT(*)`	`count()`
`COUNT(DISTINCT col)`	`dcount(col)`
`AVG(col)`	`avg(col)`
`HAVING COUNT(*) > 5`	`\\| summarize c=count() by col \\| where c > 5`
`CASE WHEN...THEN...END`	`case(cond1, val1, cond2, val2, default)`
`CAST(col AS INT)`	`toint(col)`
`DATEPART(hour, dt)`	`datetime_part("hour", dt)`
`DATEDIFF(day, dt1, dt2)`	`datetime_diff("day", dt2, dt1)`
`GETDATE()` / `NOW()`	`now()`
`JOIN ON`	`\\| join kind=inner (table2) on key`
`UNION ALL`	`table1 \\| union table2`
(no equivalent)	`\\| render timechart`
(no equivalent)	`\\| make-series ... on timestamp step 1h`
(no equivalent)	`series_decompose_anomalies()`

Common Mistakes

Using contains instead of has — has uses the term index and is much faster. Use contains only when you need substring matching (not word boundaries).
Not using ago() for relative time filters — hardcoding dates (datetime(2026-06-01)) breaks over time. Use ago(7d) for relative queries.
Forgetting bin() for time aggregation — summarize by timestamp creates one row per unique timestamp (millions!). Use bin(timestamp, 5m) to bucket.
Not using arg_max for “latest per group” — writing complex subqueries when summarize arg_max(timestamp, *) by device_id does it in one line.
Writing SQL in KQL — SELECT * FROM table WHERE col > 5 is SQL, not KQL. The equivalent is table | where col > 5. KQL uses pipes, not clauses.
Not using let for readability — long KQL queries become unreadable without let statements to define variables and subqueries.
Scanning full tables without time filters — always add | where timestamp > ago(...) early in the query. Without it, you scan billions of rows.

Interview Questions

Q: What is KQL and how does it differ from SQL? A: KQL (Kusto Query Language) is a pipe-based query language optimized for time-series and log data. SQL uses clause-based syntax (SELECT, FROM, WHERE, GROUP BY). KQL chains operators with pipes (|) where data flows left-to-right. KQL has built-in time series functions (make-series), anomaly detection (series_decompose_anomalies), and inline visualization (render) that SQL lacks.

Q: What is the difference between has and contains in KQL? A: has performs word-level matching using the inverted term index — fast but matches whole words only (“sensor” matches “sensor-042” but not “biosensor”). contains performs substring matching — slower but finds the term anywhere (“sensor” matches both). Always prefer has for performance unless substring matching is required.

Q: How do you get the latest reading per device in KQL? A: Use summarize arg_max(timestamp, *) by device_id. This returns the entire row where timestamp is maximum for each device_id. It is the most common IoT pattern in KQL — one line instead of complex subqueries.

Q: What are materialized views in KQL? A: Pre-computed aggregations stored in the database that auto-refresh when source data changes. Created with .create materialized-view. They eliminate the need to re-aggregate billions of rows for frequently run queries. Essential for dashboards that need sub-second response on large datasets.

Wrapping Up

KQL is the language of real-time analytics in Fabric. The pipe-based syntax makes it intuitive for exploratory analysis — build queries one step at a time, check the output, add the next step. Combined with built-in time series analysis, anomaly detection, and inline visualization, KQL handles use cases that would require multiple tools in the SQL world.

Bookmark this post as your KQL reference — the function tables and query patterns cover 95% of what you will need in production.

Related posts: – Real-Time Intelligence – RTI Deep Dive – SQL Functions Reference – M Language Complete Guide

← Previous: RTI Deep Dive Fabric (28/38) Next: KQL Window Functions →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Table of Contents