Real-Time Analytics Deep Dive: Window Types, Accelerated Shortcuts, KQL Functions, Materialized Views, and Eventhouse Optimization

Real-Time Analytics Deep Dive: Window Types, Accelerated Shortcuts, KQL Functions, Materialized Views, and Eventhouse Optimization

Our Real-Time Intelligence post covered the architecture — Eventstream, Eventhouse, KQL basics, and dashboards. This post goes deeper: the five window types for time-based analysis, accelerated vs standard shortcuts in KQL databases, advanced KQL functions for dates, strings, and aggregations, materialized views for pre-computed aggregations, and Eventhouse optimization.

Table of Contents

  • The Five Window Types
  • Tumbling Windows
  • Hopping (Sliding) Windows
  • Sliding Windows
  • Session Windows
  • Snapshot Windows
  • Accelerated vs Standard Shortcuts in KQL DB
  • Creating Accelerated Shortcuts
  • Advanced KQL Functions
  • Date and Time Functions
  • String Functions
  • Aggregation Functions
  • Dynamic and JSON Parsing
  • Materialized Views in KQL
  • Creating and Managing Materialized Views
  • Eventhouse Optimization
  • Retention Policies
  • Caching Policies
  • Partitioning
  • Error Resolution in RTI
  • Eventstream Errors
  • Eventhouse Errors
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

The Five Window Types

Tumbling Windows

Non-overlapping, fixed-size windows. Each event belongs to exactly ONE window.

// Tumbling window: 5-minute buckets
sensor_readings
| summarize avg_temp = avg(temperature), count = count()
    by bin(timestamp, 5m), device_id
|--5min--|--5min--|--5min--|
  No overlap. Each event in one window.

Hopping (Sliding) Windows

Fixed-size windows that OVERLAP by sliding at regular intervals.

// 10-minute window, hopping every 5 minutes
sensor_readings
| summarize avg_temp = avg(temperature)
    by bin(timestamp, 5m), device_id
| extend window_end = timestamp + 10m
// An event at 10:07 appears in both the 10:00-10:10 and 10:05-10:15 windows

Sliding Windows

Windows triggered by events — window starts when an event arrives, has a fixed duration.

// Sliding: events within 5 minutes of each other
sensor_readings
| summarize count() by device_id, bin(timestamp, 1m)
| where count_ > 10  // More than 10 events per minute = burst

Session Windows

Dynamic windows based on activity gaps — window opens when activity starts, closes after inactivity.

// Session: group events with < 5min gap between them
sensor_readings
| sort by device_id, timestamp asc
| extend gap = datetime_diff('minute', timestamp, prev(timestamp))
| extend new_session = iif(gap > 5 or isnull(gap), 1, 0)
| extend session_id = row_cumsum(new_session)
| summarize session_start = min(timestamp), session_end = max(timestamp),
            event_count = count() by device_id, session_id

Snapshot Windows

Captures the state at a specific point in time — not based on event time but on system time.

// Snapshot: state of all devices right now
sensor_readings
| summarize arg_max(timestamp, *) by device_id
// Returns the LATEST reading per device (current snapshot)

Accelerated vs Standard Shortcuts in KQL DB

Feature Standard Shortcut Accelerated Shortcut
Data location Reads from OneLake at query time Copies data into Eventhouse cache
Query speed Slower (reads from OneLake) Faster (data pre-loaded)
Freshness Always current Near real-time (minutes)
Use case Infrequent queries on large historical data Frequent queries needing fast response
Storage No duplication Duplicates data in Eventhouse

Creating Accelerated Shortcuts

  1. Open your KQL Database in the Eventhouse
  2. Click NewOneLake shortcut
  3. Select the source (Lakehouse table, another KQL DB)
  4. Toggle Acceleration to On
  5. Configure refresh interval (how often data syncs)
// Query accelerated shortcut (same syntax as regular table)
accelerated_sales_data
| where order_date > ago(7d)
| summarize revenue = sum(amount) by product
| top 10 by revenue
// Fast because data is pre-loaded into Eventhouse cache

DP-700 exam tip: Know when to choose accelerated (frequent queries, speed matters) vs standard (infrequent, storage matters).

Advanced KQL Functions

Date and Time Functions

// Current time
print now()

// Extract parts
sensor_readings
| extend year = datetime_part("year", timestamp),
         month = datetime_part("month", timestamp),
         hour = datetime_part("hour", timestamp),
         day_of_week = dayofweek(timestamp)

// Date arithmetic
| extend yesterday = timestamp - 1d,
         next_week = timestamp + 7d,
         age_hours = datetime_diff("hour", now(), timestamp)

// Start/end of periods
| extend start_of_month = startofmonth(timestamp),
         end_of_month = endofmonth(timestamp),
         start_of_day = startofday(timestamp)

// Format
| extend formatted = format_datetime(timestamp, "yyyy-MM-dd HH:mm")

// Between
| where timestamp between (datetime(2026-01-01) .. datetime(2026-06-01))

String Functions

// Common string operations
events
| extend lower_name = tolower(name),
         upper_name = toupper(name),
         trimmed = trim(" ", raw_name),
         contains_error = name contains "error",
         starts_with = name startswith "ERR",
         extracted = extract("device-(\d+)", 1, device_name),  // Regex
         split_parts = split(csv_field, ","),
         replaced = replace_string(phone, "-", ""),
         length = strlen(name),
         substring = substring(name, 0, 5)

Aggregation Functions

// Beyond basic count/sum/avg
sensor_readings
| summarize
    count = count(),
    distinct_devices = dcount(device_id),        // Distinct count
    percentile_95 = percentile(temperature, 95), // 95th percentile
    median_temp = percentile(temperature, 50),
    variance = variance(temperature),
    stdev = stdev(temperature),
    min_temp = min(temperature),
    max_temp = max(temperature),
    make_list = make_list(device_id),             // Array of all values
    make_set = make_set(device_id)                // Unique array
    by bin(timestamp, 1h)

Dynamic and JSON Parsing

// Parse JSON from string column
events
| extend parsed = parse_json(json_payload)
| extend device = tostring(parsed.device_id),
         temp = todouble(parsed.readings.temperature),
         tags = parsed.tags  // Array access
| mv-expand tag = tags       // Expand array to rows

Materialized Views in KQL

Pre-computed aggregations that auto-refresh when source data changes:

// Create materialized view: hourly averages (auto-updated!)
.create materialized-view HourlyAvgTemp on table sensor_readings
{
    sensor_readings
    | summarize avg_temp = avg(temperature), max_temp = max(temperature), count = count()
        by bin(timestamp, 1h), device_id
}

// Query the materialized view (instant — pre-computed!)
HourlyAvgTemp
| where timestamp > ago(7d)
| order by timestamp desc

// Check materialized view health
.show materialized-view HourlyAvgTemp extents

Materialized views vs regular queries: A regular aggregation query scans ALL raw data every time. A materialized view pre-computes and maintains the result incrementally — 100x faster for repeated queries.

Eventhouse Optimization

Retention Policies

// Set data retention: auto-delete after 90 days
.alter table sensor_readings policy retention "{'SoftDeletePeriod': '90.00:00:00'}"

// Hot cache: keep last 30 days in fast SSD cache
.alter table sensor_readings policy caching hot = 30d

Partitioning

// Partition by date for faster time-range queries
.alter table sensor_readings policy partitioning
@'{"PartitionKeys": [{"ColumnName": "timestamp", "Kind": "UniformRange", "Properties": {"Reference": "2020-01-01", "RangeSize": "1.00:00:00"}}]}'

Error Resolution in RTI

Error Cause Fix
Eventstream ingestion lag Source producing faster than destination consumes Scale up Eventhouse, add partitions to Event Hub
KQL query timeout Scanning too much data Add time filters, use materialized views
Eventhouse disk full No retention policy Set retention (90d), purge old data
Shortcut stale data Accelerated shortcut refresh failed Check source connectivity, restart acceleration
Schema mismatch Source schema changed Update table schema, restart ingestion

Common Mistakes

  1. Not setting retention policies — Eventhouse grows forever without them. Set 30-90 day retention.
  2. Using standard shortcuts when accelerated is needed — frequent queries on standard shortcuts are slow. Accelerate them.
  3. Not using materialized views — recomputing hourly aggregations from billions of rows is wasteful. Materialized views pre-compute once.
  4. Wrong window type — tumbling for non-overlapping counts, hopping for moving averages, session for user activity tracking. Match the window to the business question.

Interview Questions

Q: What are the five window types in Fabric Real-Time Analytics? A: Tumbling (non-overlapping fixed-size), Hopping/Sliding (overlapping fixed-size), Sliding (event-triggered fixed-duration), Session (dynamic based on activity gaps), Snapshot (point-in-time state). Each serves different analytical needs — tumbling for counts, hopping for moving averages, session for user activity.

Q: What is the difference between accelerated and standard shortcuts in KQL? A: Standard shortcuts read from OneLake at query time (slower, always current). Accelerated shortcuts copy data into Eventhouse cache (faster, near real-time refresh). Use accelerated for frequently queried data, standard for infrequent historical queries.

Wrapping Up

This post completes your Real-Time Intelligence knowledge. Combined with the RTI overview post, you now cover all DP-700 streaming objectives.

Related posts:Real-Time Intelligence OverviewSpark Structured Streaming


Naveen Vuppula is a Senior Data Engineering Consultant based in Ontario, Canada. He writes about data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link