Materialized Lake Views in Fabric: What, When, Why, Bronze-Silver-Gold with MLVs, Automatic Refresh, Data Quality Checks, and Limitations

Materialized Lake Views (MLVs) are one of Fabric’s newest and most powerful features. They are pre-computed query results stored as Delta tables that automatically refresh when source data changes. Think of them as views that CACHE their results — the query runs once, the result is stored, and subsequent reads use the cached result instead of recomputing.

Traditional views re-execute the query every time. Materialized Lake Views execute once and maintain the result. For complex transformations that many consumers query frequently, MLVs can reduce compute by 90%.

Think of a regular view like asking a chef to cook a fresh meal every time someone orders. An MLV is like preparing a buffet — cook once, serve many. The buffet auto-restocks when ingredients (source data) change.

What Are Materialized Lake Views?
MLVs vs Regular Views vs Tables
When and Why to Use MLVs
Creating Materialized Lake Views
Basic MLV Creation
MLV with Aggregations
MLV with Joins
Automatic Refresh
How Refresh Works
Change Data Feed (CDF) for MLVs
Enabling CDF on Source Tables
Building Bronze, Silver, Gold with MLVs
Bronze to Silver MLV
Silver to Gold MLV
Data Quality Checks with MLVs
Optimal Refresh Strategies
Scheduling MLVs
Data Lineage with MLVs
Debugging MLVs
Data Quality Reports
MLV Limitations
Common Mistakes
Interview Questions
Wrapping Up

What Are Materialized Lake Views?

Regular View:
  Query → Reads source tables → Computes result → Returns result
  (computes EVERY time — expensive for complex queries)

Materialized Lake View:
  First query → Reads source tables → Computes result → STORES result as Delta table
  Second query → Reads stored result → Returns instantly
  Source changes → Auto-refreshes the stored result

MLVs vs Regular Views vs Tables

Feature	Regular View	Materialized Lake View	Delta Table
Stores data	No (virtual)	Yes (cached Delta)	Yes (full table)
Refresh	On every query	Automatic (on source change)	Manual (notebook/pipeline)
Performance	Slow (recomputes)	Fast (reads cached result)	Fast (reads data)
Maintenance	Zero	Low (auto-refresh)	High (build pipelines)
Best for	Simple, infrequent queries	Complex, frequently queried transforms	Full ETL control

When and Why to Use MLVs

Use MLVs when: – A complex query is run frequently by multiple consumers (analysts, Power BI) – The transformation is SQL-expressible (joins, aggregations, filters) – You want auto-refresh without building pipelines – Bronze-to-Silver or Silver-to-Gold transformations are straightforward

Do NOT use MLVs when: – You need PySpark/Python transformations (MLVs are SQL only) – You need SCD Type 2 logic (MLVs do not support MERGE) – The transformation is trivial (simple SELECT — regular view is fine) – You need full control over refresh timing and error handling

Creating Materialized Lake Views

Basic MLV Creation

%%sql
-- Create an MLV that cleans customer data (Bronze → Silver)
CREATE MATERIALIZED VIEW silver.customers_clean
AS
SELECT
    customer_id,
    TRIM(UPPER(first_name)) AS first_name,
    TRIM(UPPER(last_name)) AS last_name,
    LOWER(TRIM(email)) AS email,
    city,
    country,
    CURRENT_TIMESTAMP() AS processed_at
FROM bronze.raw_customers
WHERE customer_id IS NOT NULL;

MLV with Aggregations

-- Create an MLV for daily revenue summary (Silver → Gold)
CREATE MATERIALIZED VIEW gold.daily_revenue
AS
SELECT
    order_date,
    product_category,
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM silver.orders_clean o
JOIN silver.products_clean p ON o.product_id = p.product_id
GROUP BY order_date, product_category;

MLV with Joins

-- Customer 360 MLV joining multiple Silver tables
CREATE MATERIALIZED VIEW gold.customer_360
AS
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.city,
    COUNT(o.order_id) AS total_orders,
    SUM(o.amount) AS lifetime_revenue,
    MAX(o.order_date) AS last_order_date,
    DATEDIFF(DAY, MAX(o.order_date), CURRENT_DATE()) AS days_since_last_order
FROM silver.customers_clean c
LEFT JOIN silver.orders_clean o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.city;

Automatic Refresh

How Refresh Works

When source Delta tables change (INSERT, UPDATE, DELETE), the MLV detects the change and refreshes automatically:

1. Source table changes (new rows inserted by pipeline)
2. MLV detects change via Delta transaction log
3. MLV incrementally recomputes ONLY affected partitions
4. Updated result is available for queries
5. Consumers (Power BI, notebooks) see fresh data automatically

Change Data Feed (CDF) for MLVs

Enable CDF on source tables for more efficient incremental refresh:

-- Enable CDF on source table
ALTER TABLE bronze.raw_customers SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
ALTER TABLE bronze.raw_orders SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

-- MLV uses CDF to detect exactly which rows changed → faster refresh

Without CDF: MLV must compare full snapshots to detect changes (slower). With CDF: MLV reads only the change feed (INSERT, UPDATE, DELETE records) — much faster.

Enabling CDF on Source Tables

Enable CDF on ALL tables that serve as sources for your MLVs. CDF must be enabled BEFORE creating the MLV — it does not capture changes retroactively:

-- Step 1: Enable CDF on every source table FIRST
ALTER TABLE bronze.raw_customers SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
ALTER TABLE bronze.raw_orders SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
ALTER TABLE silver.products_clean SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

-- Step 2: THEN create your MLVs
CREATE MATERIALIZED VIEW gold.daily_revenue AS
SELECT ... FROM silver.orders_clean o JOIN silver.products_clean p ...;

-- If you forgot to enable CDF before creating the MLV:
-- 1. Enable CDF on source: ALTER TABLE ... SET TBLPROPERTIES (...)
-- 2. Drop the MLV: DROP MATERIALIZED VIEW gold.daily_revenue
-- 3. Recreate the MLV: CREATE MATERIALIZED VIEW gold.daily_revenue AS ...
-- The recreated MLV will use CDF for all future refreshes

-- Verify CDF is enabled:
SHOW TBLPROPERTIES bronze.raw_customers ('delta.enableChangeDataFeed');
-- Should return: true

Building Bronze, Silver, Gold with MLVs

Bronze to Silver MLV

-- Instead of a notebook for simple cleaning, use an MLV:
CREATE MATERIALIZED VIEW silver.orders_clean
AS
SELECT
    order_id,
    customer_id,
    product_id,
    CAST(amount AS DECIMAL(10,2)) AS amount,
    CAST(order_date AS DATE) AS order_date,
    TRIM(status) AS status
FROM bronze.raw_orders
WHERE order_id IS NOT NULL
  AND amount > 0
  AND status != 'TEST';
-- Auto-refreshes when bronze.raw_orders gets new data from pipeline

Silver to Gold MLV

-- Instead of a Gold notebook for simple aggregation, use an MLV:
CREATE MATERIALIZED VIEW gold.product_performance
AS
SELECT
    p.product_name,
    p.category,
    COUNT(*) AS total_orders,
    SUM(o.amount) AS total_revenue,
    AVG(o.amount) AS avg_order_value,
    COUNT(DISTINCT o.customer_id) AS unique_buyers
FROM silver.orders_clean o
JOIN silver.products_clean p ON o.product_id = p.product_id
GROUP BY p.product_name, p.category;

Data Quality Checks with MLVs

-- MLV that reports data quality metrics (auto-updates!)
CREATE MATERIALIZED VIEW quality.data_quality_report
AS
SELECT
    'customers' AS table_name,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS null_email_pct,
    COUNT(DISTINCT customer_id) AS distinct_ids,
    COUNT(*) - COUNT(DISTINCT customer_id) AS duplicate_ids
FROM silver.customers_clean
UNION ALL
SELECT
    'orders' AS table_name,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) AS null_amounts,
    SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS null_amount_pct,
    COUNT(DISTINCT order_id) AS distinct_ids,
    COUNT(*) - COUNT(DISTINCT order_id) AS duplicate_ids
FROM silver.orders_clean;

Power BI report on this MLV = auto-refreshing data quality dashboard.

Optimal Refresh Strategies

MLV refresh frequency depends on the use case and cost tolerance:

Scenario	Strategy	How
Near real-time dashboards	Automatic refresh	Let MLV auto-refresh when source changes (default behavior)
Daily batch pipelines	Manual refresh after pipeline	`REFRESH MATERIALIZED VIEW gold.daily_revenue;` as last step in pipeline
Cost-sensitive large MLVs	Scheduled refresh	Refresh during off-peak hours to avoid competing with other workloads for CUs
Multiple dependent MLVs	Cascading refresh	Refresh Bronze→Silver MLV first, then Silver→Gold MLV (order matters)

Cost tip: Every refresh consumes CUs. For large Gold MLVs that aggregate billions of rows, refreshing every 5 minutes is expensive. If the dashboard only needs hourly freshness, schedule refresh hourly instead of relying on automatic refresh for every small source change.

Scheduling MLVs

MLVs refresh automatically, but you can control WHEN:

-- Set refresh schedule (if supported in your Fabric version)
ALTER MATERIALIZED VIEW gold.daily_revenue
SET TBLPROPERTIES ('refresh.schedule' = 'EVERY 1 HOUR');

Or trigger refresh manually:

-- Force immediate refresh
REFRESH MATERIALIZED VIEW gold.daily_revenue;

Data Lineage with MLVs

MLVs appear in Microsoft Purview data lineage like any other Fabric item. The lineage view shows the complete data flow:

Lineage view:
  Pipeline (ingest) → bronze.raw_customers → MLV: silver.customers_clean → MLV: gold.customer_360 → Semantic Model → Power BI Report

This makes it clear:
  ✅ Which source tables feed which MLVs
  ✅ Which MLVs feed which reports
  ✅ If a source table schema changes, which MLVs and reports are affected
  ✅ If an MLV fails to refresh, which downstream consumers see stale data

Access lineage from any item: right-click → View lineage (or open the Lineage view from the workspace). MLVs show as distinct items in the lineage graph with arrows showing source → MLV → consumer relationships.

Debugging MLVs

-- Check MLV status
DESCRIBE EXTENDED gold.daily_revenue;

-- Check last refresh time
SHOW TBLPROPERTIES gold.daily_revenue;

-- If MLV shows stale data:
-- 1. Check source table has new data
-- 2. Check CDF is enabled on source
-- 3. Force refresh: REFRESH MATERIALIZED VIEW
-- 4. Check for schema mismatches between MLV and source

Data Quality Reports

The data quality MLV from the section above becomes a powerful self-refreshing monitoring tool when connected to Power BI:

Setup:
  1. Create quality.data_quality_report MLV (SQL above)
  2. Build Power BI report on this MLV (Direct Lake)
  3. Add visuals: null % gauge, duplicate count card, row count trend
  4. Pin to a Teams channel or shared dashboard

Result:
  Pipeline loads new data → Source tables change → Quality MLV auto-refreshes →
  Power BI dashboard updates → Team sees quality metrics in real-time

  No scheduled quality notebooks. No separate quality pipeline.
  The MLV IS the quality pipeline — it auto-runs whenever data changes.

Advanced pattern: Combine the quality MLV with Data Activator to trigger alerts when quality drops below a threshold — for example, send a Teams notification when null_email_pct exceeds 5% or when duplicate_ids is greater than 0.

MLV Limitations

Limitation	Detail
SQL only	Cannot use PySpark, Python UDFs, or complex logic
No MERGE/SCD	Cannot do upsert or SCD Type 2 — use notebooks for that
No cross-lakehouse	Source and MLV must be in the same Lakehouse
Schema changes	If source schema changes, MLV may need to be recreated
Complex joins	Very complex multi-table joins may have slow refresh
Preview feature	Some capabilities may change — check latest docs

Common Mistakes

Using MLVs for everything — SCD Type 2, complex PySpark logic, and multi-lakehouse ETL still need notebooks. MLVs are for SQL-expressible transforms.
Not enabling CDF on source tables — without CDF, refresh is slower (full comparison vs change feed).
Creating MLVs on Bronze data — Bronze data is messy. Create MLVs on Silver (cleaned) data for Gold layer aggregations.
Not monitoring refresh — a failed refresh means stale data. Check MLV properties and set up alerts.

Interview Questions

Q: What are Materialized Lake Views and how do they differ from regular views? A: Regular views re-execute the query every time (virtual, no storage). MLVs execute once, store the result as a Delta table, and auto-refresh when source data changes. MLVs are faster for repeated complex queries and require no pipeline maintenance for refresh.

Q: When would you use an MLV vs a notebook for transformation? A: MLV for SQL-expressible transforms (filter, join, aggregate) that are queried frequently and need auto-refresh. Notebook for complex logic (SCD Type 2, PySpark, Python libraries, custom functions, multi-lakehouse writes). MLVs replace simple notebooks; complex notebooks cannot be replaced.

Wrapping Up

MLVs are the low-maintenance alternative to notebooks for straightforward transformations. Auto-refresh eliminates pipeline scheduling. CDF enables efficient incremental processing. Use them for Bronze→Silver cleaning and Silver→Gold aggregation when the logic is SQL-expressible.

← Previous: Delta Lake Table Properties Fabric (23/38) Next: Mirrored Databases →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Materialized Lake Views in Fabric: What, When, Why, Bronze-Silver-Gold with MLVs, Automatic Refresh, Data Quality Checks, and Limitations

Table of Contents

What Are Materialized Lake Views?

MLVs vs Regular Views vs Tables

When and Why to Use MLVs

Creating Materialized Lake Views

Basic MLV Creation

MLV with Aggregations

MLV with Joins

Automatic Refresh

How Refresh Works

Change Data Feed (CDF) for MLVs

Enabling CDF on Source Tables

Building Bronze, Silver, Gold with MLVs

Bronze to Silver MLV

Silver to Gold MLV

Data Quality Checks with MLVs

Optimal Refresh Strategies

Scheduling MLVs

Data Lineage with MLVs

Debugging MLVs

Data Quality Reports

MLV Limitations

Common Mistakes

Interview Questions

Wrapping Up

Leave a Comment Cancel Reply

Table of Contents

What Are Materialized Lake Views?

MLVs vs Regular Views vs Tables

When and Why to Use MLVs

Creating Materialized Lake Views

Basic MLV Creation

MLV with Aggregations

MLV with Joins

Automatic Refresh

How Refresh Works

Change Data Feed (CDF) for MLVs

Enabling CDF on Source Tables

Building Bronze, Silver, Gold with MLVs

Bronze to Silver MLV

Silver to Gold MLV

Data Quality Checks with MLVs

Optimal Refresh Strategies

Scheduling MLVs

Data Lineage with MLVs

Debugging MLVs

Data Quality Reports

MLV Limitations

Common Mistakes

Interview Questions

Wrapping Up

Related Posts

Leave a Comment Cancel Reply