Materialized Lake Views in Fabric: What, When, Why, Bronze-Silver-Gold with MLVs, Automatic Refresh, Data Quality Checks, and Limitations

Materialized Lake Views in Fabric: What, When, Why, Bronze-Silver-Gold with MLVs, Automatic Refresh, Data Quality Checks, and Limitations

Materialized Lake Views (MLVs) are one of Fabric’s newest and most powerful features. They are pre-computed query results stored as Delta tables that automatically refresh when source data changes. Think of them as views that CACHE their results — the query runs once, the result is stored, and subsequent reads use the cached result instead of recomputing.

Traditional views re-execute the query every time. Materialized Lake Views execute once and maintain the result. For complex transformations that many consumers query frequently, MLVs can reduce compute by 90%.

Think of a regular view like asking a chef to cook a fresh meal every time someone orders. An MLV is like preparing a buffet — cook once, serve many. The buffet auto-restocks when ingredients (source data) change.

Table of Contents

  • What Are Materialized Lake Views?
  • MLVs vs Regular Views vs Tables
  • When and Why to Use MLVs
  • Creating Materialized Lake Views
  • Basic MLV Creation
  • MLV with Aggregations
  • MLV with Joins
  • Automatic Refresh
  • How Refresh Works
  • Change Data Feed (CDF) for MLVs
  • Enabling CDF on Source Tables
  • Building Bronze, Silver, Gold with MLVs
  • Bronze to Silver MLV
  • Silver to Gold MLV
  • Data Quality Checks with MLVs
  • Optimal Refresh Strategies
  • Scheduling MLVs
  • Data Lineage with MLVs
  • Debugging MLVs
  • Data Quality Reports
  • MLV Limitations
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

What Are Materialized Lake Views?

Regular View:
  Query → Reads source tables → Computes result → Returns result
  (computes EVERY time — expensive for complex queries)

Materialized Lake View:
  First query → Reads source tables → Computes result → STORES result as Delta table
  Second query → Reads stored result → Returns instantly
  Source changes → Auto-refreshes the stored result

MLVs vs Regular Views vs Tables

Feature Regular View Materialized Lake View Delta Table
Stores data No (virtual) Yes (cached Delta) Yes (full table)
Refresh On every query Automatic (on source change) Manual (notebook/pipeline)
Performance Slow (recomputes) Fast (reads cached result) Fast (reads data)
Maintenance Zero Low (auto-refresh) High (build pipelines)
Best for Simple, infrequent queries Complex, frequently queried transforms Full ETL control

When and Why to Use MLVs

Use MLVs when: – A complex query is run frequently by multiple consumers (analysts, Power BI) – The transformation is SQL-expressible (joins, aggregations, filters) – You want auto-refresh without building pipelines – Bronze-to-Silver or Silver-to-Gold transformations are straightforward

Do NOT use MLVs when: – You need PySpark/Python transformations (MLVs are SQL only) – You need SCD Type 2 logic (MLVs do not support MERGE) – The transformation is trivial (simple SELECT — regular view is fine) – You need full control over refresh timing and error handling

Creating Materialized Lake Views

Basic MLV Creation

%%sql
-- Create an MLV that cleans customer data (Bronze → Silver)
CREATE MATERIALIZED VIEW silver.customers_clean
AS
SELECT
    customer_id,
    TRIM(UPPER(first_name)) AS first_name,
    TRIM(UPPER(last_name)) AS last_name,
    LOWER(TRIM(email)) AS email,
    city,
    country,
    CURRENT_TIMESTAMP() AS processed_at
FROM bronze.raw_customers
WHERE customer_id IS NOT NULL;

MLV with Aggregations

-- Create an MLV for daily revenue summary (Silver → Gold)
CREATE MATERIALIZED VIEW gold.daily_revenue
AS
SELECT
    order_date,
    product_category,
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM silver.orders_clean o
JOIN silver.products_clean p ON o.product_id = p.product_id
GROUP BY order_date, product_category;

MLV with Joins

-- Customer 360 MLV joining multiple Silver tables
CREATE MATERIALIZED VIEW gold.customer_360
AS
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.city,
    COUNT(o.order_id) AS total_orders,
    SUM(o.amount) AS lifetime_revenue,
    MAX(o.order_date) AS last_order_date,
    DATEDIFF(DAY, MAX(o.order_date), CURRENT_DATE()) AS days_since_last_order
FROM silver.customers_clean c
LEFT JOIN silver.orders_clean o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.city;

Automatic Refresh

How Refresh Works

When source Delta tables change (INSERT, UPDATE, DELETE), the MLV detects the change and refreshes automatically:

1. Source table changes (new rows inserted by pipeline)
2. MLV detects change via Delta transaction log
3. MLV incrementally recomputes ONLY affected partitions
4. Updated result is available for queries
5. Consumers (Power BI, notebooks) see fresh data automatically

Change Data Feed (CDF) for MLVs

Enable CDF on source tables for more efficient incremental refresh:

-- Enable CDF on source table
ALTER TABLE bronze.raw_customers SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
ALTER TABLE bronze.raw_orders SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

-- MLV uses CDF to detect exactly which rows changed → faster refresh

Without CDF: MLV must compare full snapshots to detect changes (slower). With CDF: MLV reads only the change feed (INSERT, UPDATE, DELETE records) — much faster.

Building Bronze, Silver, Gold with MLVs

Bronze to Silver MLV

-- Instead of a notebook for simple cleaning, use an MLV:
CREATE MATERIALIZED VIEW silver.orders_clean
AS
SELECT
    order_id,
    customer_id,
    product_id,
    CAST(amount AS DECIMAL(10,2)) AS amount,
    CAST(order_date AS DATE) AS order_date,
    TRIM(status) AS status
FROM bronze.raw_orders
WHERE order_id IS NOT NULL
  AND amount > 0
  AND status != 'TEST';
-- Auto-refreshes when bronze.raw_orders gets new data from pipeline

Silver to Gold MLV

-- Instead of a Gold notebook for simple aggregation, use an MLV:
CREATE MATERIALIZED VIEW gold.product_performance
AS
SELECT
    p.product_name,
    p.category,
    COUNT(*) AS total_orders,
    SUM(o.amount) AS total_revenue,
    AVG(o.amount) AS avg_order_value,
    COUNT(DISTINCT o.customer_id) AS unique_buyers
FROM silver.orders_clean o
JOIN silver.products_clean p ON o.product_id = p.product_id
GROUP BY p.product_name, p.category;

Data Quality Checks with MLVs

-- MLV that reports data quality metrics (auto-updates!)
CREATE MATERIALIZED VIEW quality.data_quality_report
AS
SELECT
    'customers' AS table_name,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS null_email_pct,
    COUNT(DISTINCT customer_id) AS distinct_ids,
    COUNT(*) - COUNT(DISTINCT customer_id) AS duplicate_ids
FROM silver.customers_clean
UNION ALL
SELECT
    'orders' AS table_name,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) AS null_amounts,
    SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS null_amount_pct,
    COUNT(DISTINCT order_id) AS distinct_ids,
    COUNT(*) - COUNT(DISTINCT order_id) AS duplicate_ids
FROM silver.orders_clean;

Power BI report on this MLV = auto-refreshing data quality dashboard.

Scheduling MLVs

MLVs refresh automatically, but you can control WHEN:

-- Set refresh schedule (if supported in your Fabric version)
ALTER MATERIALIZED VIEW gold.daily_revenue
SET TBLPROPERTIES ('refresh.schedule' = 'EVERY 1 HOUR');

Or trigger refresh manually:

-- Force immediate refresh
REFRESH MATERIALIZED VIEW gold.daily_revenue;

Debugging MLVs

-- Check MLV status
DESCRIBE EXTENDED gold.daily_revenue;

-- Check last refresh time
SHOW TBLPROPERTIES gold.daily_revenue;

-- If MLV shows stale data:
-- 1. Check source table has new data
-- 2. Check CDF is enabled on source
-- 3. Force refresh: REFRESH MATERIALIZED VIEW
-- 4. Check for schema mismatches between MLV and source

MLV Limitations

Limitation Detail
SQL only Cannot use PySpark, Python UDFs, or complex logic
No MERGE/SCD Cannot do upsert or SCD Type 2 — use notebooks for that
No cross-lakehouse Source and MLV must be in the same Lakehouse
Schema changes If source schema changes, MLV may need to be recreated
Complex joins Very complex multi-table joins may have slow refresh
Preview feature Some capabilities may change — check latest docs

Common Mistakes

  1. Using MLVs for everything — SCD Type 2, complex PySpark logic, and multi-lakehouse ETL still need notebooks. MLVs are for SQL-expressible transforms.
  2. Not enabling CDF on source tables — without CDF, refresh is slower (full comparison vs change feed).
  3. Creating MLVs on Bronze data — Bronze data is messy. Create MLVs on Silver (cleaned) data for Gold layer aggregations.
  4. Not monitoring refresh — a failed refresh means stale data. Check MLV properties and set up alerts.

Interview Questions

Q: What are Materialized Lake Views and how do they differ from regular views? A: Regular views re-execute the query every time (virtual, no storage). MLVs execute once, store the result as a Delta table, and auto-refresh when source data changes. MLVs are faster for repeated complex queries and require no pipeline maintenance for refresh.

Q: When would you use an MLV vs a notebook for transformation? A: MLV for SQL-expressible transforms (filter, join, aggregate) that are queried frequently and need auto-refresh. Notebook for complex logic (SCD Type 2, PySpark, Python libraries, custom functions, multi-lakehouse writes). MLVs replace simple notebooks; complex notebooks cannot be replaced.

Wrapping Up

MLVs are the low-maintenance alternative to notebooks for straightforward transformations. Auto-refresh eliminates pipeline scheduling. CDF enables efficient incremental processing. Use them for Bronze→Silver cleaning and Silver→Gold aggregation when the logic is SQL-expressible.

Related posts:Fabric Lakehouse GuideDelta Lake Deep DiveMedallion Architecture


Naveen Vuppula is a Senior Data Engineering Consultant based in Ontario, Canada.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link