Power BI in Fabric: Direct Lake, Semantic Models, Import vs DirectQuery vs Direct Lake, and Connecting Your Data to Reports

Every pipeline, every notebook, every transformation — all of it leads to one thing: a dashboard that business users look at. Power BI is that dashboard. And in Fabric, Power BI gets a superpower it never had before: Direct Lake mode.

Direct Lake reads data directly from OneLake Delta tables — no data import, no scheduled refresh, no DirectQuery performance penalty. It combines the speed of Import mode with the freshness of DirectQuery. This post explains how it works, when it falls back to DirectQuery, and how to connect your Lakehouse and Warehouse tables to Power BI reports.

Think of Import mode like printing a report — fast to read but stale the moment you print it. DirectQuery is like a live video feed — always current but slow because it queries the source every time you interact. Direct Lake is like a smart TV with local cache — it reads directly from storage (fast) and the data is always the latest version (fresh). Best of both worlds.

The Three Connection Modes
Import Mode (Traditional)
DirectQuery Mode
Direct Lake Mode (New in Fabric)
How Direct Lake Works
When Direct Lake Falls Back to DirectQuery
Semantic Models in Fabric
What Is a Semantic Model
Auto-Generated vs Custom Semantic Models
Creating a Semantic Model from Lakehouse
Creating a Semantic Model from Warehouse
Building Relationships
Measures and Calculated Columns
Connecting Power BI Desktop to Fabric
Default Semantic Model vs Custom
Report Refresh Behavior
Row-Level Security with Direct Lake
Performance Optimization for Direct Lake
V-Order
Column Data Types
Partitioning
Real-World Scenario: End-to-End Pipeline to Dashboard
Common Mistakes
Interview Questions
Wrapping Up

The Three Connection Modes

Feature	Import	DirectQuery	Direct Lake
Where data lives	Copied into Power BI dataset	Stays in source DB	Stays in OneLake (Delta)
Freshness	Stale (until refresh)	Always current	Always current
Speed	Fastest (in-memory)	Slower (queries source on every interaction)	Fast (reads Delta directly)
Refresh needed?	Yes (scheduled)	No	No
Dataset size limit	1 GB (Pro), 400 GB (Premium)	No limit	No limit
Works with	Any data source	Any data source	Fabric Lakehouse/Warehouse only
Available in	Always	Always	Fabric only

Import Mode (Traditional)

Data Source → [Scheduled Refresh] → Power BI Dataset (in-memory copy) → Report
  - Data is COPIED into Power BI
  - Fast queries (everything in RAM)
  - Stale between refreshes (8 refreshes/day on Pro)
  - Size limited (1 GB Pro, 400 GB Premium)

DirectQuery Mode

Report interaction → Power BI sends SQL query → Data Source → results → Report
  - No data copied (always queries the source)
  - Always fresh (real-time)
  - Slower (every click sends a query to the source)
  - Source must be performant

Direct Lake Mode (New in Fabric)

Report interaction → Power BI reads Delta files directly from OneLake → Report
  - No data copied (reads Delta Parquet files directly)
  - Always fresh (reads latest Delta version)
  - Fast (reads optimized columnar files, similar speed to Import)
  - No scheduled refresh needed
  - ONLY works with Fabric Lakehouse and Warehouse tables

How Direct Lake Works

Step 1: You create a semantic model on a Lakehouse or Warehouse
Step 2: Power BI identifies the Delta table files in OneLake
Step 3: When a user opens a report, Power BI reads the Parquet files directly
        (no data copy, no SQL query to the source)
Step 4: Data is loaded into the VertiPaq engine (in-memory, columnar)
Step 5: Report renders with Import-like speed

When the underlying Delta table changes (notebook writes new data):
  → Power BI detects the change on next interaction
  → Loads the updated Delta files
  → Report shows fresh data — no scheduled refresh!

The key insight: Direct Lake bypasses the SQL query layer entirely. Instead of sending SELECT queries to a database (DirectQuery), it reads the raw Parquet files that make up the Delta table. This is why it is as fast as Import but as fresh as DirectQuery.

When Direct Lake Falls Back to DirectQuery

Direct Lake has guardrails. If the data exceeds certain thresholds, it falls back to DirectQuery mode (slower but still works):

Guardrail	F2 Capacity	F64 Capacity	F128+ Capacity
Max rows per table	300 million	1.5 billion	3 billion
Max tables per model	500	500	500
Max model size (on disk)	10 GB	40 GB	80 GB
Max columns per table	2,000	2,000	2,000

If ANY threshold is exceeded → automatic fallback to DirectQuery for that table.

How to prevent fallback: – Use OPTIMIZE on Delta tables (fewer, larger files = faster loading) – Choose appropriate data types (INT instead of BIGINT, avoid unnecessary VARCHARs) – Aggregate at the Gold layer (do not point Power BI at raw Bronze tables) – V-Order optimize your Delta tables

Semantic Models in Fabric

What Is a Semantic Model

A Semantic Model (formerly “dataset”) is the analytical layer between your data and Power BI reports. It defines which tables are available, their relationships, measures (DAX calculations), and security rules.

Lakehouse/Warehouse Tables → Semantic Model → Power BI Report(s)
  (raw data)                   (relationships,    (visuals, charts,
                                measures, RLS)     dashboards)

Auto-Generated vs Custom Semantic Models

Type	Created	Best For
Auto-generated	Automatically when you create a Lakehouse or Warehouse	Quick exploration, prototyping
Custom	Manually by you	Production reports (controlled relationships, measures, RLS)

Creating a Semantic Model from Lakehouse

Open your Lakehouse → click the SQL analytics endpoint view
Click New semantic model (or Reporting → New semantic model)
Select tables to include (e.g., dim_customer, dim_product, fact_sales)
Click Confirm
The semantic model opens in the web-based modeling view

Creating a Semantic Model from Warehouse

Open your Warehouse
Click New semantic model
Select tables/views from your schemas (gold.dim_customer, reports.vw_revenue)
Click Confirm

Building Relationships

In the modeling view: 1. Drag a column from one table to another (e.g., fact_sales.customer_key → dim_customer.customer_key) 2. Configure: Cardinality (Many-to-One), Cross filter direction (Single) 3. Repeat for all dimension-to-fact relationships

dim_customer ──── customer_key ───── fact_sales
dim_product  ──── product_key  ───── fact_sales
dim_date     ──── date_key     ───── fact_sales

Measures and Calculated Columns

// DAX Measures (calculated at query time — recommended)
Total Revenue = SUM(fact_sales[total_amount])
Order Count = COUNTROWS(fact_sales)
Avg Order Value = DIVIDE([Total Revenue], [Order Count], 0)
YoY Growth = 
    VAR CurrentYear = [Total Revenue]
    VAR PrevYear = CALCULATE([Total Revenue], SAMEPERIODLASTYEAR(dim_date[full_date]))
    RETURN DIVIDE(CurrentYear - PrevYear, PrevYear, 0)

Connecting Power BI Desktop to Fabric

Open Power BI Desktop
Get Data → Microsoft Fabric → Lakehouse or Warehouse
Sign in with your organizational account
Select workspace → select Lakehouse/Warehouse
Choose tables → Load
Connection mode is automatically Direct Lake (if using Fabric)

Default Semantic Model vs Custom

Every Lakehouse and Warehouse auto-generates a default semantic model. Here is when to use it vs building your own:

Feature	Auto-Generated (Default)	Custom (You Build)
Created	Automatically with Lakehouse/Warehouse	Manually by you
Tables included	All tables in the item	Only tables you select
Relationships	Auto-detected (may be wrong)	You define (always correct)
Measures (DAX)	None	Custom measures you create
RLS	Not configured	You configure
Endorsement	None	Can be Certified or Promoted
Best for	Quick exploration, prototyping	Production reports

Recommendation: Disable the auto-generated default semantic model on Bronze and Silver lakehouses — analysts should not build reports on raw data. To disable: open the Lakehouse → Settings → Default semantic model → toggle Off. Create custom semantic models only on Gold layer items with proper star schema relationships, DAX measures, and RLS configured.

Why auto-detected relationships are dangerous: The default model guesses relationships based on column names. If both dim_customer and dim_product have a column called id, Power BI might incorrectly create a relationship between them. In a custom model, YOU define the correct relationships: fact_sales.customer_key → dim_customer.customer_key.

Report Refresh Behavior

Mode	When Data Updates	User Experience
Import	Only after scheduled refresh	Stale between refreshes
DirectQuery	Every report interaction	Always fresh, but slower
Direct Lake	When underlying Delta files change	Fresh within seconds of notebook/pipeline completion — no refresh needed

The magic: Run your notebook at 6 AM → it writes to the Lakehouse → Power BI report at 6:01 AM shows the new data. No refresh button. No schedule. It just works.

Row-Level Security with Direct Lake

RLS works the same as traditional Power BI but is enforced through the SQL analytics endpoint:

-- In the Lakehouse SQL analytics endpoint or Warehouse:
-- Create a security function
CREATE FUNCTION dbo.fn_rls_filter(@region VARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS result WHERE @region = USER_NAME();

-- Create security policy
CREATE SECURITY POLICY RegionFilter
ADD FILTER PREDICATE dbo.fn_rls_filter(region) ON dbo.fact_sales
WITH (STATE = ON);

Or define RLS roles in the Semantic Model using DAX:

// Role: EastRegion
// Table: fact_sales
// Filter: [region] = "East"
[region] = USERPRINCIPALNAME()

Performance Optimization for Direct Lake

V-Order

V-Order is a write-time optimization that sorts data within Parquet files for faster Direct Lake reads:

# V-Order is enabled by default in Fabric notebooks
# Verify:
spark.conf.get("spark.sql.parquet.vorder.enabled")  # "true"

# If disabled, enable:
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

Column Data Types

Direct Lake reads Parquet files directly into the VertiPaq in-memory engine. Data types directly affect memory consumption and query speed:

Instead Of	Use	Memory Savings	Why
BIGINT (8 bytes)	INT (4 bytes)	50%	Most keys and counts fit in INT (up to 2.1 billion)
DATETIME2 (8 bytes)	DATE (3 bytes)	62%	Date dimensions rarely need time precision
VARCHAR(MAX)	VARCHAR(50)	Significant	VertiPaq stores distinct values — shorter = less memory
FLOAT (8 bytes)	DECIMAL(10,2)	Comparable	Precise for money; FLOAT has rounding issues
NVARCHAR	VARCHAR	50%	Use NVARCHAR only if you need Unicode characters

Impact on Direct Lake: Narrower types = smaller Parquet files = faster load into VertiPaq = less chance of hitting guardrail fallback to DirectQuery. For a 500M-row fact table, switching BIGINT keys to INT can reduce model size by 30%, keeping you within Direct Lake guardrails.

In the semantic model, verify column types: open the model → click a table → check the Data type column in the properties pane. Override any incorrect auto-detected types (e.g., a ZIP code detected as a number should be set to Text).

Partitioning

If your fact table is partitioned by date in the Lakehouse, Direct Lake benefits from partition pruning — when a report filter is applied (e.g., “Last 30 days”), Fabric skips reading Parquet files from older partitions entirely:

# Partition fact table by year/month for efficient Direct Lake reads
df.write.format("delta") \
    .partitionBy("year", "month") \
    .mode("overwrite") \
    .saveAsTable("gold.fact_sales")

# Then run OPTIMIZE + Z-ORDER on frequently filtered dimension keys
spark.sql("OPTIMIZE gold.fact_sales ZORDER BY (customer_key, product_key)")

How partitioning helps Direct Lake:

Without partitioning:
  Report filter: "June 2026" → Direct Lake reads ALL 500 Parquet files → slow

With partitioning by year/month:
  Report filter: "June 2026" → Direct Lake reads ONLY files in /year=2026/month=6/ → fast
  (skips 490 files, reads only 10)

Combine with Z-ORDER: Partitioning handles date-range filters. Z-ORDER handles dimension key filters (customer, product). Together they minimize the files Direct Lake needs to read for any report query.

Best Practices

Run OPTIMIZE on Gold tables — compacts small files, improves Direct Lake load time
Use appropriate types — INT instead of BIGINT, DATE instead of DATETIME when time is not needed
Aggregate at Gold layer — Direct Lake on a 100M-row fact table is fast; on a 10B-row Bronze table, it falls back
Limit columns — include only columns needed for reporting in the semantic model
Use star schema — fact + dimension tables with proper relationships load faster than wide denormalized tables

Real-World Scenario: End-to-End Pipeline to Dashboard

6:00 AM — Pipeline runs:
  Copy Activity → Azure SQL to bronze_lakehouse (raw data)

6:05 AM — Notebook runs:
  NB_Bronze_to_Silver → cleans, validates, deduplicates

6:15 AM — Notebook runs:
  NB_Silver_to_Gold → Delta MERGE into dim/fact tables in gold_lakehouse
  NB runs OPTIMIZE on all Gold tables

6:20 AM — Semantic Model:
  Points to gold_lakehouse SQL analytics endpoint
  Tables: dim_customer, dim_product, dim_date, fact_sales
  Mode: Direct Lake (automatic)

6:21 AM — Business user opens Power BI report:
  Report reads Delta files from gold_lakehouse via Direct Lake
  Data is from TODAY's 6 AM load — completely fresh
  No scheduled refresh was triggered
  Report loads in 2-3 seconds (Import-like speed)

Common Mistakes

Pointing Direct Lake at Bronze tables — raw, unoptimized tables with billions of rows cause fallback to DirectQuery. Always point to optimized Gold layer tables.
Not running OPTIMIZE — fragmented Delta tables with thousands of small files slow down Direct Lake. OPTIMIZE weekly or after large loads.
Using Import mode in Fabric — there is almost no reason to use Import when Direct Lake is available. Direct Lake gives Import speed with DirectQuery freshness.
Not building a proper star schema — wide denormalized tables are harder for Direct Lake to optimize. Star schema (fact + dimensions) loads faster and creates better DAX performance.
Ignoring fallback indicators — check the semantic model properties for “Direct Lake fallback” warnings. If falling back, optimize the underlying data.

Interview Questions

Q: What is Direct Lake mode in Power BI? A: Direct Lake reads Delta Parquet files directly from OneLake without copying data (Import) or querying a database (DirectQuery). It provides Import-like speed with DirectQuery freshness. Data is always current — when a notebook writes new data, the next report interaction sees it with no scheduled refresh. Only available with Fabric Lakehouse and Warehouse.

Q: When does Direct Lake fall back to DirectQuery? A: When guardrails are exceeded: rows per table (300M-3B depending on capacity), model size on disk (10-80 GB), or columns per table (2,000). Also falls back for features not supported in Direct Lake mode. Prevent fallback by running OPTIMIZE, using proper data types, and aggregating at the Gold layer.

Q: What is the difference between a Semantic Model and a Power BI Dataset? A: They are the same thing — Microsoft renamed “Dataset” to “Semantic Model.” It is the analytical layer that defines tables, relationships, measures (DAX), and security rules between the data source and Power BI reports.

Wrapping Up

Direct Lake is the reason Fabric is transformative for Power BI. No more choosing between speed (Import) and freshness (DirectQuery). Build your pipelines, write to your Gold Lakehouse or Warehouse, and Power BI reads the latest data directly — fast, fresh, and maintenance-free.

← Previous: Mirrored Databases Fabric (25/38) Next: Real-Time Intelligence →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Power BI in Fabric: Direct Lake, Semantic Models, Import vs DirectQuery vs Direct Lake, and Connecting Your Data to Reports

Table of Contents

The Three Connection Modes

Import Mode (Traditional)

DirectQuery Mode

Direct Lake Mode (New in Fabric)

How Direct Lake Works

When Direct Lake Falls Back to DirectQuery

Semantic Models in Fabric

What Is a Semantic Model

Auto-Generated vs Custom Semantic Models

Creating a Semantic Model from Lakehouse

Creating a Semantic Model from Warehouse

Building Relationships

Measures and Calculated Columns

Connecting Power BI Desktop to Fabric

Default Semantic Model vs Custom

Report Refresh Behavior

Row-Level Security with Direct Lake

Performance Optimization for Direct Lake

V-Order

Column Data Types

Partitioning

Best Practices

Real-World Scenario: End-to-End Pipeline to Dashboard

Common Mistakes

Interview Questions

Wrapping Up

Leave a Comment Cancel Reply

Table of Contents

The Three Connection Modes

Import Mode (Traditional)

DirectQuery Mode

Direct Lake Mode (New in Fabric)

How Direct Lake Works

When Direct Lake Falls Back to DirectQuery

Semantic Models in Fabric

What Is a Semantic Model

Auto-Generated vs Custom Semantic Models

Creating a Semantic Model from Lakehouse

Creating a Semantic Model from Warehouse

Building Relationships

Measures and Calculated Columns

Connecting Power BI Desktop to Fabric

Default Semantic Model vs Custom

Report Refresh Behavior

Row-Level Security with Direct Lake

Performance Optimization for Direct Lake

V-Order

Column Data Types

Partitioning

Best Practices

Real-World Scenario: End-to-End Pipeline to Dashboard

Common Mistakes

Interview Questions

Wrapping Up

Related Posts

Leave a Comment Cancel Reply