Fabric Warehouse Advanced: COPY INTO, CTAS, Dynamic Management Views, Query Insights, Visual Query Editor, SSMS Connectivity, and T-SQL with Notebooks

Our Warehouse Practical Guide covered the fundamentals — creating tables, MERGE, stored procedures, views, and security. This post covers the ADVANCED capabilities: bulk loading with COPY INTO, creating tables from queries with CTAS, monitoring query performance with DMVs and Query Insights, the no-code Visual Query editor, connecting from SSMS, and the powerful pattern of combining T-SQL with Spark notebooks.

Think of the Warehouse Practical Guide as learning to drive a car — steering, braking, parking. This post is about what is under the hood — the engine diagnostics (DMVs, Query Insights), the turbo mode (COPY INTO, CTAS), the performance tuning (statistics, caching), and the hybrid engine (T-SQL + Spark notebooks). You do not need to know these to drive, but you need them to drive FAST and diagnose problems when the engine light comes on.

COPY INTO: Bulk Data Loading
CTAS: Create Table As Select
Dynamic Management Views (DMVs)
Query Insight Views
Visual Query Editor (No-Code)
Connecting from SSMS and Azure Data Studio
Integrating T-SQL with Spark Notebooks
Warehouse-Specific Optimization
Statistics and Query Plans
Result Set Caching
Table Distribution and Partitioning
Common Mistakes
Interview Questions
Wrapping Up

COPY INTO: Bulk Data Loading

When you need to load thousands or millions of rows from files into a Warehouse table, singleton INSERT statements are painfully slow because each statement writes only a tiny batch of rows. COPY INTO is the bulk loading command designed for exactly this scenario. It reads entire files (CSV or Parquet) from OneLake or external Azure storage and loads them into a table in a single, optimized operation.

Real-life analogy: INSERT INTO row-by-row is like carrying groceries from the car to the kitchen one item at a time — 50 trips for 50 items. COPY INTO is like loading everything into two big bags and carrying them all in one trip. Same groceries, dramatically less effort.

-- Load CSV from Lakehouse Files into Warehouse table
COPY INTO dbo.stg_customers
FROM 'https://onelake.dfs.fabric.microsoft.com/workspace/lakehouse.Lakehouse/Files/raw_csv/customers.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,              -- Skip header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',    -- LF line endings (hex form; '\n' alone is treated as \r\n)
    FIELDQUOTE = '"'
);

-- Load Parquet (even simpler — no format options needed)
COPY INTO dbo.stg_orders
FROM 'https://onelake.dfs.fabric.microsoft.com/workspace/lakehouse.Lakehouse/Files/parquet/orders/'
WITH (FILE_TYPE = 'PARQUET');

-- Load with wildcard (all files in folder)
COPY INTO dbo.stg_events
FROM 'https://onelake.dfs.fabric.microsoft.com/workspace/lakehouse.Lakehouse/Files/events/*.parquet'
WITH (FILE_TYPE = 'PARQUET');

Feature	COPY INTO	Pipeline Copy Activity	INSERT INTO (row-by-row)
Speed	Fast (bulk optimized)	Fast (parallel engine)	Slow (one row at a time)
Where it runs	Inside Warehouse (T-SQL)	In a Fabric Pipeline	Inside Warehouse (T-SQL)
Orchestration	None (standalone command)	Full (error handling, retry, sequencing)	None
Best for	Bulk loads inside stored procedures	Pipeline-orchestrated ETL	Small inserts (<100 rows)
File formats	CSV, Parquet	CSV, Parquet, JSON, ORC, Avro	N/A (data already in SQL)

When to use which: Use COPY INTO for fast, simple loads inside stored procedures or ad-hoc bulk loads. Use Pipeline Copy Activity when you need orchestration, error handling, logging, and monitoring. Never use INSERT INTO row-by-row for large datasets.
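When loads are generated from a notebook or a metadata-driven script, it can help to assemble the COPY INTO text in one place. The sketch below is a hypothetical helper (build_copy_into is not a Fabric API) that reproduces the options used above. The workspace and lakehouse segments of the URL are placeholders, and the resulting statement still has to be executed against the Warehouse through your SQL client.

```python
from typing import Optional


def build_copy_into(table: str, source_url: str, file_type: str = "PARQUET",
                    first_row: Optional[int] = None,
                    field_terminator: str = ",") -> str:
    """Return COPY INTO text for a CSV or Parquet source (hypothetical helper)."""
    file_type = file_type.upper()
    if file_type not in ("CSV", "PARQUET"):
        raise ValueError(f"unsupported FILE_TYPE: {file_type}")
    # Double any single quotes so the values are valid T-SQL string literals
    url_literal = source_url.replace("'", "''")
    options = [f"FILE_TYPE = '{file_type}'"]
    if file_type == "CSV":
        # CSV needs delimiter details; Parquet files carry their own schema
        if first_row is not None:
            options.append(f"FIRSTROW = {first_row}")
        options.append("FIELDTERMINATOR = '" + field_terminator.replace("'", "''") + "'")
        options.append("FIELDQUOTE = '\"'")
    option_text = ",\n    ".join(options)
    return f"COPY INTO {table}\nFROM '{url_literal}'\nWITH (\n    {option_text}\n);"


sql = build_copy_into(
    "dbo.stg_customers",
    "https://onelake.dfs.fabric.microsoft.com/workspace/lakehouse.Lakehouse/Files/raw_csv/customers.csv",
    file_type="CSV",
    first_row=2,
)
print(sql)
```

Because Parquet carries its own schema, calling the helper with the default file_type emits only FILE_TYPE = 'PARQUET', matching the Parquet example above.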

CTAS: Create Table As Select

CTAS (Create Table As Select) creates a new table from the results of a query. Instead of first creating an empty table and then inserting data, CTAS does both in one step. This is useful for materialized summaries, point-in-time snapshots, and pre-computed aggregation tables that dashboards can query instantly.

Real-life analogy: CTAS is like taking a photo of a whiteboard (snapshot). The whiteboard (source table) keeps changing as people add and erase things. But your photo (CTAS table) captures the exact state at that moment. You can refer to the photo anytime without walking back to the whiteboard, and the whiteboard can keep evolving independently.

-- Create summary table from a query (materialized aggregation)
CREATE TABLE gold.monthly_revenue
AS
SELECT
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    department,
    SUM(amount) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM dbo.fact_orders f
JOIN dbo.dim_department d ON f.dept_key = d.dept_key
GROUP BY YEAR(order_date), MONTH(order_date), department;
-- Dashboard queries now hit this small table instead of scanning millions of fact rows

-- Create a snapshot for auditing (capture state at a point in time)
CREATE TABLE audit.customers_snapshot_20260605
AS SELECT * FROM gold.dim_customer;
-- If someone asks "what did the customer table look like last week?" — you have the answer

-- Create a filtered subset for a specific team
CREATE TABLE staging.high_value_customers
AS SELECT * FROM gold.dim_customer WHERE lifetime_value > 10000;

CTAS vs SELECT INTO: Fabric Warehouse accepts both CTAS and SELECT ... INTO for creating a table from a query. CTAS names the target table up front and is the pattern most Synapse-era code already uses, so it is the approach this post recommends.

CTAS vs Views: A view re-runs the query every time it is accessed — always fresh but always takes time. A CTAS table stores the results physically — instant reads but can become stale. Use views for always-current data. Use CTAS for pre-computed results where slight staleness is acceptable (refresh nightly or after each ETL run).
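The view-versus-CTAS trade-off is easy to see in a few lines. The sketch below uses Python's built-in sqlite3 purely as a stand-in engine (SQLite also supports CREATE VIEW and CREATE TABLE ... AS SELECT); the table and column names are hypothetical. After a new order arrives, the view reflects it immediately while the CTAS table keeps the old total until it is rebuilt, which is the refresh step you would run after each ETL load.

```python
import sqlite3

# In-memory SQLite database standing in for the Warehouse (illustration only)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_orders (order_id INTEGER, department TEXT, amount REAL);
    INSERT INTO fact_orders VALUES (1, 'Toys', 100.0), (2, 'Toys', 50.0), (3, 'Books', 20.0);

    -- View: stores only the query and re-runs it on every read
    CREATE VIEW v_dept_revenue AS
        SELECT department, SUM(amount) AS total_revenue FROM fact_orders GROUP BY department;

    -- CTAS: stores the query result as a physical table (a snapshot)
    CREATE TABLE dept_revenue AS
        SELECT department, SUM(amount) AS total_revenue FROM fact_orders GROUP BY department;
""")


def toys_revenue(relation: str) -> float:
    # relation is a fixed identifier from this script, never user input
    row = conn.execute(
        f"SELECT total_revenue FROM {relation} WHERE department = 'Toys'"
    ).fetchone()
    return row[0]


# New data arrives after the CTAS ran
conn.execute("INSERT INTO fact_orders VALUES (4, 'Toys', 25.0)")

view_value = toys_revenue("v_dept_revenue")  # the view sees the new row
ctas_value = toys_revenue("dept_revenue")    # the snapshot is stale

# "ETL refresh": rebuild the CTAS table from the current data
conn.executescript("""
    DROP TABLE dept_revenue;
    CREATE TABLE dept_revenue AS
        SELECT department, SUM(amount) AS total_revenue FROM fact_orders GROUP BY department;
""")
refreshed_value = toys_revenue("dept_revenue")
print(view_value, ctas_value, refreshed_value)
```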

Dynamic Management Views (DMVs)

Dynamic Management Views (DMVs) are system views that expose real-time information about what is happening inside the Warehouse — active queries, running sessions, connection details, and performance metrics. They are your diagnostic tool for answering questions like “what query is running right now?”, “who is connected?”, and “why is this query slow?”

Real-life analogy: DMVs are like the security cameras and dashboard monitors in an airport control tower. The air traffic controllers (DBAs/engineers) can see every flight in the air (active queries), every plane at a gate (sessions), every incoming connection (connections), and how long each plane has been circling (query duration). Without DMVs, you are flying blind.

-- View currently running queries (what is executing RIGHT NOW?)
SELECT
    session_id,
    command,
    status,
    start_time,
    DATEDIFF(SECOND, start_time, GETDATE()) AS running_seconds,
    row_count
FROM sys.dm_exec_requests
WHERE status = 'running';

-- Longest-running active requests (sys.dm_exec_requests lists only what is executing now;
-- completed queries are in queryinsights.exec_requests_history)
SELECT
    session_id,
    command,
    status,
    start_time,
    total_elapsed_time / 1000 AS elapsed_seconds,
    row_count
FROM sys.dm_exec_requests
ORDER BY total_elapsed_time DESC;

-- View active sessions (who is connected?)
SELECT
    session_id,
    login_name,
    status,
    last_request_start_time
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;

-- View connection info
SELECT * FROM sys.dm_exec_connections;

When to use DMVs vs Query Insights: DMVs show real-time, live data — use them for troubleshooting active issues (“why is the dashboard slow right now?”). Query Insight views (covered next) show historical data — use them for trend analysis (“which queries are consistently slow over the past week?”).

Query Insight Views

While DMVs show you what is happening right now, Query Insight views are Fabric’s built-in historical analytics for your Warehouse queries. They are pre-built views in the queryinsights schema that track execution history, identify frequently run queries, and flag long-running queries — all without any setup.

Real-life analogy: If DMVs are the live security cameras, Query Insight views are the recorded footage with highlights. Instead of watching hours of tape, the system automatically flags “this person visited 50 times this week” (frequently run queries) and “this person spent 3 hours in aisle 5” (long-running queries). You get the important patterns without manually searching.

-- Query history with performance metrics (what ran and how fast?)
SELECT
    command,                  -- query text
    start_time,
    end_time,
    total_elapsed_time_ms,
    row_count,
    status,
    login_name                -- who submitted the query
FROM queryinsights.exec_requests_history
ORDER BY start_time DESC;

-- Frequently run queries (which queries run most often? candidates for caching)
SELECT
    query_hash,
    last_run_command,
    number_of_runs,
    avg_total_elapsed_time_ms
FROM queryinsights.frequently_run_queries
ORDER BY number_of_runs DESC;

-- Long-running queries (which queries are the slowest? candidates for optimization)
SELECT
    last_run_command,
    number_of_runs,
    median_total_elapsed_time_ms,
    last_run_total_elapsed_time_ms
FROM queryinsights.long_running_queries
ORDER BY median_total_elapsed_time_ms DESC;

How to use Query Insights in practice:

Weekly review: Check long_running_queries every week — optimize anything over 30 seconds
Caching candidates: Check frequently_run_queries — if the same query runs 100+ times/day, enable result set caching
Failed queries: Filter exec_requests_history by status = 'Failed' — catch silent failures
User activity: Group exec_requests_history by login_name to see which users run the heaviest queries
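The four checks above can be scripted once the history rows are exported (for example, into a notebook). This is a sketch under assumptions: the rows are hypothetical, only the columns the review needs are kept, and weekly_review is an illustrative helper, not a Fabric feature. The 30-second and 100-run thresholds mirror the rules of thumb above; in practice you would first filter the rows to the review window (one day for the caching check).

```python
from collections import defaultdict

# Hypothetical rows exported from queryinsights.exec_requests_history,
# limited to the columns this review uses, with made-up values
dashboard_run = {"query_hash": "0xD1", "login_name": "pbi@contoso.com",
                 "status": "Succeeded", "total_elapsed_time_ms": 2_000}
history = [dashboard_run] * 120 + [
    {"query_hash": "0xA1", "login_name": "ana@contoso.com",
     "status": "Succeeded", "total_elapsed_time_ms": 45_000},
    {"query_hash": "0xB2", "login_name": "raj@contoso.com",
     "status": "Failed", "total_elapsed_time_ms": 1_500},
]


def weekly_review(rows, slow_ms=30_000, cache_min_runs=100):
    """Apply the slow-query, failure, caching and user-activity checks."""
    runs = defaultdict(int)             # query_hash -> number of executions
    elapsed_by_user = defaultdict(int)  # login_name -> total elapsed ms
    for r in rows:
        runs[r["query_hash"]] += 1
        elapsed_by_user[r["login_name"]] += r["total_elapsed_time_ms"]
    return {
        "slow_queries": sorted({r["query_hash"] for r in rows
                                if r["total_elapsed_time_ms"] > slow_ms}),
        "failed_count": sum(1 for r in rows if r["status"] == "Failed"),
        "cache_candidates": sorted(h for h, n in runs.items() if n >= cache_min_runs),
        "heaviest_user": max(elapsed_by_user, key=elapsed_by_user.get, default=None),
    }


report = weekly_review(history)
print(report)
```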

Visual Query Editor (No-Code)

Not everyone on your team writes SQL. The Visual Query editor lets business analysts build queries by dragging tables onto a canvas and applying Power Query-style steps (merges, filters, aggregations) from the ribbon, with no SQL required. Behind the scenes, it generates standard T-SQL that engineers can review.

Real-life analogy: The Visual Query editor is like building with LEGO instead of carving wood. Both produce the same result (a structure), but LEGO is accessible to anyone who can snap blocks together. The generated T-SQL is like the instruction manual that shows exactly which blocks were used and where: engineers can verify the build is structurally sound.

Open your Warehouse
Click New visual query
Drag tables from the Explorer onto the canvas
Join tables with Merge queries (pick the key column in each table and the join kind)
Use Choose columns to keep only the columns you need in the output
Add filters, aggregations, sorting via the ribbon
Click Show SQL to see the generated T-SQL
Click Run to execute

The Visual Query editor generates standard T-SQL. Analysts build queries visually, and engineers can review the generated SQL for optimization. This is especially useful for ad-hoc analysis where writing SQL from scratch would slow down business users.

Connecting from SSMS and Azure Data Studio

The Fabric Warehouse query editor in the browser works well for most tasks, but many data engineers and DBAs prefer SQL Server Management Studio (SSMS) or Azure Data Studio — familiar tools with advanced features like IntelliSense, execution plan visualization, and multi-tab query windows. Fabric Warehouse supports direct connections from both tools.

SSMS Connection

Open SQL Server Management Studio (SSMS)
Server type: Database Engine
Server name: Copy from Warehouse settings → SQL connection string (format: xxx.datawarehouse.fabric.microsoft.com)
Authentication: Microsoft Entra ID – Universal with MFA (formerly Azure Active Directory)
Database: Select your warehouse name
Click Connect
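SSMS is not the only client that can use this server name. As a hedged sketch, the helper below (a hypothetical function, not part of any Fabric SDK) assembles an ODBC connection string with the same settings as the steps above: the endpoint from the Warehouse settings, the warehouse as the database, encryption on, and interactive Microsoft Entra sign-in. It assumes Microsoft ODBC Driver 18 for SQL Server is installed; ActiveDirectoryInteractive is a value of the driver's Authentication keyword, though interactive sign-in is not available on every platform.

```python
def fabric_odbc_connection_string(server: str, warehouse: str,
                                  driver: str = "ODBC Driver 18 for SQL Server") -> str:
    """Build an ODBC connection string for a Fabric Warehouse SQL endpoint (sketch)."""
    if not server.endswith(".datawarehouse.fabric.microsoft.com"):
        raise ValueError("expected the SQL connection string from the Warehouse settings")
    parts = {
        "Driver": "{" + driver + "}",
        "Server": f"{server},1433",          # TDS endpoint on the standard SQL port
        "Database": warehouse,               # the warehouse name
        "Encrypt": "yes",
        # Interactive Microsoft Entra sign-in (prompts for MFA when required)
        "Authentication": "ActiveDirectoryInteractive",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


conn_str = fabric_odbc_connection_string("xxx.datawarehouse.fabric.microsoft.com", "gold_warehouse")
print(conn_str)
```

On a machine with the driver and the pyodbc package installed, you would pass the result to pyodbc.connect(conn_str).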

Once connected, you can run T-SQL, manage security (GRANT, DENY, RLS, CLS), view estimated execution plans, and use the familiar SSMS query features. Server-level features that Fabric manages for you, such as backups and SQL Server Agent jobs, do not apply.

Granting Access via SSMS

-- Connected to Warehouse via SSMS:

-- Grant schema-level access (analysts can read gold, not staging)
GRANT SELECT ON SCHEMA::gold TO [analyst@company.com];
DENY SELECT ON SCHEMA::staging TO [analyst@company.com];

-- Grant column-level access (only specific columns)
GRANT SELECT ON gold.dim_customer (customer_id, name, city) TO [analyst@company.com];
GO

-- Row-Level Security (regional managers see only their region)
-- dbo.region_managers(manager_upn, region) is a hypothetical mapping table
CREATE FUNCTION dbo.fn_rls(@region VARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS result
    FROM dbo.region_managers AS m
    WHERE m.region = @region
      AND m.manager_upn = USER_NAME();
GO

CREATE SECURITY POLICY RegionFilter
ADD FILTER PREDICATE dbo.fn_rls(region) ON gold.fact_sales
WITH (STATE = ON);
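Row-Level Security is easiest to reason about as "the engine silently adds a WHERE clause". The pure-Python model below shows that effect for a common design in which the predicate function looks the caller up in a manager-to-region mapping table. The names region_managers and select_as and the sample rows are hypothetical, and in the Warehouse the filtering happens inside the engine on every query, not in client code.

```python
# Hypothetical mapping table: (manager UPN, region) pairs a manager may see
region_managers = {
    ("east_mgr@contoso.com", "East"),
    ("west_mgr@contoso.com", "West"),
}

fact_sales = [
    {"sale_id": 1, "region": "East", "amount": 120.0},
    {"sale_id": 2, "region": "West", "amount": 80.0},
    {"sale_id": 3, "region": "East", "amount": 40.0},
]


def fn_rls(region: str, user_name: str) -> bool:
    """Model of the inline predicate: a row qualifies if (caller, region) is mapped."""
    return (user_name, region) in region_managers


def select_as(user_name: str, rows: list) -> list:
    """What SELECT * FROM gold.fact_sales returns once the filter predicate applies."""
    return [row for row in rows if fn_rls(row["region"], user_name)]


east_ids = [row["sale_id"] for row in select_as("east_mgr@contoso.com", fact_sales)]
outsider_rows = select_as("analyst@company.com", fact_sales)
print(east_ids, outsider_rows)
```

A user with no mapping row gets an empty result rather than an error, which is how filter predicates behave: rows are hidden, not denied.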

Integrating T-SQL with Spark Notebooks

One of Fabric’s most powerful features is the ability to combine T-SQL (Warehouse) and PySpark (Lakehouse notebooks) in the same workflow. You can read Warehouse tables from a Spark notebook with the Spark connector for Fabric Data Warehouse, transform data with PySpark, and write results back to a Lakehouse, bridging the SQL and Spark worlds.

Real-life analogy: This is like having a bilingual employee who can attend meetings in both English (T-SQL) and French (PySpark). The English meeting (Warehouse) produces a report. The bilingual employee (the Spark connector) carries it to the French meeting (Spark notebook), where it gets analyzed with French-specific tools (PySpark transformations). The results go back to the English team as a polished deliverable.

# Method 1: Read a Warehouse table with the Spark connector for Fabric Data Warehouse
import com.microsoft.spark.fabric  # registers .synapsesql() on Spark's reader and writer
from com.microsoft.spark.fabric.Constants import Constants
from pyspark.sql.functions import col, lit

df = spark.read.synapsesql("gold_warehouse.gold.dim_customer")

# Method 2: Run a T-SQL query in the Warehouse and get the result as a DataFrame
df_spent = (
    spark.read
    .option(Constants.DatabaseName, "gold_warehouse")
    .synapsesql('''
        SELECT c.name, c.city, SUM(f.amount) AS total_spent
        FROM gold.dim_customer c
        JOIN gold.fact_sales f ON c.customer_key = f.customer_key
        GROUP BY c.name, c.city
    ''')
)
df_spent.orderBy(col("total_spent").desc()).show()

# Method 3: Transform in Spark, write to Lakehouse
# (Writing from a notebook straight into a Warehouse is limited;
#  use pipelines or stored procedures for Warehouse loads)
df_transformed = df.filter(col("city") == "Toronto").withColumn("segment", lit("Premium"))
df_transformed.write.format("delta").mode("overwrite").saveAsTable("silver_lakehouse.customers_toronto")

The recommended pattern: Read from Warehouse (through the Spark connector) → transform in PySpark → write to Lakehouse. Or: read from Lakehouse (PySpark) → pipeline calls a Warehouse stored procedure for Gold layer loading. This gives you the best of both worlds: PySpark’s transformation power and T-SQL’s query and security capabilities.

Warehouse-Specific Optimization

The Fabric Warehouse is designed to be “no knobs” — it auto-manages most infrastructure. But there are still three critical optimization techniques that can make the difference between a 30-second query and a sub-second query: statistics, result set caching, and table design.

Statistics and Query Plans

When you run a query, the Fabric Warehouse query optimizer must decide HOW to execute it — which table to read first, whether to use a hash join or nested loop, which filters to apply first. This decision is called the query plan (also called an execution plan). A good plan runs in seconds. A bad plan runs in minutes.

The optimizer makes these decisions based on statistics — metadata about your data’s distribution. Statistics tell the optimizer things like: “the city column has 150 distinct values,” “85% of rows have date_key in the last 6 months,” or “customer_key ranges from 1 to 500,000.” Without statistics, the optimizer has to GUESS these numbers, and bad guesses lead to bad plans.

Real-life analogy: Statistics are like a restaurant’s inventory sheet. If the chef (optimizer) knows there are 200 chicken breasts and 50 lobster tails in stock (statistics), they plan prep efficiently — start the chicken first because it is the bulk of orders. Without the inventory sheet, the chef guesses and might prep lobster first, only to realize mid-service that 80% of orders are chicken. The kitchen (query) backs up because the plan was wrong.
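To make the inventory-sheet idea concrete, here is a simplified model of how a histogram changes the row-count estimate for WHERE date_key >= 900 on skewed data. It is an illustration, not the Warehouse's algorithm: the engine's statistics objects are more sophisticated (SQL Server, for example, builds step histograms of up to 200 steps), while this sketch uses a 10-bucket equal-width histogram and made-up date keys.

```python
def build_histogram(values, lo, hi, buckets):
    """Equal-width histogram: row counts per bucket over the integer range [lo, hi]."""
    width = (hi - lo + 1) / buckets
    counts = [0] * buckets
    for v in values:
        counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts, width


def estimate_rows_ge(threshold, lo, counts, width):
    """Estimate rows with value >= threshold, assuming uniformity only inside each bucket."""
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width  # exclusive upper edge
        if threshold <= bucket_lo:
            total += count  # whole bucket qualifies
        elif threshold < bucket_hi:
            total += count * (bucket_hi - threshold) / width  # partial bucket
    return total


# Skewed, deterministic sample: 850 recent date keys (900-999), 150 older ones (0-894)
date_keys = [900 + i % 100 for i in range(850)] + [i * 6 for i in range(150)]
lo, hi = min(date_keys), max(date_keys)

actual = sum(1 for k in date_keys if k >= 900)
# No statistics: assume values are spread evenly between min and max
no_stats_guess = len(date_keys) * (hi - 900 + 1) / (hi - lo + 1)
counts, width = build_histogram(date_keys, lo, hi, buckets=10)
histogram_estimate = estimate_rows_ge(900, lo, counts, width)
print(actual, no_stats_guess, histogram_estimate)
```

The even-spread guess predicts about 100 rows where 850 qualify, the kind of error that pushes an optimizer toward a nested loop join on what is really a large input.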

-- Fabric Warehouse creates and refreshes histogram statistics automatically at query time.
-- Create them manually on frequently filtered/joined columns to have them in place
-- before the first query, or to build them from every row (WITH FULLSCAN).
CREATE STATISTICS stat_fact_date ON gold.fact_sales (date_key);
CREATE STATISTICS stat_fact_customer ON gold.fact_sales (customer_key) WITH FULLSCAN;
CREATE STATISTICS stat_dim_city ON gold.dim_customer (city);

-- Update statistics after large data loads (stale stats = bad plans)
UPDATE STATISTICS gold.fact_sales;
UPDATE STATISTICS gold.dim_customer;

-- Manually created statistics in Fabric Warehouse are single-column histograms;
-- for queries that filter on several columns, create one statistics object per column.

Viewing query plans: Fabric Warehouse can return the estimated execution plan for a query. Run SET SHOWPLAN_XML ON before the query to get the plan as XML, or use Display Estimated Execution Plan in SSMS to see each operation as a node with cost percentages. Here is what to look for:

Warning signs in a query plan:
  ⚠️ Full table scan on a large table → missing statistics or filter
  ⚠️ Nested loop join on large tables → should be hash join (add statistics)
  ⚠️ High-cost sort operation → consider pre-sorting with CTAS
  ⚠️ Estimated rows vs actual rows wildly different → statistics are stale

Good signs in a query plan:
  ✅ Filter applied early (predicate pushdown)
  ✅ Hash join between large tables (efficient for big datasets)
  ✅ No single node dominating the estimated cost
  ✅ Estimated rows close to actual rows (statistics are accurate)
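The "estimated rows vs actual rows" check can be made concrete with the q-error: the larger of estimate/actual and actual/estimate, so 1.0 means a perfect estimate. The sketch assumes you already have per-operator estimated and actual row counts from a plan you captured; the operator names, counts, and the 10x threshold are hypothetical rules of thumb, not Fabric settings.

```python
def q_error(estimated: float, actual: float) -> float:
    """Symmetric ratio between estimate and actual (1.0 = perfect estimate)."""
    # Clamp to 1 row so empty results do not divide by zero
    e, a = max(estimated, 1.0), max(actual, 1.0)
    return max(e / a, a / e)


def flag_plan_nodes(nodes, threshold=10.0):
    """Names of operators whose estimate is off by more than `threshold` times."""
    return [name for name, est, act in nodes if q_error(est, act) > threshold]


# (operator, estimated rows, actual rows): hypothetical numbers
plan = [
    ("Scan fact_sales", 100.0, 850.0),
    ("Scan dim_customer", 500.0, 480.0),
    ("Hash join", 90.0, 1_200.0),
]
print(flag_plan_nodes(plan))
```

Operators that get flagged point to the columns whose statistics to refresh first.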

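The rule: Fabric Warehouse creates and updates statistics automatically, but make sure every column used in WHERE clauses and JOIN conditions has current statistics, and refresh them (or create them WITH FULLSCAN) after every large data load, such as when an ETL pipeline completes. Stale statistics are one of the most common causes of slow queries in any SQL warehouse.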

Result Set Caching

Result set caching stores the output of a query in a cache. When the same query runs again, the Warehouse returns the cached result instantly instead of re-executing the query against the data. This is especially powerful for dashboards where multiple users run the same report queries throughout the day.

Real-life analogy: Result set caching is like a coffee shop that pre-brews the most popular coffee each morning. When the first customer orders a medium dark roast, the barista brews it fresh (30 seconds). When the next 50 customers order the same thing, the barista pours from the pre-brewed pot (instant). When someone orders a different drink (different query), the barista brews it fresh. And when the shop gets a new batch of beans (data changes), the pre-brewed pot is discarded and re-brewed.

-- Enable result set caching for the database
ALTER DATABASE CURRENT SET RESULT_SET_CACHING ON;

-- First run: query executes fully (e.g., 30 seconds)
SELECT region, SUM(amount) AS total_revenue
FROM gold.fact_sales
GROUP BY region;

-- Second run (same query): returns from cache (e.g., <1 second)
-- No re-execution — the Warehouse recognizes the identical query text

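-- Check whether recent queries used the cache
-- (querying sys.dm_exec_requests for your own session would only show this check itself)
SELECT TOP 10 start_time, command, result_cache_hit
FROM queryinsights.exec_requests_history
ORDER BY start_time DESC;
-- result_cache_hit = 1 → cache was used (instant)
-- other values → the query did not read from the cache (see the query insights docs for each code)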

When the cache is invalidated: The cache is automatically cleared when the underlying data changes (INSERT, UPDATE, DELETE, MERGE on the source tables). This ensures you never see stale results — no manual cache management required.

Scenario	Cache Behavior
Same query, same user, data unchanged	Cache HIT — instant result
Same query, different user, data unchanged	Cache HIT — shared across users
Same query, data changed since last run	Cache MISS — re-executes and caches new result
Different query (even slightly different)	Cache MISS — different query text = different cache entry
Query with non-deterministic functions (GETDATE())	Never cached — result changes every time
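The behaviors in this table can be summarized with a toy model: the cache key is the exact query text plus a version number for each source table, any write bumps the version, and queries containing non-deterministic functions bypass the cache. This is an assumption-driven sketch of the observable behavior, not how Fabric implements result set caching internally, and the function list in the regex is only a sample.

```python
import re


class ResultSetCache:
    """Toy model of result set caching (illustrative, not Fabric's implementation)."""

    # A sample of non-deterministic T-SQL functions; such queries are never cached
    NON_DETERMINISTIC = re.compile(r"\b(GETDATE|SYSDATETIME|NEWID|RAND)\s*\(", re.IGNORECASE)

    def __init__(self):
        self.table_versions = {}  # table name -> counter bumped on every write
        self.entries = {}         # (query text, table versions) -> cached result
        self.hits = 0
        self.misses = 0

    def write(self, table):
        """INSERT/UPDATE/DELETE/MERGE: bumping the version orphans old cache entries."""
        self.table_versions[table] = self.table_versions.get(table, 0) + 1

    def run(self, query_text, tables, execute):
        if self.NON_DETERMINISTIC.search(query_text):
            self.misses += 1
            return execute()
        versions = tuple(sorted((t, self.table_versions.get(t, 0)) for t in tables))
        key = (query_text, versions)  # exact text: one extra space is a different key
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        result = self.entries[key] = execute()
        return result


executions = []


def execute_query():
    executions.append(1)  # count real executions
    return [("East", 160.0)]


cache = ResultSetCache()
q = "SELECT region, SUM(amount) AS total_revenue FROM gold.fact_sales GROUP BY region"
cache.run(q, ["gold.fact_sales"], execute_query)        # miss: runs the query
cache.run(q, ["gold.fact_sales"], execute_query)        # hit: served from cache
cache.run(q + " ", ["gold.fact_sales"], execute_query)  # miss: different text
cache.write("gold.fact_sales")                          # data changes
cache.run(q, ["gold.fact_sales"], execute_query)        # miss: table version changed
cache.run("SELECT GETDATE()", [], execute_query)        # never cached
print(cache.hits, cache.misses, len(executions))
```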

Best practice: Enable result set caching on warehouses that serve Power BI dashboards and repeated analyst queries. The performance gain is dramatic — queries that took 30 seconds return in under 1 second on cache hits.

Table Distribution and Partitioning

In traditional data warehouses like Synapse Dedicated SQL Pool, you had to manually choose how data is distributed across compute nodes — hash distribution (split rows by a key column), round-robin (spread rows evenly), or replicated (copy the entire table to every node). Choosing wrong meant slow queries and expensive data reshuffling.

Fabric Warehouse handles distribution automatically. You do not choose hash vs round-robin vs replicated. Fabric stores tables as Delta Lake files in OneLake and manages the physical layout internally. This is one of the biggest simplifications compared to Synapse.

Real-life analogy: Synapse Dedicated SQL Pool was like organizing a warehouse by hand — you decided which shelf each product goes on, and a bad decision meant workers walking across the entire building for every order. Fabric Warehouse is like an automated fulfillment center — robots decide where to store each item based on access patterns, and you just focus on what to store and what to retrieve. The physical layout is optimized behind the scenes.

Since you cannot control distribution directly, optimize through table design and query patterns:

-- 1. Use narrow data types (less I/O = faster scans)
-- ❌ Wasteful
CREATE TABLE gold.fact_sales_bad (
    customer_key BIGINT,        -- BIGINT is 8 bytes (overkill unless billions of rows)
    city VARCHAR(MAX),          -- MAX means variable, unbounded — slow to scan
    order_date DATETIME2(6)     -- 8 bytes; a full timestamp is rarely needed for an order date
);

-- ✅ Optimized
CREATE TABLE gold.fact_sales_good (
    customer_key INT,           -- INT is 4 bytes (handles up to 2.1 billion values)
    city VARCHAR(50),           -- Fixed max length — optimizer can plan better
    order_date DATE             -- 3 bytes, date only (no time component needed)
);

-- 2. NOT NULL on key columns (enables better join optimization)
CREATE TABLE gold.fact_sales (
    sale_key BIGINT IDENTITY NOT NULL,  -- Fabric IDENTITY columns are BIGINT, no seed/increment
    date_key INT NOT NULL,
    customer_key INT NOT NULL,
    amount DECIMAL(10,2) NOT NULL
);

-- 3. Pre-aggregate with CTAS for repeated summary queries
CREATE TABLE gold.monthly_summary AS
SELECT YEAR(order_date) AS yr, MONTH(order_date) AS mo,
       region, SUM(amount) AS revenue, COUNT(*) AS order_count
FROM gold.fact_sales
GROUP BY YEAR(order_date), MONTH(order_date), region;
-- Dashboard queries hit this small summary table instead of scanning the full fact table
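Choosing the narrowest integer type can be automated during schema design. The sketch below (hypothetical helpers, not a Fabric feature) picks the smallest of SMALLINT, INT and BIGINT that covers a column's observed range with a 2x growth factor, then computes the raw bytes saved using the fixed in-row sizes of those T-SQL types. Treat the byte figure as an upper bound: the Warehouse stores data as compressed Parquet, so on-disk savings are smaller than this arithmetic suggests.

```python
# Signed integer types considered here: (name, bytes, min, max)
INTEGER_TYPES = [
    ("SMALLINT", 2, -2**15, 2**15 - 1),
    ("INT", 4, -2**31, 2**31 - 1),
    ("BIGINT", 8, -2**63, 2**63 - 1),
]
SIZES = {name: size for name, size, _, _ in INTEGER_TYPES}


def narrowest_int_type(min_value: int, max_value: int, headroom: int = 2) -> str:
    """Smallest type whose range covers the observed range times a growth factor."""
    lo = min(min_value * headroom, min_value)  # only widens negative minimums
    hi = max_value * headroom
    for name, _size, type_min, type_max in INTEGER_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("range does not fit in BIGINT")


def raw_bytes_saved(rows: int, old_type: str, new_type: str) -> int:
    """Uncompressed in-row bytes saved; compressed Parquet savings will be smaller."""
    return rows * (SIZES[old_type] - SIZES[new_type])


key_type = narrowest_int_type(1, 500_000)  # e.g., customer_key ranging from 1 to 500,000
print(key_type, raw_bytes_saved(10_000_000, "BIGINT", key_type))
```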

Optimization Technique	What It Does	Typical Impact (rough, workload-dependent)
Narrow data types	Reduces bytes per row → less I/O per scan	10-30% faster scans
NOT NULL constraints	Optimizer knows no NULLs → better join strategies	5-15% faster joins
Statistics on join/filter columns	Optimizer knows data distribution → correct plan	2-100x faster (prevents bad plans)
Result set caching	Returns cached results for repeated queries	30 seconds → <1 second
CTAS for summaries	Pre-computes aggregations into smaller table	Scan 1K rows instead of 10M
SELECT specific columns	Columnar storage reads only requested columns	50-90% less I/O vs SELECT *

Common Mistakes

Not using COPY INTO for bulk loads — INSERT INTO row-by-row is dramatically slower than COPY INTO for large datasets. A 1-million-row load that takes 45 minutes with INSERT can complete in under 2 minutes with COPY INTO.
Not checking Query Insights — slow queries hide in production for weeks before anyone notices. Check queryinsights.long_running_queries weekly and optimize anything over 30 seconds.
Forgetting to create and update statistics — stale or missing statistics cause the query optimizer to guess data distribution, leading to bad query plans and slow execution. Create statistics on every WHERE and JOIN column. Run UPDATE STATISTICS after every large ETL load.
Not using CTAS for materialized summaries — recalculating the same aggregation every time a dashboard loads wastes compute. Create a CTAS summary table, refresh it after each ETL run, and point dashboards to the summary instead of the full fact table.
Using SELECT * in production queries — Fabric Warehouse uses columnar storage. SELECT * reads EVERY column from storage even if you only need 3 of 50 columns. Always specify the exact columns you need — this can reduce I/O by 50-90%.
Applying functions on filter columns — WHERE YEAR(order_date) = 2026 cannot use statistics efficiently because the function transforms the column value. Use range predicates instead: WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01'.
Not enabling result set caching for dashboard workloads — if 50 users open the same Power BI report, the same underlying queries run 50 times. With result set caching, the first query runs fully and the next 49 return instantly from cache.
Using oversized data types — BIGINT when INT suffices, VARCHAR(MAX) when VARCHAR(50) is enough, DATETIME2 when DATE works. Oversized types increase storage, increase I/O per scan, and reduce the effectiveness of caching and statistics.
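The YEAR() rewrite from the list above generalizes to any year. As a sketch for code that generates SQL (range_predicate_sql is a hypothetical helper), the snippet builds the half-open range and checks that it agrees with a YEAR() test on the boundary dates. The exclusive upper bound (< next January 1) also keeps the predicate correct for DATETIME2 columns, where <= December 31 would miss rows with a time after midnight on that day.

```python
from datetime import date
from typing import Tuple


def year_to_range(year: int) -> Tuple[date, date]:
    """Half-open [start, end) bounds equivalent to YEAR(col) = year."""
    return date(year, 1, 1), date(year + 1, 1, 1)


def range_predicate_sql(column: str, year: int) -> str:
    """Rewrite YEAR(column) = year as a range predicate the optimizer can use."""
    start, end = year_to_range(year)
    return f"{column} >= '{start.isoformat()}' AND {column} < '{end.isoformat()}'"


start, end = year_to_range(2026)
boundary_dates = [date(2025, 12, 31), date(2026, 1, 1), date(2026, 12, 31), date(2027, 1, 1)]
# Both predicates must agree on every boundary date
equivalent = all((d.year == 2026) == (start <= d < end) for d in boundary_dates)
predicate = range_predicate_sql("order_date", 2026)
print(predicate, equivalent)
```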

Interview Questions

Q: What is COPY INTO and when do you use it? A: COPY INTO is a T-SQL command that bulk-loads data from files (CSV, Parquet) stored in OneLake or external storage directly into Warehouse tables. It is significantly faster than row-by-row INSERT statements because it processes files in bulk. Use it inside stored procedures for automated bulk loading, or for one-time large data loads. For pipeline-orchestrated loads with error handling and monitoring, use Pipeline Copy Activity instead.

Q: What is CTAS and how does it differ from a view? A: CTAS (Create Table As Select) creates a new physical table from the results of a query. Unlike a view, which re-executes the query every time it is accessed, a CTAS table stores the results physically — reads are instant. The trade-off is that CTAS results can become stale (the source data may have changed). Use views for always-current data. Use CTAS for pre-computed summaries where slight staleness is acceptable, refreshed after each ETL run.

Q: What are statistics in Fabric Warehouse and why are they important? A: Statistics are metadata objects that describe the distribution of data in a column: how many distinct values, the minimum/maximum range, and the data histogram. The query optimizer uses statistics to create efficient execution plans. Without accurate statistics, the optimizer guesses data distribution and often produces bad plans (wrong join type, wrong scan order) that execute 2-100x slower. Fabric Warehouse creates and updates statistics automatically for columns used in filters and joins; you can also create statistics manually on WHERE and JOIN columns and update them after large data loads.

Q: What is a query plan and how do you read one? A: A query plan (execution plan) is the step-by-step strategy the query optimizer chooses to execute a query: which tables to scan first, which join algorithm to use, where to apply filters. In Fabric Warehouse, run SET SHOWPLAN_XML ON before the query (or use Display Estimated Execution Plan in SSMS) to see the estimated plan. Look for warning signs: full table scans on large tables (missing statistics or filters), nested loop joins on large tables (should be hash joins), and large gaps between estimated and actual row counts (stale statistics). Good plans show early filtering, hash joins, and accurate row estimates.

Q: What is result set caching and when should you enable it? A: Result set caching stores query results in a cache. When the same query runs again and the underlying data has not changed, the cached result is returned instantly instead of re-executing the query. Enable it on warehouses serving Power BI dashboards and repeated analyst queries. The cache is automatically invalidated when source data changes (INSERT, UPDATE, DELETE), so you never see stale results. Queries with non-deterministic functions like GETDATE() are never cached.

Q: What are Query Insight views? A: Built-in views in the queryinsights schema that Fabric Warehouse provides automatically — no setup required. Three key views: exec_requests_history (query execution history with duration, row count, status), frequently_run_queries (queries by execution count — caching candidates), and long_running_queries (slowest queries — optimization candidates). Use them for weekly performance reviews and to identify optimization opportunities.

Q: How does Fabric Warehouse handle table distribution compared to Synapse Dedicated SQL Pool? A: Synapse Dedicated SQL Pool required manually choosing hash, round-robin, or replicated distribution for every table — a complex decision that significantly affected query performance. Fabric Warehouse handles distribution automatically behind the scenes using Delta Lake files in OneLake. You cannot and do not need to choose a distribution strategy. Instead, optimize through narrow data types, NOT NULL constraints, statistics, result set caching, and CTAS for pre-aggregated summaries.

Wrapping Up

The Fabric Warehouse is more than basic T-SQL. COPY INTO handles bulk loading at production speed. CTAS creates materialized summaries that dashboards can query instantly. DMVs and Query Insights give you real-time and historical visibility into query performance. The Visual Query editor opens SQL analysis to non-SQL users. SSMS connectivity brings familiar enterprise tooling. And the T-SQL + Spark notebook integration pattern bridges the SQL and PySpark worlds.

The optimization story is straightforward: create statistics on join and filter columns, update them after loads, enable result set caching for dashboards, use narrow data types, and never write SELECT *. These five practices alone can improve most Warehouse query performance by 10-100x.

← Previous: Warehouse Practical Guide Fabric (9/38) Next: Fabric Data Factory & Pipelines →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
