Databricks SQL and SQL Warehouses: Serverless Compute, Query Editor, Dashboards, Query History, and SQL-Native Analytics for Data Engineers

You have built Delta tables in your Lakehouse using PySpark notebooks. Bronze, Silver, Gold — all clean and optimized. Now the business analyst asks: “Can I just write SQL against these tables? I do not know PySpark.” The data scientist asks: “Can I query the Gold layer from a BI tool without spinning up a cluster?” The finance team asks: “Can I build a quick dashboard without Power BI?”

The answer to all three is Databricks SQL. It is a SQL-native interface on top of your Lakehouse — write SQL, get results, build dashboards — all without touching a notebook or writing a line of PySpark. Under the hood, it uses SQL Warehouses (dedicated or serverless compute optimized for SQL queries) instead of all-purpose clusters.

Think of it like this: your Lakehouse is a library (data stored in Delta tables). PySpark notebooks are like being a librarian — you can reorganize books, build catalogs, and process requests in bulk. Databricks SQL is like the library’s public search terminal — anyone can walk up, type a query, and find what they need without knowing how the library’s internal systems work.

What Is Databricks SQL?
SQL Warehouses: The Compute Behind Databricks SQL
Serverless vs Classic vs Pro SQL Warehouses
Creating a SQL Warehouse
SQL Warehouse Configuration and Sizing
Auto-Stop and Auto-Scaling
The SQL Editor
Writing Queries in the SQL Editor
Query Parameters
Saved Queries and Sharing
Dashboards in Databricks SQL
Creating a Dashboard
Dashboard Widgets and Filters
Scheduled Dashboard Refresh
Query History and Performance
Query Profile (Execution Plan)
Alerts (Automated Notifications)
Access Control for SQL Warehouses
Databricks SQL vs Notebooks
Databricks SQL vs Fabric Warehouse
Common Mistakes
Interview Questions
Wrapping Up

What Is Databricks SQL?

Databricks SQL (DBSQL) is a SQL-first analytics experience within the Databricks platform. It lets users query Delta Lake tables using standard SQL, without notebooks or PySpark knowledge.

Feature	Databricks SQL	Databricks Notebooks
Language	SQL only	Python, SQL, Scala, R
Compute	SQL Warehouses (optimized for SQL)	All-purpose or Job clusters
Users	Analysts, BI developers, data consumers	Data engineers, data scientists
Dashboards	Built-in dashboard builder	No native dashboards (use matplotlib/plotly)
Best for	Ad-hoc queries, reporting, dashboards	ETL, transformations, ML training
Cost model	Per-query (serverless) or per-hour (classic)	Per-hour (cluster uptime)

SQL Warehouses: The Compute Behind Databricks SQL

A SQL Warehouse is a compute resource optimized specifically for SQL queries. Unlike all-purpose clusters (which support Python, Scala, and ML workloads), SQL Warehouses are tuned for analytical queries — columnar processing, query caching, and photon acceleration.

All-Purpose Cluster:                SQL Warehouse:
  General-purpose compute             SQL-optimized compute
  Supports Python, SQL, Scala         SQL only
  Manual start/stop                   Auto-start on query, auto-stop on idle
  Used by: Data Engineers             Used by: Analysts, BI tools
  Billing: per-hour (always on)       Billing: per-hour or per-query (serverless)

Serverless vs Classic vs Pro SQL Warehouses

Type	Startup Time	Management	Cost	Best For
Serverless	Seconds	Fully managed by Databricks	Higher per-DBU, no idle cost	Ad-hoc queries, unpredictable workloads
Pro	2-5 minutes	Customer-managed (in your cloud)	Lower per-DBU, pay for idle	Predictable workloads, cost-sensitive
Classic	5-10 minutes	Customer-managed (legacy)	Lowest per-DBU	Budget-constrained, batch queries

Real-life analogy: Serverless is like an Uber — instantly available, no car to maintain, pay per ride. Pro is like leasing a car — takes a few minutes to get going, monthly payment, but cheaper per mile for regular commuters. Classic is like buying a used car — cheapest per mile but the most management overhead.

Creating a SQL Warehouse

Steps to create:
  1. Databricks workspace → SQL Warehouses (left sidebar)
  2. Click "Create SQL Warehouse"
  3. Configure:
     Name: analytics_warehouse
     Cluster size: Small (2X-Small for dev, Medium for production)
     Type: Serverless (recommended) or Pro
     Auto stop: 10 minutes (stops when idle to save cost)
     Scaling: Min 1, Max 3 clusters (auto-scales based on query load)
     Spot instances: Enable for non-critical workloads (cheaper, may be interrupted)
     Channel: Current (stable) or Preview (latest features)
  4. Click "Create"

SQL Warehouse Configuration and Sizing

Cluster Size	DBUs/hour	Use Case
2X-Small	2	Dev/test, small ad-hoc queries
X-Small	4	Light analytics, few concurrent users
Small	8	Standard analytics workloads
Medium	16	Production dashboards, moderate concurrency
Large	32	Heavy analytics, many concurrent users
X-Large+	64+	Large-scale BI, hundreds of concurrent queries

Auto-Stop and Auto-Scaling

# Auto-stop: warehouse shuts down after N minutes of no queries
# Default: 10 minutes. Set to 5 for dev, 15-30 for production.
# This is the single most important cost-saving setting.

# Auto-scaling: adds clusters when query queue grows
# Min clusters: 1 (always one ready)
# Max clusters: 3-5 (scales up during peak hours)
# Each cluster handles ~10 concurrent queries

# Example: 50 analysts run queries at 9 AM
# Min=1, Max=5: Databricks scales to 5 clusters
# By 11 AM, only 5 analysts active: scales back to 1 cluster
# By 1 PM, nobody querying: auto-stop shuts down (zero cost)

The SQL Editor

Writing Queries in the SQL Editor

-- Access: Databricks workspace → SQL Editor (left sidebar)
-- Select a SQL Warehouse from the dropdown (top right)
-- Select a catalog and schema from the left panel

-- Query Gold layer tables directly
SELECT 
    d.city,
    d.country,
    COUNT(DISTINCT f.customer_key) AS total_customers,
    SUM(f.order_amount) AS total_revenue,
    AVG(f.order_amount) AS avg_order_value
FROM gold.fact_orders f
JOIN gold.dim_customer d ON f.customer_key = d.customer_key
WHERE f.order_date >= '2026-01-01'
GROUP BY d.city, d.country
ORDER BY total_revenue DESC
LIMIT 20;

-- Results appear instantly (with query caching)
-- Click "Save" to save the query for reuse
-- Click "Share" to share with team members

Query Parameters

-- Use {{ parameter_name }} for dynamic inputs
-- When you run the query, a text box appears for each parameter

SELECT *
FROM gold.fact_orders
WHERE order_date >= '{{ start_date }}'
  AND order_date <= '{{ end_date }}'
  AND region = '{{ region }}'
ORDER BY order_amount DESC;

-- Parameter types: Text, Number, Date, Date Range, Dropdown List
-- Dropdown List: define allowed values (e.g., "East", "West", "North", "South")
-- Date Range: creates a calendar picker in the dashboard

Saved queries live in your workspace and can be shared with team members. Permissions control who can view, edit, or run each query. Organize queries in folders by domain (Sales, Finance, Operations) for team-wide discoverability.

Dashboards in Databricks SQL

Creating a Dashboard

Steps:
  1. SQL Editor → run a query → click "Add to Dashboard"
  2. Or: Dashboards (left sidebar) → "Create Dashboard"
  3. Name: "Daily Sales Overview"
  4. Add widgets:
     - Counter: Total Revenue (single number)
     - Bar Chart: Revenue by Region
     - Line Chart: Daily Revenue Trend
     - Table: Top 10 Products by Sales
     - Pie Chart: Revenue by Category
  5. Each widget is powered by a saved SQL query
  6. Arrange widgets by dragging on the canvas
  7. Click "Share" to publish to team

Dashboard Widgets and Filters

Widget Type	Use Case
Counter	Single KPI (Total Revenue: $2.4M)
Bar Chart	Comparisons (Revenue by Region)
Line Chart	Trends over time (Daily Active Users)
Table	Detailed data (Top 20 Customers)
Pie/Donut Chart	Proportions (Market Share by Product)
Scatter Plot	Correlations (Spend vs Frequency)
Map	Geographic data (Sales by State)

Dashboard-level filters apply across all widgets. Add a date range filter once and every widget filters simultaneously — no need to parameterize each query separately.

Scheduled Dashboard Refresh

Schedule: Dashboard → three dots menu → "Schedule"
  Frequency: Every 1 hour / Every day at 8 AM / Custom cron
  SQL Warehouse: select the warehouse to use for refresh
  Subscribers: add email recipients for the refreshed dashboard

The dashboard refreshes automatically — stakeholders receive updated
dashboards in their inbox without logging into Databricks.

Query History and Performance

Every query executed on a SQL Warehouse is logged in Query History:

Access: SQL Warehouses → select warehouse → Query History tab

Each entry shows:
  - Query text (full SQL)
  - User who ran it
  - Start time and duration
  - Rows returned
  - Bytes scanned
  - Status (Succeeded, Failed, Cancelled)

Use this to:
  1. Find slow queries (sort by duration)
  2. Identify expensive queries (sort by bytes scanned)
  3. Audit who queried what (compliance)
  4. Optimize: rewrite slow queries, add indexes, partition tables

Query Profile (Execution Plan)

Click any query in history → Query Profile to see the execution plan. This shows exactly how Databricks executed your query — which tables were scanned, how joins were performed, where time was spent. Look for full table scans (missing filters), sort-merge joins on small tables (should be broadcast), and skewed partitions.

Alerts (Automated Notifications)

Alerts trigger notifications when a query result meets a condition:

Setup:
  1. Write a query that returns a single value:
     SELECT COUNT(*) as failed_rows FROM silver.data_quality_log
     WHERE status = 'FAILED' AND check_date = CURRENT_DATE
  2. Create Alert: SQL Editor → Alerts → "Create Alert"
  3. Condition: "Value is greater than 0"
  4. Frequency: Every 1 hour
  5. Notification: Email or Slack webhook

Use cases:
  - Data quality: alert when failed row count > 0
  - Pipeline monitoring: alert when no new data in 24 hours
  - Business KPI: alert when daily revenue drops below threshold
  - Capacity: alert when table size exceeds storage limit

Access Control for SQL Warehouses

Permission	What It Allows	Typical User
CAN USE	Run queries on the warehouse	Analysts, BI developers
CAN MONITOR	View query history and metrics	Team leads, admins
CAN MANAGE	Start/stop, resize, configure	Platform team, workspace admins
IS OWNER	Full control including delete	Workspace admin

Data access is controlled separately via Unity Catalog — even if a user can USE the warehouse, they can only query tables they have SELECT permission on.

Databricks SQL vs Notebooks

Use Case	Databricks SQL	Notebooks
Ad-hoc data exploration	✅ Best (SQL editor, instant results)	OK (heavier setup)
Building dashboards	✅ Best (native dashboard builder)	❌ No native dashboards
ETL / data transformations	❌ Limited (SQL only)	✅ Best (PySpark, complex logic)
Machine learning	❌ Not supported	✅ Best (MLflow, sklearn, etc.)
Sharing results with business	✅ Best (shareable dashboards)	Moderate (export/screenshot)
Scheduled data loads	Moderate (SQL-only transforms)	✅ Best (full Python + PySpark)

Databricks SQL vs Fabric Warehouse

Feature	Databricks SQL	Fabric Warehouse
SQL dialect	Spark SQL (ANSI-compatible)	T-SQL (SQL Server compatible)
Storage	Delta Lake on cloud storage	OneLake (Delta format)
Compute	SQL Warehouses (Photon engine)	Serverless SQL (always-on)
Dashboards	Built-in dashboard builder	Power BI (separate service)
Governance	Unity Catalog	Purview + workspace roles
Best for	Multi-cloud, Spark-centric orgs	Microsoft-centric orgs, Power BI users

Common Mistakes

Not setting auto-stop — a SQL Warehouse left running 24/7 costs thousands per month. Set auto-stop to 10-15 minutes.
Using all-purpose clusters for SQL analytics — SQL Warehouses are optimized for SQL queries (Photon engine, query caching). All-purpose clusters are for ETL and ML, not ad-hoc queries.
Running full table scans — always filter by partitioned columns (date, region) to reduce bytes scanned and query cost.
Not using query caching — SQL Warehouses cache query results. Running the same query twice is instant the second time. Do not disable caching unless you need real-time results.
Over-sizing the warehouse for dev — a 2X-Small warehouse is sufficient for development and testing. Use Medium/Large only for production dashboards with concurrent users.

Interview Questions

Q: What is the difference between a SQL Warehouse and an all-purpose cluster? A: SQL Warehouses are optimized for SQL-only analytics workloads with features like Photon acceleration, query caching, and auto-scaling. All-purpose clusters support Python, Scala, R, and SQL, and are used for ETL, data engineering, and ML. SQL Warehouses auto-start on query and auto-stop when idle. All-purpose clusters require manual management.

Q: What is the difference between Serverless and Pro SQL Warehouses? A: Serverless warehouses start in seconds, are fully managed by Databricks, and have no idle cost — you pay only when queries run. Pro warehouses take 2-5 minutes to start, run in your cloud account, and cost less per DBU but you pay for idle time. Use Serverless for unpredictable workloads and Pro for predictable, cost-sensitive production use.

Q: How do alerts work in Databricks SQL? A: You write a query that returns a single value (e.g., count of failed rows). You create an alert with a condition (e.g., value > 0), a check frequency (e.g., hourly), and a notification channel (email or Slack). When the condition is met, Databricks sends the alert automatically. Use for data quality monitoring, SLA checks, and business KPI thresholds.

Wrapping Up

Databricks SQL democratizes data access. Data engineers build the Lakehouse with PySpark notebooks. Analysts and business users consume it through Databricks SQL — writing queries, building dashboards, and setting alerts — all without needing Python skills. The SQL Warehouse provides the compute, Unity Catalog provides the governance, and the dashboard builder provides the visualization.

← Previous: Unity Catalog Deep Dive

Databricks (17/19)

Next: Streaming with Databricks →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Databricks SQL and SQL Warehouses: Serverless Compute, Query Editor, Dashboards, Query History, and SQL-Native Analytics for Data Engineers

Databricks SQL and SQL Warehouses: Serverless Compute, Query Editor, Dashboards, Query History, and SQL-Native Analytics for Data Engineers

Table of Contents

What Is Databricks SQL?

SQL Warehouses: The Compute Behind Databricks SQL

Serverless vs Classic vs Pro SQL Warehouses

Creating a SQL Warehouse

SQL Warehouse Configuration and Sizing

Auto-Stop and Auto-Scaling

The SQL Editor

Writing Queries in the SQL Editor

Query Parameters

Dashboards in Databricks SQL

Creating a Dashboard

Dashboard Widgets and Filters

Scheduled Dashboard Refresh

Query History and Performance

Query Profile (Execution Plan)

Alerts (Automated Notifications)

Access Control for SQL Warehouses

Databricks SQL vs Notebooks

Databricks SQL vs Fabric Warehouse

Common Mistakes

Interview Questions

Wrapping Up

Leave a Comment Cancel Reply

Databricks SQL and SQL Warehouses: Serverless Compute, Query Editor, Dashboards, Query History, and SQL-Native Analytics for Data Engineers

Table of Contents

What Is Databricks SQL?

SQL Warehouses: The Compute Behind Databricks SQL

Serverless vs Classic vs Pro SQL Warehouses

Creating a SQL Warehouse

SQL Warehouse Configuration and Sizing

Auto-Stop and Auto-Scaling

The SQL Editor

Writing Queries in the SQL Editor

Query Parameters

Saved Queries and Sharing

Dashboards in Databricks SQL

Creating a Dashboard

Dashboard Widgets and Filters

Scheduled Dashboard Refresh

Query History and Performance

Query Profile (Execution Plan)

Alerts (Automated Notifications)

Access Control for SQL Warehouses

Databricks SQL vs Notebooks

Databricks SQL vs Fabric Warehouse

Common Mistakes

Interview Questions

Wrapping Up

Related Posts

Leave a Comment Cancel Reply