Databricks SQL and SQL Warehouses: Serverless Compute, Query Editor, Dashboards, Query History, and SQL-Native Analytics for Data Engineers
You have built Delta tables in your Lakehouse using PySpark notebooks. Bronze, Silver, Gold — all clean and optimized. Now the business analyst asks: “Can I just write SQL against these tables? I do not know PySpark.” The data scientist asks: “Can I query the Gold layer from a BI tool without spinning up a cluster?” The finance team asks: “Can I build a quick dashboard without Power BI?”
The answer to all three is Databricks SQL. It is a SQL-native interface on top of your Lakehouse — write SQL, get results, build dashboards — all without touching a notebook or writing a line of PySpark. Under the hood, it uses SQL Warehouses (dedicated or serverless compute optimized for SQL queries) instead of all-purpose clusters.
Think of it like this: your Lakehouse is a library (data stored in Delta tables). PySpark notebooks are like being a librarian — you can reorganize books, build catalogs, and process requests in bulk. Databricks SQL is like the library’s public search terminal — anyone can walk up, type a query, and find what they need without knowing how the library’s internal systems work.
Table of Contents
- What Is Databricks SQL?
- SQL Warehouses: The Compute Behind Databricks SQL
- Serverless vs Classic vs Pro SQL Warehouses
- Creating a SQL Warehouse
- SQL Warehouse Configuration and Sizing
- Auto-Stop and Auto-Scaling
- The SQL Editor
- Writing Queries in the SQL Editor
- Query Parameters
- Saved Queries and Sharing
- Dashboards in Databricks SQL
- Creating a Dashboard
- Dashboard Widgets and Filters
- Scheduled Dashboard Refresh
- Query History and Performance
- Query Profile (Execution Plan)
- Alerts (Automated Notifications)
- Access Control for SQL Warehouses
- Databricks SQL vs Notebooks
- Databricks SQL vs Fabric Warehouse
- Common Mistakes
- Interview Questions
- Wrapping Up
What Is Databricks SQL?
Databricks SQL (DBSQL) is a SQL-first analytics experience within the Databricks platform. It lets users query Delta Lake tables using standard SQL, without notebooks or PySpark knowledge.
| Feature | Databricks SQL | Databricks Notebooks |
|---|---|---|
| Language | SQL only | Python, SQL, Scala, R |
| Compute | SQL Warehouses (optimized for SQL) | All-purpose or Job clusters |
| Users | Analysts, BI developers, data consumers | Data engineers, data scientists |
| Dashboards | Built-in dashboard builder | No native dashboards (use matplotlib/plotly) |
| Best for | Ad-hoc queries, reporting, dashboards | ETL, transformations, ML training |
| Cost model | Per-query (serverless) or per-hour (classic) | Per-hour (cluster uptime) |
SQL Warehouses: The Compute Behind Databricks SQL
A SQL Warehouse is a compute resource optimized specifically for SQL queries. Unlike all-purpose clusters (which support Python, Scala, and ML workloads), SQL Warehouses are tuned for analytical queries — columnar processing, query caching, and photon acceleration.
All-Purpose Cluster: SQL Warehouse:
General-purpose compute SQL-optimized compute
Supports Python, SQL, Scala SQL only
Manual start/stop Auto-start on query, auto-stop on idle
Used by: Data Engineers Used by: Analysts, BI tools
Billing: per-hour (always on) Billing: per-hour or per-query (serverless)
Serverless vs Classic vs Pro SQL Warehouses
| Type | Startup Time | Management | Cost | Best For |
|---|---|---|---|---|
| Serverless | Seconds | Fully managed by Databricks | Higher per-DBU, no idle cost | Ad-hoc queries, unpredictable workloads |
| Pro | 2-5 minutes | Customer-managed (in your cloud) | Lower per-DBU, pay for idle | Predictable workloads, cost-sensitive |
| Classic | 5-10 minutes | Customer-managed (legacy) | Lowest per-DBU | Budget-constrained, batch queries |
Real-life analogy: Serverless is like an Uber — instantly available, no car to maintain, pay per ride. Pro is like leasing a car — takes a few minutes to get going, monthly payment, but cheaper per mile for regular commuters. Classic is like buying a used car — cheapest per mile but the most management overhead.
Creating a SQL Warehouse
Steps to create:
1. Databricks workspace → SQL Warehouses (left sidebar)
2. Click "Create SQL Warehouse"
3. Configure:
Name: analytics_warehouse
Cluster size: Small (2X-Small for dev, Medium for production)
Type: Serverless (recommended) or Pro
Auto stop: 10 minutes (stops when idle to save cost)
Scaling: Min 1, Max 3 clusters (auto-scales based on query load)
Spot instances: Enable for non-critical workloads (cheaper, may be interrupted)
Channel: Current (stable) or Preview (latest features)
4. Click "Create"
SQL Warehouse Configuration and Sizing
| Cluster Size | DBUs/hour | Use Case |
|---|---|---|
| 2X-Small | 2 | Dev/test, small ad-hoc queries |
| X-Small | 4 | Light analytics, few concurrent users |
| Small | 8 | Standard analytics workloads |
| Medium | 16 | Production dashboards, moderate concurrency |
| Large | 32 | Heavy analytics, many concurrent users |
| X-Large+ | 64+ | Large-scale BI, hundreds of concurrent queries |
Auto-Stop and Auto-Scaling
# Auto-stop: warehouse shuts down after N minutes of no queries
# Default: 10 minutes. Set to 5 for dev, 15-30 for production.
# This is the single most important cost-saving setting.
# Auto-scaling: adds clusters when query queue grows
# Min clusters: 1 (always one ready)
# Max clusters: 3-5 (scales up during peak hours)
# Each cluster handles ~10 concurrent queries
# Example: 50 analysts run queries at 9 AM
# Min=1, Max=5: Databricks scales to 5 clusters
# By 11 AM, only 5 analysts active: scales back to 1 cluster
# By 1 PM, nobody querying: auto-stop shuts down (zero cost)
The SQL Editor
Writing Queries in the SQL Editor
-- Access: Databricks workspace → SQL Editor (left sidebar)
-- Select a SQL Warehouse from the dropdown (top right)
-- Select a catalog and schema from the left panel
-- Query Gold layer tables directly
SELECT
d.city,
d.country,
COUNT(DISTINCT f.customer_key) AS total_customers,
SUM(f.order_amount) AS total_revenue,
AVG(f.order_amount) AS avg_order_value
FROM gold.fact_orders f
JOIN gold.dim_customer d ON f.customer_key = d.customer_key
WHERE f.order_date >= '2026-01-01'
GROUP BY d.city, d.country
ORDER BY total_revenue DESC
LIMIT 20;
-- Results appear instantly (with query caching)
-- Click "Save" to save the query for reuse
-- Click "Share" to share with team members
Query Parameters
-- Use {{ parameter_name }} for dynamic inputs
-- When you run the query, a text box appears for each parameter
SELECT *
FROM gold.fact_orders
WHERE order_date >= '{{ start_date }}'
AND order_date <= '{{ end_date }}'
AND region = '{{ region }}'
ORDER BY order_amount DESC;
-- Parameter types: Text, Number, Date, Date Range, Dropdown List
-- Dropdown List: define allowed values (e.g., "East", "West", "North", "South")
-- Date Range: creates a calendar picker in the dashboard
Saved Queries and Sharing
Saved queries live in your workspace and can be shared with team members. Permissions control who can view, edit, or run each query. Organize queries in folders by domain (Sales, Finance, Operations) for team-wide discoverability.
Dashboards in Databricks SQL
Creating a Dashboard
Steps:
1. SQL Editor → run a query → click "Add to Dashboard"
2. Or: Dashboards (left sidebar) → "Create Dashboard"
3. Name: "Daily Sales Overview"
4. Add widgets:
- Counter: Total Revenue (single number)
- Bar Chart: Revenue by Region
- Line Chart: Daily Revenue Trend
- Table: Top 10 Products by Sales
- Pie Chart: Revenue by Category
5. Each widget is powered by a saved SQL query
6. Arrange widgets by dragging on the canvas
7. Click "Share" to publish to team
Dashboard Widgets and Filters
| Widget Type | Use Case |
|---|---|
| Counter | Single KPI (Total Revenue: $2.4M) |
| Bar Chart | Comparisons (Revenue by Region) |
| Line Chart | Trends over time (Daily Active Users) |
| Table | Detailed data (Top 20 Customers) |
| Pie/Donut Chart | Proportions (Market Share by Product) |
| Scatter Plot | Correlations (Spend vs Frequency) |
| Map | Geographic data (Sales by State) |
Dashboard-level filters apply across all widgets. Add a date range filter once and every widget filters simultaneously — no need to parameterize each query separately.
Scheduled Dashboard Refresh
Schedule: Dashboard → three dots menu → "Schedule"
Frequency: Every 1 hour / Every day at 8 AM / Custom cron
SQL Warehouse: select the warehouse to use for refresh
Subscribers: add email recipients for the refreshed dashboard
The dashboard refreshes automatically — stakeholders receive updated
dashboards in their inbox without logging into Databricks.
Query History and Performance
Every query executed on a SQL Warehouse is logged in Query History:
Access: SQL Warehouses → select warehouse → Query History tab
Each entry shows:
- Query text (full SQL)
- User who ran it
- Start time and duration
- Rows returned
- Bytes scanned
- Status (Succeeded, Failed, Cancelled)
Use this to:
1. Find slow queries (sort by duration)
2. Identify expensive queries (sort by bytes scanned)
3. Audit who queried what (compliance)
4. Optimize: rewrite slow queries, add indexes, partition tables
Query Profile (Execution Plan)
Click any query in history → Query Profile to see the execution plan. This shows exactly how Databricks executed your query — which tables were scanned, how joins were performed, where time was spent. Look for full table scans (missing filters), sort-merge joins on small tables (should be broadcast), and skewed partitions.
Alerts (Automated Notifications)
Alerts trigger notifications when a query result meets a condition:
Setup:
1. Write a query that returns a single value:
SELECT COUNT(*) as failed_rows FROM silver.data_quality_log
WHERE status = 'FAILED' AND check_date = CURRENT_DATE
2. Create Alert: SQL Editor → Alerts → "Create Alert"
3. Condition: "Value is greater than 0"
4. Frequency: Every 1 hour
5. Notification: Email or Slack webhook
Use cases:
- Data quality: alert when failed row count > 0
- Pipeline monitoring: alert when no new data in 24 hours
- Business KPI: alert when daily revenue drops below threshold
- Capacity: alert when table size exceeds storage limit
Access Control for SQL Warehouses
| Permission | What It Allows | Typical User |
|---|---|---|
| CAN USE | Run queries on the warehouse | Analysts, BI developers |
| CAN MONITOR | View query history and metrics | Team leads, admins |
| CAN MANAGE | Start/stop, resize, configure | Platform team, workspace admins |
| IS OWNER | Full control including delete | Workspace admin |
Data access is controlled separately via Unity Catalog — even if a user can USE the warehouse, they can only query tables they have SELECT permission on.
Databricks SQL vs Notebooks
| Use Case | Databricks SQL | Notebooks |
|---|---|---|
| Ad-hoc data exploration | ✅ Best (SQL editor, instant results) | OK (heavier setup) |
| Building dashboards | ✅ Best (native dashboard builder) | ❌ No native dashboards |
| ETL / data transformations | ❌ Limited (SQL only) | ✅ Best (PySpark, complex logic) |
| Machine learning | ❌ Not supported | ✅ Best (MLflow, sklearn, etc.) |
| Sharing results with business | ✅ Best (shareable dashboards) | Moderate (export/screenshot) |
| Scheduled data loads | Moderate (SQL-only transforms) | ✅ Best (full Python + PySpark) |
Databricks SQL vs Fabric Warehouse
| Feature | Databricks SQL | Fabric Warehouse |
|---|---|---|
| SQL dialect | Spark SQL (ANSI-compatible) | T-SQL (SQL Server compatible) |
| Storage | Delta Lake on cloud storage | OneLake (Delta format) |
| Compute | SQL Warehouses (Photon engine) | Serverless SQL (always-on) |
| Dashboards | Built-in dashboard builder | Power BI (separate service) |
| Governance | Unity Catalog | Purview + workspace roles |
| Best for | Multi-cloud, Spark-centric orgs | Microsoft-centric orgs, Power BI users |
Common Mistakes
- Not setting auto-stop — a SQL Warehouse left running 24/7 costs thousands per month. Set auto-stop to 10-15 minutes.
- Using all-purpose clusters for SQL analytics — SQL Warehouses are optimized for SQL queries (Photon engine, query caching). All-purpose clusters are for ETL and ML, not ad-hoc queries.
- Running full table scans — always filter by partitioned columns (date, region) to reduce bytes scanned and query cost.
- Not using query caching — SQL Warehouses cache query results. Running the same query twice is instant the second time. Do not disable caching unless you need real-time results.
- Over-sizing the warehouse for dev — a 2X-Small warehouse is sufficient for development and testing. Use Medium/Large only for production dashboards with concurrent users.
Interview Questions
Q: What is the difference between a SQL Warehouse and an all-purpose cluster? A: SQL Warehouses are optimized for SQL-only analytics workloads with features like Photon acceleration, query caching, and auto-scaling. All-purpose clusters support Python, Scala, R, and SQL, and are used for ETL, data engineering, and ML. SQL Warehouses auto-start on query and auto-stop when idle. All-purpose clusters require manual management.
Q: What is the difference between Serverless and Pro SQL Warehouses? A: Serverless warehouses start in seconds, are fully managed by Databricks, and have no idle cost — you pay only when queries run. Pro warehouses take 2-5 minutes to start, run in your cloud account, and cost less per DBU but you pay for idle time. Use Serverless for unpredictable workloads and Pro for predictable, cost-sensitive production use.
Q: How do alerts work in Databricks SQL? A: You write a query that returns a single value (e.g., count of failed rows). You create an alert with a condition (e.g., value > 0), a check frequency (e.g., hourly), and a notification channel (email or Slack). When the condition is met, Databricks sends the alert automatically. Use for data quality monitoring, SLA checks, and business KPI thresholds.
Wrapping Up
Databricks SQL democratizes data access. Data engineers build the Lakehouse with PySpark notebooks. Analysts and business users consume it through Databricks SQL — writing queries, building dashboards, and setting alerts — all without needing Python skills. The SQL Warehouse provides the compute, Unity Catalog provides the governance, and the dashboard builder provides the visualization.
Related posts: – Databricks Introduction – Unity Catalog Deep Dive – Delta Lake Deep Dive – Fabric Lakehouse vs Warehouse
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.