Databricks SQL and SQL Warehouses: Serverless Compute, Query Editor, Dashboards, Query History, and SQL-Native Analytics for Data Engineers

Databricks SQL and SQL Warehouses: Serverless Compute, Query Editor, Dashboards, Query History, and SQL-Native Analytics for Data Engineers

You have built Delta tables in your Lakehouse using PySpark notebooks. Bronze, Silver, Gold — all clean and optimized. Now the business analyst asks: “Can I just write SQL against these tables? I do not know PySpark.” The data scientist asks: “Can I query the Gold layer from a BI tool without spinning up a cluster?” The finance team asks: “Can I build a quick dashboard without Power BI?”

The answer to all three is Databricks SQL. It is a SQL-native interface on top of your Lakehouse — write SQL, get results, build dashboards — all without touching a notebook or writing a line of PySpark. Under the hood, it uses SQL Warehouses (dedicated or serverless compute optimized for SQL queries) instead of all-purpose clusters.

Think of it like this: your Lakehouse is a library (data stored in Delta tables). PySpark notebooks are like being a librarian — you can reorganize books, build catalogs, and process requests in bulk. Databricks SQL is like the library’s public search terminal — anyone can walk up, type a query, and find what they need without knowing how the library’s internal systems work.

Table of Contents

  • What Is Databricks SQL?
  • SQL Warehouses: The Compute Behind Databricks SQL
  • Serverless vs Classic vs Pro SQL Warehouses
  • Creating a SQL Warehouse
  • SQL Warehouse Configuration and Sizing
  • Auto-Stop and Auto-Scaling
  • The SQL Editor
  • Writing Queries in the SQL Editor
  • Query Parameters
  • Saved Queries and Sharing
  • Dashboards in Databricks SQL
  • Creating a Dashboard
  • Dashboard Widgets and Filters
  • Scheduled Dashboard Refresh
  • Query History and Performance
  • Query Profile (Execution Plan)
  • Alerts (Automated Notifications)
  • Access Control for SQL Warehouses
  • Databricks SQL vs Notebooks
  • Databricks SQL vs Fabric Warehouse
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

What Is Databricks SQL?

Databricks SQL (DBSQL) is a SQL-first analytics experience within the Databricks platform. It lets users query Delta Lake tables using standard SQL, without notebooks or PySpark knowledge.

Feature Databricks SQL Databricks Notebooks
Language SQL only Python, SQL, Scala, R
Compute SQL Warehouses (optimized for SQL) All-purpose or Job clusters
Users Analysts, BI developers, data consumers Data engineers, data scientists
Dashboards Built-in dashboard builder No native dashboards (use matplotlib/plotly)
Best for Ad-hoc queries, reporting, dashboards ETL, transformations, ML training
Cost model Per-query (serverless) or per-hour (classic) Per-hour (cluster uptime)

SQL Warehouses: The Compute Behind Databricks SQL

A SQL Warehouse is a compute resource optimized specifically for SQL queries. Unlike all-purpose clusters (which support Python, Scala, and ML workloads), SQL Warehouses are tuned for analytical queries — columnar processing, query caching, and photon acceleration.

All-Purpose Cluster:                SQL Warehouse:
  General-purpose compute             SQL-optimized compute
  Supports Python, SQL, Scala         SQL only
  Manual start/stop                   Auto-start on query, auto-stop on idle
  Used by: Data Engineers             Used by: Analysts, BI tools
  Billing: per-hour (always on)       Billing: per-hour or per-query (serverless)

Serverless vs Classic vs Pro SQL Warehouses

Type Startup Time Management Cost Best For
Serverless Seconds Fully managed by Databricks Higher per-DBU, no idle cost Ad-hoc queries, unpredictable workloads
Pro 2-5 minutes Customer-managed (in your cloud) Lower per-DBU, pay for idle Predictable workloads, cost-sensitive
Classic 5-10 minutes Customer-managed (legacy) Lowest per-DBU Budget-constrained, batch queries

Real-life analogy: Serverless is like an Uber — instantly available, no car to maintain, pay per ride. Pro is like leasing a car — takes a few minutes to get going, monthly payment, but cheaper per mile for regular commuters. Classic is like buying a used car — cheapest per mile but the most management overhead.

Creating a SQL Warehouse

Steps to create:
  1. Databricks workspace → SQL Warehouses (left sidebar)
  2. Click "Create SQL Warehouse"
  3. Configure:
     Name: analytics_warehouse
     Cluster size: Small (2X-Small for dev, Medium for production)
     Type: Serverless (recommended) or Pro
     Auto stop: 10 minutes (stops when idle to save cost)
     Scaling: Min 1, Max 3 clusters (auto-scales based on query load)
     Spot instances: Enable for non-critical workloads (cheaper, may be interrupted)
     Channel: Current (stable) or Preview (latest features)
  4. Click "Create"

SQL Warehouse Configuration and Sizing

Cluster Size DBUs/hour Use Case
2X-Small 2 Dev/test, small ad-hoc queries
X-Small 4 Light analytics, few concurrent users
Small 8 Standard analytics workloads
Medium 16 Production dashboards, moderate concurrency
Large 32 Heavy analytics, many concurrent users
X-Large+ 64+ Large-scale BI, hundreds of concurrent queries

Auto-Stop and Auto-Scaling

# Auto-stop: warehouse shuts down after N minutes of no queries
# Default: 10 minutes. Set to 5 for dev, 15-30 for production.
# This is the single most important cost-saving setting.

# Auto-scaling: adds clusters when query queue grows
# Min clusters: 1 (always one ready)
# Max clusters: 3-5 (scales up during peak hours)
# Each cluster handles ~10 concurrent queries

# Example: 50 analysts run queries at 9 AM
# Min=1, Max=5: Databricks scales to 5 clusters
# By 11 AM, only 5 analysts active: scales back to 1 cluster
# By 1 PM, nobody querying: auto-stop shuts down (zero cost)

The SQL Editor

Writing Queries in the SQL Editor

-- Access: Databricks workspace → SQL Editor (left sidebar)
-- Select a SQL Warehouse from the dropdown (top right)
-- Select a catalog and schema from the left panel

-- Query Gold layer tables directly
SELECT 
    d.city,
    d.country,
    COUNT(DISTINCT f.customer_key) AS total_customers,
    SUM(f.order_amount) AS total_revenue,
    AVG(f.order_amount) AS avg_order_value
FROM gold.fact_orders f
JOIN gold.dim_customer d ON f.customer_key = d.customer_key
WHERE f.order_date >= '2026-01-01'
GROUP BY d.city, d.country
ORDER BY total_revenue DESC
LIMIT 20;

-- Results appear instantly (with query caching)
-- Click "Save" to save the query for reuse
-- Click "Share" to share with team members

Query Parameters

-- Use {{ parameter_name }} for dynamic inputs
-- When you run the query, a text box appears for each parameter

SELECT *
FROM gold.fact_orders
WHERE order_date >= '{{ start_date }}'
  AND order_date <= '{{ end_date }}'
  AND region = '{{ region }}'
ORDER BY order_amount DESC;

-- Parameter types: Text, Number, Date, Date Range, Dropdown List
-- Dropdown List: define allowed values (e.g., "East", "West", "North", "South")
-- Date Range: creates a calendar picker in the dashboard

Saved Queries and Sharing

Saved queries live in your workspace and can be shared with team members. Permissions control who can view, edit, or run each query. Organize queries in folders by domain (Sales, Finance, Operations) for team-wide discoverability.

Dashboards in Databricks SQL

Creating a Dashboard

Steps:
  1. SQL Editor → run a query → click "Add to Dashboard"
  2. Or: Dashboards (left sidebar) → "Create Dashboard"
  3. Name: "Daily Sales Overview"
  4. Add widgets:
     - Counter: Total Revenue (single number)
     - Bar Chart: Revenue by Region
     - Line Chart: Daily Revenue Trend
     - Table: Top 10 Products by Sales
     - Pie Chart: Revenue by Category
  5. Each widget is powered by a saved SQL query
  6. Arrange widgets by dragging on the canvas
  7. Click "Share" to publish to team

Dashboard Widgets and Filters

Widget Type Use Case
Counter Single KPI (Total Revenue: $2.4M)
Bar Chart Comparisons (Revenue by Region)
Line Chart Trends over time (Daily Active Users)
Table Detailed data (Top 20 Customers)
Pie/Donut Chart Proportions (Market Share by Product)
Scatter Plot Correlations (Spend vs Frequency)
Map Geographic data (Sales by State)

Dashboard-level filters apply across all widgets. Add a date range filter once and every widget filters simultaneously — no need to parameterize each query separately.

Scheduled Dashboard Refresh

Schedule: Dashboard → three dots menu → "Schedule"
  Frequency: Every 1 hour / Every day at 8 AM / Custom cron
  SQL Warehouse: select the warehouse to use for refresh
  Subscribers: add email recipients for the refreshed dashboard

The dashboard refreshes automatically — stakeholders receive updated
dashboards in their inbox without logging into Databricks.

Query History and Performance

Every query executed on a SQL Warehouse is logged in Query History:

Access: SQL Warehouses → select warehouse → Query History tab

Each entry shows:
  - Query text (full SQL)
  - User who ran it
  - Start time and duration
  - Rows returned
  - Bytes scanned
  - Status (Succeeded, Failed, Cancelled)

Use this to:
  1. Find slow queries (sort by duration)
  2. Identify expensive queries (sort by bytes scanned)
  3. Audit who queried what (compliance)
  4. Optimize: rewrite slow queries, add indexes, partition tables

Query Profile (Execution Plan)

Click any query in history → Query Profile to see the execution plan. This shows exactly how Databricks executed your query — which tables were scanned, how joins were performed, where time was spent. Look for full table scans (missing filters), sort-merge joins on small tables (should be broadcast), and skewed partitions.

Alerts (Automated Notifications)

Alerts trigger notifications when a query result meets a condition:

Setup:
  1. Write a query that returns a single value:
     SELECT COUNT(*) as failed_rows FROM silver.data_quality_log
     WHERE status = 'FAILED' AND check_date = CURRENT_DATE
  2. Create Alert: SQL Editor → Alerts → "Create Alert"
  3. Condition: "Value is greater than 0"
  4. Frequency: Every 1 hour
  5. Notification: Email or Slack webhook

Use cases:
  - Data quality: alert when failed row count > 0
  - Pipeline monitoring: alert when no new data in 24 hours
  - Business KPI: alert when daily revenue drops below threshold
  - Capacity: alert when table size exceeds storage limit

Access Control for SQL Warehouses

Permission What It Allows Typical User
CAN USE Run queries on the warehouse Analysts, BI developers
CAN MONITOR View query history and metrics Team leads, admins
CAN MANAGE Start/stop, resize, configure Platform team, workspace admins
IS OWNER Full control including delete Workspace admin

Data access is controlled separately via Unity Catalog — even if a user can USE the warehouse, they can only query tables they have SELECT permission on.

Databricks SQL vs Notebooks

Use Case Databricks SQL Notebooks
Ad-hoc data exploration ✅ Best (SQL editor, instant results) OK (heavier setup)
Building dashboards ✅ Best (native dashboard builder) ❌ No native dashboards
ETL / data transformations ❌ Limited (SQL only) ✅ Best (PySpark, complex logic)
Machine learning ❌ Not supported ✅ Best (MLflow, sklearn, etc.)
Sharing results with business ✅ Best (shareable dashboards) Moderate (export/screenshot)
Scheduled data loads Moderate (SQL-only transforms) ✅ Best (full Python + PySpark)

Databricks SQL vs Fabric Warehouse

Feature Databricks SQL Fabric Warehouse
SQL dialect Spark SQL (ANSI-compatible) T-SQL (SQL Server compatible)
Storage Delta Lake on cloud storage OneLake (Delta format)
Compute SQL Warehouses (Photon engine) Serverless SQL (always-on)
Dashboards Built-in dashboard builder Power BI (separate service)
Governance Unity Catalog Purview + workspace roles
Best for Multi-cloud, Spark-centric orgs Microsoft-centric orgs, Power BI users

Common Mistakes

  1. Not setting auto-stop — a SQL Warehouse left running 24/7 costs thousands per month. Set auto-stop to 10-15 minutes.
  2. Using all-purpose clusters for SQL analytics — SQL Warehouses are optimized for SQL queries (Photon engine, query caching). All-purpose clusters are for ETL and ML, not ad-hoc queries.
  3. Running full table scans — always filter by partitioned columns (date, region) to reduce bytes scanned and query cost.
  4. Not using query caching — SQL Warehouses cache query results. Running the same query twice is instant the second time. Do not disable caching unless you need real-time results.
  5. Over-sizing the warehouse for dev — a 2X-Small warehouse is sufficient for development and testing. Use Medium/Large only for production dashboards with concurrent users.

Interview Questions

Q: What is the difference between a SQL Warehouse and an all-purpose cluster? A: SQL Warehouses are optimized for SQL-only analytics workloads with features like Photon acceleration, query caching, and auto-scaling. All-purpose clusters support Python, Scala, R, and SQL, and are used for ETL, data engineering, and ML. SQL Warehouses auto-start on query and auto-stop when idle. All-purpose clusters require manual management.

Q: What is the difference between Serverless and Pro SQL Warehouses? A: Serverless warehouses start in seconds, are fully managed by Databricks, and have no idle cost — you pay only when queries run. Pro warehouses take 2-5 minutes to start, run in your cloud account, and cost less per DBU but you pay for idle time. Use Serverless for unpredictable workloads and Pro for predictable, cost-sensitive production use.

Q: How do alerts work in Databricks SQL? A: You write a query that returns a single value (e.g., count of failed rows). You create an alert with a condition (e.g., value > 0), a check frequency (e.g., hourly), and a notification channel (email or Slack). When the condition is met, Databricks sends the alert automatically. Use for data quality monitoring, SLA checks, and business KPI thresholds.

Wrapping Up

Databricks SQL democratizes data access. Data engineers build the Lakehouse with PySpark notebooks. Analysts and business users consume it through Databricks SQL — writing queries, building dashboards, and setting alerts — all without needing Python skills. The SQL Warehouse provides the compute, Unity Catalog provides the governance, and the dashboard builder provides the visualization.

Related posts:Databricks IntroductionUnity Catalog Deep DiveDelta Lake Deep DiveFabric Lakehouse vs Warehouse


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link