Azure SQL Database for Data Engineers: Setup, Configuration, and Pipeline Integration
Azure SQL Database is the relational database you will use most often in Azure data engineering projects. It is where your metadata tables, audit logs, config tables, and structured data live. Every pipeline we have built in this blog series reads from or writes to Azure SQL Database.
Yet many data engineers treat it as a black box — they know how to write queries but cannot explain the difference between DTUs and vCores, do not understand service tiers, and struggle with authentication and firewall configuration.
This post gives you the complete picture of Azure SQL Database from a data engineering perspective — not DBA-level depth, but everything you need to create databases, configure access, optimize performance, and integrate with your pipelines.
Table of Contents
- What Is Azure SQL Database?
- Azure SQL Database vs SQL Server vs Managed Instance
- Creating an Azure SQL Database
- Service Tiers and Compute Models
- Authentication: SQL Auth vs Azure AD
- Firewall Rules and Network Access
- Connecting from Different Tools
- Key SQL Server Features Data Engineers Use
- Integrating with ADF and Synapse Pipelines
- Performance Monitoring Basics
- Cost Optimization
- Backup and Recovery
- Interview Questions
- Wrapping Up
What Is Azure SQL Database?
Azure SQL Database is a fully managed relational database service built on the SQL Server engine. Microsoft handles patching, backups, high availability, and the underlying infrastructure. You focus on your data and queries.
Key characteristics:
- Fully managed — no OS, no patching, no infrastructure
- Built-in high availability — 99.99% SLA with automatic failover
- Automatic backups — point-in-time restore up to 35 days
- Elastic scaling — scale up/down without downtime
- SQL Server compatible — same T-SQL syntax, same tools (SSMS, Azure Data Studio)
- Pay per use — choose between DTU and vCore pricing models
Azure SQL Database vs SQL Server vs Managed Instance
| Feature | Azure SQL Database | Azure SQL Managed Instance | SQL Server (on-prem/VM) |
|---|---|---|---|
| Management | Fully managed (PaaS) | Fully managed (PaaS) | You manage everything |
| SQL Server features | Most features | Near 100% compatibility | Full features |
| Cross-database queries | Limited (elastic query) | Supported | Supported |
| SQL Agent jobs | Not available | Available | Available |
| Linked servers | Not supported | Supported | Supported |
| CLR integration | Limited | Supported | Full support |
| Cost | Lowest | Higher | Depends on licensing |
| Best for | Single databases, new projects | Lift-and-shift from on-prem | Full control, legacy apps |
For data engineering: Azure SQL Database is the best choice for new projects. Use Managed Instance only if you need cross-database queries, SQL Agent, or near-100% SQL Server compatibility for migration.
Creating an Azure SQL Database
Step 1: Create a SQL Server (Logical Server)
The “server” in Azure SQL is a logical container — not a physical machine. It holds databases, firewall rules, and login credentials.
- Azure Portal > Create a resource > SQL Server (logical server)
- Configure:
  - Server name: `sql-dataplatform-dev` (globally unique, becomes `sql-dataplatform-dev.database.windows.net`)
  - Region: same as your ADF/Synapse workspace
  - Authentication: SQL authentication (set admin username and password) or Azure AD only
- Click Review + Create
Step 2: Create the Database
- Azure Portal > Create a resource > SQL Database
- Configure:
  - Database name: `AdventureWorksLT` (or your project name)
  - Server: select the server you created
  - Want to use SQL elastic pool: No (for single databases)
  - Compute + storage: click Configure database
    - For dev/test: Basic (5 DTUs, ~$5/month) or Serverless (auto-pause)
    - For production: General Purpose vCore
- Under Additional settings:
  - Use existing data: Sample (loads AdventureWorksLT sample data — great for learning)
- Click Review + Create
Step 3: Configure Firewall
By default, no external connections are allowed. You must add firewall rules:
- Go to your SQL Server > Networking
- Under Firewall rules:
- Click Add your client IPv4 address (adds your current IP)
- Toggle Allow Azure services and resources to access this server to Yes (required for ADF/Synapse)
- Click Save
Service Tiers and Compute Models
DTU Model (Simpler)
DTU (Database Transaction Unit) bundles CPU, memory, and I/O into a single metric:
| Tier | DTUs | Storage | Cost/Month | Use Case |
|---|---|---|---|---|
| Basic | 5 | 2 GB | ~$5 | Dev/test, tiny workloads |
| Standard S0 | 10 | 250 GB | ~$15 | Light production |
| Standard S3 | 100 | 250 GB | ~$150 | Medium workloads |
| Premium P1 | 125 | 500 GB | ~$465 | High-performance |
vCore Model (More Control)
You choose CPU cores and storage independently:
| Tier | Description | Use Case |
|---|---|---|
| General Purpose | Balanced compute and storage | Most production workloads |
| Business Critical | High I/O, built-in read replicas | Mission-critical, low-latency |
| Hyperscale | Auto-scaling storage up to 100 TB | Very large databases |
Serverless Compute (Cost Saver)
The database auto-pauses when not in use and auto-resumes on the first connection:
- Min vCores: 0.5 (auto-pause when idle)
- Max vCores: up to 80
- Auto-pause delay: 1 hour (configurable)
- You pay only for compute used + storage
Perfect for: Dev/test databases, intermittent workloads, metadata databases that are only accessed during pipeline runs.
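As a rough illustration of why auto-pause matters, here is a back-of-the-envelope cost sketch in Python. The per-vCore-second rate below is a placeholder, not a real Azure price; check the pricing page for your region.

```python
# Rough serverless cost sketch. RATE_PER_VCORE_SECOND is a made-up
# placeholder, not an actual Azure price; serverless is billed per
# vCore-second of active compute, and paused time costs nothing.
RATE_PER_VCORE_SECOND = 0.000145  # assumed rate for illustration

def serverless_compute_cost(active_hours_per_day: float, avg_vcores: float,
                            days: int = 30) -> float:
    """Estimate monthly compute cost for the active (non-paused) time."""
    active_seconds = active_hours_per_day * 3600 * days
    return active_seconds * avg_vcores * RATE_PER_VCORE_SECOND

# A metadata DB active ~2 hours/day at 1 vCore vs. running 24/7:
intermittent = serverless_compute_cost(2, 1.0)
always_on = serverless_compute_cost(24, 1.0)
print(f"intermittent: ${intermittent:.2f}, always-on: ${always_on:.2f}")
```

The gap between the two numbers is the whole argument for serverless on pipeline-driven metadata databases.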
Authentication: SQL Auth vs Azure AD
SQL Authentication (Simple)
Traditional username/password:
Server: sql-dataplatform-dev.database.windows.net
Database: AdventureWorksLT
Username: sqladmin
Password: MyP@ssw0rd123
Pros: Simple to set up, works from anywhere. Cons: Password management, no MFA, credentials in linked services.
Azure AD (Entra ID) Authentication (Recommended)
Uses Azure Active Directory identities:
-- Create an Azure AD user in the database
CREATE USER [naveen@company.com] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [naveen@company.com];
ALTER ROLE db_datawriter ADD MEMBER [naveen@company.com];
Pros: No passwords to manage, MFA support, Managed Identity support. Cons: Requires Azure AD setup, slightly more complex initial configuration.
Managed Identity (Best for Pipelines)
ADF/Synapse uses its managed identity to connect — no credentials needed:
- Set ADF managed identity as Azure AD admin on the SQL server, OR
- Create the managed identity as a database user:
CREATE USER [naveen-datafactory-de] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [naveen-datafactory-de];
ALTER ROLE db_datawriter ADD MEMBER [naveen-datafactory-de];
ALTER ROLE db_ddladmin ADD MEMBER [naveen-datafactory-de];
In the ADF linked service, select Managed Identity as the authentication type. No password, no key rotation, no secrets.
Firewall Rules and Network Access
Public Access (Default)
The database is accessible over the internet. Firewall rules control which IPs can connect:
Server firewall rules:
- My office IP: 203.0.113.50
- My home IP: 198.51.100.25
- Allow Azure services: Yes (required for ADF)
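Server-level firewall rules are inclusive IPv4 ranges (a single allowed IP is a range where start equals end). A small stdlib sketch of how a client IP is matched, using the example IPs above; the "Allow Azure services" toggle is a separate mechanism not modeled here:

```python
import ipaddress

# Server-level firewall rules are inclusive start-to-end IPv4 ranges.
# These mirror the example rules above; a single IP has start == end.
rules = [
    ("office", "203.0.113.50", "203.0.113.50"),
    ("home", "198.51.100.25", "198.51.100.25"),
]

def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any firewall rule range."""
    ip = ipaddress.IPv4Address(client_ip)
    return any(
        ipaddress.IPv4Address(start) <= ip <= ipaddress.IPv4Address(end)
        for _, start, end in rules
    )

print(is_allowed("203.0.113.50"))  # office IP
print(is_allowed("192.0.2.10"))    # unknown IP
```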
Private Access (Production)
Use Private Endpoints to make the database accessible only from your Virtual Network:
- Go to SQL Server > Networking > Private endpoint connections
- Create a private endpoint in your VNet
- The database gets a private IP address (e.g., 10.0.1.5)
- Disable public access
ADF/Synapse connects via Managed VNet with a managed private endpoint.
Connecting from Different Tools
SQL Server Management Studio (SSMS)
Server: sql-dataplatform-dev.database.windows.net
Authentication: SQL Server Authentication (or Azure AD)
Login: sqladmin
Password: ****
Azure Data Studio
Same connection details as SSMS. Lighter weight, cross-platform, supports notebooks.
Python (SQLAlchemy + pyodbc)
from sqlalchemy import create_engine
engine = create_engine(
'mssql+pyodbc://sqladmin:password@sql-dataplatform-dev.database.windows.net/AdventureWorksLT'
'?driver=ODBC+Driver+18+for+SQL+Server'
'&encrypt=yes'
'&TrustServerCertificate=no'
)
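One gotcha with this URL format: special characters in the password (such as @) must be URL-encoded, or the URL parses incorrectly. A stdlib sketch using the example password from earlier:

```python
from urllib.parse import quote_plus

# Passwords containing @, :, or / must be URL-encoded before being
# embedded in a SQLAlchemy/pyodbc connection URL, otherwise the parser
# treats them as URL delimiters.
password = "MyP@ssw0rd123"  # example credential from above
safe_password = quote_plus(password)

url = (
    f"mssql+pyodbc://sqladmin:{safe_password}"
    "@sql-dataplatform-dev.database.windows.net/AdventureWorksLT"
    "?driver=ODBC+Driver+18+for+SQL+Server&encrypt=yes"
)
print(safe_password)  # MyP%40ssw0rd123
```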
ADF/Synapse Linked Service
- Manage > Linked services > + New > Azure SQL Database
- Server: `sql-dataplatform-dev.database.windows.net`
- Database: `AdventureWorksLT`
- Authentication: Managed Identity (recommended) or SQL auth
- Test connection > Create
Key SQL Server Features Data Engineers Use
Stored Procedures
Encapsulate reusable SQL logic:
CREATE PROCEDURE sp_insert_audit_log
@rows_read INT,
@table_name VARCHAR(100),
@error_message VARCHAR(500)
AS
BEGIN
INSERT INTO audit (rows_read, table_name, error_message, load_date)
VALUES (@rows_read, @table_name, @error_message, GETDATE());
END;
Called from ADF via the Stored Procedure activity.
Views
Create reusable query abstractions:
CREATE VIEW v_active_customers AS
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE is_active = 1;
Indexes
Speed up queries on large tables:
-- Index on the column used in WHERE clauses
CREATE INDEX IX_orders_customer_id ON orders (customer_id);
-- Composite index for multi-column filters
CREATE INDEX IX_orders_date_customer ON orders (order_date, customer_id);
-- Index on the watermark column for incremental loads
CREATE INDEX IX_employees_empid ON employees (empid);
Temporal Tables (System-Versioned)
Automatically track row changes over time:
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
valid_from DATETIME2 GENERATED ALWAYS AS ROW START,
valid_to DATETIME2 GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH (SYSTEM_VERSIONING = ON);
Query historical data:
-- What did the customer record look like on March 1?
SELECT * FROM customers
FOR SYSTEM_TIME AS OF '2026-03-01';
Dynamic Data Masking
Hide sensitive data from non-privileged users:
ALTER TABLE customers
ALTER COLUMN email ADD MASKED WITH (FUNCTION = 'email()');
ALTER TABLE customers
ALTER COLUMN phone ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XXX-",4)');
Non-admin users see: aXXX@XXXX.com and XXX-XXX-1234.
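To make the mask output concrete, here is a small Python illustration of what the two mask functions display. This is a local re-implementation of the visible behavior for demonstration, not Azure code:

```python
def mask_email(email: str) -> str:
    """Mimic the email() mask: first letter, then XXX@XXXX.com."""
    return email[0] + "XXX@XXXX.com"

def mask_partial(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Mimic partial(prefix, padding, suffix): keep the first `prefix`
    and last `suffix` characters, replace the middle with `padding`."""
    return value[:prefix] + padding + (value[-suffix:] if suffix else "")

print(mask_email("alice@contoso.com"))                 # aXXX@XXXX.com
print(mask_partial("416-555-1234", 0, "XXX-XXX-", 4))  # XXX-XXX-1234
```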
Integrating with ADF and Synapse Pipelines
As a Source (Reading Data)
- Lookup activity: Read metadata/config tables
- Copy activity source: Copy table data to ADLS Gen2
- Stored Procedure: Execute queries and return results
As a Sink (Writing Data)
- Copy activity sink: Write data from other sources into SQL tables
- Stored Procedure: Insert audit logs, update watermarks
- Script activity: Run DDL or DML statements
The Metadata + Audit Pattern
The pattern we have used throughout this blog series:
Azure SQL Database stores:
- metadata table (which tables to copy)
- config table (watermarks for incremental load)
- audit table (pipeline execution logs)
ADF/Synapse reads metadata, copies data to ADLS Gen2,
and writes audit logs back to Azure SQL.
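The watermark half of this pattern can be sketched in a few lines. The table and column names below are illustrative, not a fixed schema:

```python
# Illustrative watermark logic for the metadata-driven pattern above.
# Table and column names are examples, not a prescribed schema.
def build_incremental_query(table: str, watermark_col: str,
                            last_watermark: str) -> str:
    """Build the delta query a pipeline would run against the source.
    In a real pipeline, pass the watermark as a query parameter rather
    than formatting it into the string, to avoid SQL injection."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_col} > '{last_watermark}'"
    )

q = build_incremental_query("orders", "modified_date", "2026-04-01T00:00:00")
print(q)
```

After each successful copy, the pipeline writes the new maximum watermark value back to the config table so the next run picks up only newer rows.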
Performance Monitoring Basics
Query Performance Insight
Built into the Azure Portal:
- Go to your database > Query Performance Insight
- See top resource-consuming queries
- Identify slow queries and optimization opportunities
Key DMVs (Dynamic Management Views)
-- Top 10 queries by average CPU time (total_worker_time is in microseconds)
SELECT TOP 10
qs.total_worker_time / qs.execution_count as avg_cpu,
qs.execution_count,
SUBSTRING(st.text, 1, 200) as query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY avg_cpu DESC;
-- Current connections
SELECT COUNT(*) as connection_count
FROM sys.dm_exec_connections;
-- Database size
SELECT
SUM(size * 8 / 1024) as size_mb
FROM sys.database_files;
Automatic Tuning
Azure SQL Database automatically:
- Creates missing indexes
- Drops unused indexes
- Forces good query plans (plan regression fix)
Enable in Database > Automatic tuning > turn on all options.
Cost Optimization
- Use Serverless for dev/test (auto-pauses when idle)
- Right-size DTUs/vCores — most dev databases are over-provisioned
- Reserved capacity for production (save up to 65% with 3-year reservation)
- Elastic pools if you have multiple databases with variable usage
- Short-term retention — default backup retention is 7 days. Only increase if needed.
- Scale down during off-hours using Azure Automation or Logic Apps
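The off-hours scaling idea boils down to a schedule decision. A minimal sketch, with assumed tier names and an assumed business-hours window; adjust both to your workload:

```python
from datetime import datetime

# Pick a target service objective by time of day. The tier names and
# the business-hours window here are assumptions for illustration.
def target_tier(now: datetime) -> str:
    """Return 'S3' during weekday business hours, 'S0' otherwise."""
    business_hours = 8 <= now.hour < 18 and now.weekday() < 5
    return "S3" if business_hours else "S0"

# An Azure Automation runbook or Logic App would feed this value to
# something like `az sql db update --service-objective <tier>`.
print(target_tier(datetime(2026, 4, 6, 10, 0)))  # Monday 10:00 -> S3
print(target_tier(datetime(2026, 4, 6, 22, 0)))  # Monday 22:00 -> S0
```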
Backup and Recovery
Azure SQL Database handles backups automatically:
| Feature | Details |
|---|---|
| Full backups | Weekly |
| Differential backups | Every 12-24 hours |
| Transaction log backups | Every 5-10 minutes |
| Retention | 7-35 days (configurable) |
| Point-in-time restore | Any second within retention period |
| Geo-redundant backup | Optional (copies to paired region) |
| Long-term retention | Up to 10 years (for compliance) |
Restore to a Point in Time
az sql db restore \
  --dest-name AdventureWorksLT_restored \
  --resource-group rg-dev \
  --server sql-dataplatform-dev \
  --name AdventureWorksLT \
  --time "2026-04-07T10:00:00Z"
This creates a NEW database with data as of the specified time.
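Before running a restore, it is worth sanity-checking that the requested time falls inside the retention window. A minimal sketch, assuming the default 7-day point-in-time retention:

```python
from datetime import datetime, timedelta

def can_restore(requested: datetime, now: datetime,
                retention_days: int = 7) -> bool:
    """Point-in-time restore only works within the retention window
    and never for a future timestamp. Defaults to 7-day retention."""
    return now - timedelta(days=retention_days) <= requested <= now

now = datetime(2026, 4, 8, 12, 0)
print(can_restore(datetime(2026, 4, 7, 10, 0), now))  # inside window
print(can_restore(datetime(2026, 3, 1, 0, 0), now))   # too old
```

Note this is only a window check; the exact earliest restore point also depends on when the oldest retained backup was taken.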
Interview Questions
Q: What is Azure SQL Database? A: A fully managed relational database service based on SQL Server. Microsoft handles infrastructure, patching, backups, and high availability. You manage your data and queries.
Q: What is the difference between DTU and vCore pricing? A: DTU bundles CPU, memory, and I/O into a single metric — simpler to understand. vCore lets you choose CPU and storage independently — more control. vCore also supports Serverless (auto-pause) and reserved capacity discounts.
Q: How does ADF connect to Azure SQL Database? A: Through a Linked Service. Authentication can be SQL authentication (username/password), Azure AD, or Managed Identity (recommended — no credentials to manage).
Q: Why would you use Managed Identity instead of SQL authentication? A: No passwords to store, rotate, or risk leaking. ADF authenticates using its Azure-managed identity automatically. More secure and easier to maintain.
Q: What is the Serverless compute tier? A: The database auto-pauses after idle time and auto-resumes on first connection. You pay only for compute used. Perfect for dev/test databases and intermittent workloads.
Wrapping Up
Azure SQL Database is the workhorse behind every Azure data pipeline. Understanding service tiers, authentication, firewall configuration, and pipeline integration gives you the foundation to build production-grade data platforms.
For data engineers, the key skills are: creating databases with the right tier, configuring Managed Identity access for ADF/Synapse, writing stored procedures for audit logging, and understanding the metadata/config/audit pattern.
Related posts:
- What is Azure Data Factory?
- Metadata-Driven Pipeline in ADF
- Audit Logging in Data Pipelines
- Azure Fundamentals
- Synapse Pipeline with Audit Logging
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.