Connecting Azure Databricks to Blob Storage and ADLS Gen2: Every Method Explained
You just spun up a Databricks workspace, created a cluster, and opened a notebook. Now you try to read a Parquet file from your data lake and get: “No credentials found.” The cluster is running. The storage account exists. The file is there. But Databricks cannot see it.
This is the most common first-day frustration with Databricks. Unlike Synapse, which auto-connects to its default storage, Databricks needs you to explicitly configure HOW it authenticates with Azure Storage. There is no default connection.
Think of it like moving into a new apartment. The building exists, your apartment is ready, but the WiFi is not set up yet. You need to connect to the router (storage account), enter the password (credentials), and THEN you can browse the internet (read files). This post shows you every way to connect.
Table of Contents
- Why Databricks Needs Explicit Storage Configuration
- The Four Connection Methods
- Method 1: Access Key (Simplest — Dev Only)
- Method 2: SAS Token (Scoped Access)
- Method 3: Service Principal with OAuth (Production)
- Method 4: Unity Catalog with Access Connector (Modern Best Practice)
- Mounting vs Direct Access
- Mounting Storage with dbutils
- Reading and Writing Files After Connection
- Connecting to Blob Storage vs ADLS Gen2
- Secure Credential Management with Key Vault
- Setting Up Key Vault Secret Scope
- The Config Notebook Pattern (Production)
- Common Connection Errors and Fixes
- Which Method Should I Use?
- Interview Questions
- Wrapping Up
Why Databricks Needs Explicit Storage Configuration
In Synapse, the workspace is created WITH a default ADLS Gen2 account. A managed identity is auto-configured. Everything just works from day one.
Databricks is different. The workspace is compute-only — it does not own any storage. Your data lives in YOUR storage accounts, and you must tell Databricks how to authenticate.
Synapse: Workspace ──(built-in connection)──→ Default ADLS Gen2
Just works. No setup needed.
Databricks: Workspace ──(???)──→ Your Storage Account
You must configure the connection method and credentials.
Real-life analogy: Synapse is like a company laptop that comes pre-configured with VPN, email, and file shares. Databricks is like a personal laptop you bring to the office — powerful, but you need to set up WiFi, VPN, and file share access yourself before you can access company resources.
The Four Connection Methods
| Method | Security | Scope | Best For | Complexity |
|---|---|---|---|---|
| Access Key | Low (full account access) | Entire storage account | Dev/learning | Simplest |
| SAS Token | Medium (scoped, time-limited) | Specific container/file | Temporary access | Simple |
| Service Principal | High (RBAC-controlled) | Specific containers via role assignment | Production | Medium |
| Unity Catalog + Access Connector | Highest (centralized governance) | Governed via catalog | Enterprise production | Complex (one-time) |
Method 1: Access Key (Simplest — Dev Only)
What It Is
Every Azure Storage account has two access keys — master passwords that grant full read/write/delete access to EVERYTHING in the account. Using an access key in Databricks is like using the building master key to open every door.
How to Get the Access Key
- Go to Azure Portal → your storage account
- Click Access keys (under Security + networking)
- Click Show on Key 1
- Copy the key
Configure in Databricks Notebook
# Set the access key in Spark configuration
storage_account_name = "naveenblobde"
access_key = "your-access-key-here" # DON'T hardcode in production!
spark.conf.set(
f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
access_key
)
# Now you can read files
df = spark.read.parquet(
f"abfss://cont1@{storage_account_name}.dfs.core.windows.net/data/"
)
df.show(5)
For Blob Storage (wasbs:// protocol)
spark.conf.set(
f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net",
access_key
)
# Read using wasbs://
df = spark.read.csv(
f"wasbs://cont1@{storage_account_name}.blob.core.windows.net/data.csv",
header=True
)
abfss:// vs wasbs:// — Which to Use?
| Protocol | Storage Type | Namespace | When to Use |
|---|---|---|---|
| abfss:// | ADLS Gen2 (hierarchical namespace ON) | DFS endpoint | Always prefer this for ADLS Gen2 |
| wasbs:// | Blob Storage (hierarchical namespace OFF) | Blob endpoint | Only for regular Blob Storage |
| abfss:// | Can also work on Blob Storage | DFS endpoint | May require additional config |
Rule of thumb: If your storage account has hierarchical namespace enabled (ADLS Gen2), use abfss://. If it is regular Blob Storage, use wasbs://.
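If you switch between the two protocols often, a tiny helper keeps the URL construction in one place. This is purely illustrative (the function name and defaults are mine, not a Databricks or Azure API):

```python
def storage_url(account: str, container: str, path: str = "", hierarchical: bool = True) -> str:
    """Build an abfss:// URL for ADLS Gen2, or a wasbs:// URL for plain Blob Storage."""
    if hierarchical:
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path}"

# ADLS Gen2 (hierarchical namespace ON) -> DFS endpoint
print(storage_url("naveenblobde", "cont1", "data/"))

# Regular Blob Storage (hierarchical namespace OFF) -> Blob endpoint
print(storage_url("naveenblobde", "cont1", "data.csv", hierarchical=False))
```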
Why Access Keys Are Dangerous in Production
- Access keys grant full control over the entire storage account — read, write, delete, management
- If the key leaks (committed to Git, shared in Slack), anyone can access ALL your data
- You cannot restrict access to specific containers or folders
- No audit trail of who used the key
- Revoking the key breaks ALL connections using it
Real-life analogy: An access key is like the building master key. It opens every apartment, the mailroom, the storage room, and the office. If you lose it, everyone in the building is at risk. You would never give the master key to a delivery driver — you would give them a key to the specific apartment they need (Service Principal or SAS).
Method 2: SAS Token (Scoped Access)
What It Is
A Shared Access Signature (SAS) is a time-limited, permission-scoped token that grants access to specific containers or files without exposing the access key.
Generate a SAS Token
- Go to Azure Portal → Storage account → Containers → select container
- Click Shared access tokens (on the container, not the account)
- Configure:
- Permissions: Read, List (minimum needed)
- Start: Now
- Expiry: 24 hours (or your preferred duration)
- Allowed protocols: HTTPS only
- Click Generate SAS token and URL
- Copy the SAS token (starts with ?sv=)
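If you would rather script this than click through the portal, the azure-storage-blob SDK can mint the same kind of container-level SAS. A minimal sketch, assuming you have the account key available (ideally pulled from Key Vault); the account and container names are the ones used in this post:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

# Illustrative values; substitute your own account, container, and key source
sas_token = generate_container_sas(
    account_name="naveenblobde",
    container_name="cont1",
    account_key="<account-key>",                                # never hardcode in real notebooks
    permission=ContainerSasPermissions(read=True, list=True),   # Read + List only
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),    # 24-hour lifetime
)

# Portal-generated tokens start with '?'; the SDK returns the token without it
print("?" + sas_token)
```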
Configure in Databricks
storage_account_name = "naveenblobde"
container_name = "cont1"
sas_token = "?sv=2021-06-08&st=2026-04-20&se=2026-04-21&sr=c&sp=rl&sig=..."
spark.conf.set(
f"fs.azure.sas.{container_name}.{storage_account_name}.blob.core.windows.net",
sas_token
)
# Read files
df = spark.read.parquet(
f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net/data/"
)
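The configuration above targets the Blob endpoint via wasbs://. If the account is ADLS Gen2 and you want to use the same SAS token over abfss://, Databricks documents a fixed SAS token provider for the ABFS driver. A hedged sketch that reuses the variables from the cell above (verify the provider class name against your runtime's documentation):

```python
spark.conf.set(f"fs.azure.account.auth.type.{storage_account_name}.dfs.core.windows.net", "SAS")
spark.conf.set(f"fs.azure.sas.token.provider.type.{storage_account_name}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set(f"fs.azure.sas.fixed.token.{storage_account_name}.dfs.core.windows.net", sas_token)

# Then read over the DFS endpoint
df = spark.read.parquet(
    f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/data/"
)
```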
SAS Token Advantages
- Time-limited — auto-expires after the set duration
- Scoped — can be restricted to specific containers or files
- Revocable — change the account key and all SAS tokens become invalid
- No full account access — only grants the permissions you specify
SAS Token Limitations
- Token management is manual (must regenerate when expired)
- Not ideal for long-running production pipelines
- Still derived from the account key (if the key is compromised, SAS tokens can be forged)
Real-life analogy: A SAS token is like a hotel key card. It opens only YOUR room (specific container), only during your stay (time-limited), and only for entering (read permission). It does not open other rooms, and it stops working on checkout day.
Method 3: Service Principal with OAuth (Production)
What It Is
A Service Principal is an Azure AD identity specifically for applications. You create it, assign it specific roles on specific storage containers, and authenticate using its credentials (client ID + secret). This is the production-standard method.
Step 1: Create a Service Principal
# Azure CLI
az ad sp create-for-rbac --name "databricks-storage-sp" --role "Storage Blob Data Contributor" --scopes "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>"
Or via Azure Portal:
1. Azure AD → App registrations → + New registration
2. Name: databricks-storage-sp
3. Click Register
4. Note the Application (client) ID and Directory (tenant) ID
5. Go to Certificates & secrets → + New client secret
6. Copy the secret value (shown only once)
Step 2: Assign Storage Role
- Go to your Storage account → Access Control (IAM)
- Click + Add → Add role assignment
- Role: Storage Blob Data Contributor (or Reader for read-only)
- Members: search for databricks-storage-sp
- Click Review + assign
Step 3: Configure in Databricks
storage_account_name = "naveenadlsgen2de"
client_id = "your-app-client-id"
tenant_id = "your-tenant-id"
client_secret = dbutils.secrets.get("keyvault-scope", "sp-client-secret") # From Key Vault!
# Set Spark configs for OAuth
spark.conf.set(f"fs.azure.account.auth.type.{storage_account_name}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account_name}.dfs.core.windows.net",
"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account_name}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account_name}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account_name}.dfs.core.windows.net",
f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
# Now read files
df = spark.read.parquet(f"abfss://synapse-workspace@{storage_account_name}.dfs.core.windows.net/sqldb/Customer/")
df.show(5)
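Because these five settings always travel together, it helps to wrap them in a small function. This wrapper simply repackages the configs shown above (the function name is mine, not a Databricks API):

```python
def configure_oauth(storage_account: str, client_id: str, tenant_id: str, client_secret: str) -> None:
    """Apply the five OAuth Spark configs for one ADLS Gen2 account."""
    base = f"{storage_account}.dfs.core.windows.net"
    spark.conf.set(f"fs.azure.account.auth.type.{base}", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{base}",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{base}", client_id)
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{base}", client_secret)
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{base}",
                   f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# Same result as the cell above, one call per storage account
configure_oauth("naveenadlsgen2de", client_id, tenant_id, client_secret)
```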
Why Service Principal Is Better for Production
- Scoped access — assign roles on specific containers, not the entire account
- RBAC-controlled — use Azure AD roles (Reader, Contributor) for fine-grained permissions
- Audit trail — Azure AD logs every authentication
- Rotatable — rotate the client secret without disrupting other connections
- No master key exposure — the storage account access key is never used
Real-life analogy: A Service Principal is like an employee badge at a company. The badge has the employee’s name (client ID), their department (roles), and an access level (permissions). It only opens the doors they are authorized for. If they leave, you deactivate their badge without affecting anyone else’s access.
Method 4: Unity Catalog with Access Connector (Modern Best Practice)
What It Is
Unity Catalog is Databricks’ centralized governance layer. An Access Connector is a managed identity that Databricks uses to access storage on your behalf — no keys, no secrets, no Service Principal management.
Step 1: Create Access Connector
- Azure Portal → Create a resource → search Access Connector for Azure Databricks
- Name: databricks-access-connector
- Region: same as your workspace
- Click Create
Step 2: Assign Storage Role to Access Connector
- Go to your Storage account → Access Control (IAM)
- + Add → Add role assignment
- Role: Storage Blob Data Contributor
- Members: select Managed Identity → find databricks-access-connector
- Review + assign
Step 3: Create Storage Credential in Unity Catalog
-- In Databricks SQL or notebook
CREATE STORAGE CREDENTIAL access_connector_cred
WITH (AZURE_MANAGED_IDENTITY {
access_connector_id = "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Databricks/accessConnectors/databricks-access-connector"
});
Step 4: Create External Location
CREATE EXTERNAL LOCATION datalake_location
URL 'abfss://synapse-workspace@naveenadlsgen2de.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL access_connector_cred);
Step 5: Read Data
# No spark.conf.set needed — Unity Catalog handles everything
df = spark.read.parquet("abfss://synapse-workspace@naveenadlsgen2de.dfs.core.windows.net/sqldb/Customer/")
df.show(5)
Why Unity Catalog Is the Modern Best Practice
- Zero secrets — no keys, no passwords, no client secrets to manage
- Centralized governance — permissions managed in one place (Unity Catalog)
- Auditable — full lineage and access logs
- Sharable — tables created from external locations are governed by Unity Catalog policies
- Cross-workspace — multiple workspaces can share the same catalog and permissions
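Because permissions live in the catalog, giving a team access to the lake becomes a grant on the external location rather than another key or role assignment. A hedged sketch run from a notebook (the data-analysts group name is hypothetical):

```python
# Grant a (hypothetical) analysts group read access on the external location defined above
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION datalake_location TO `data-analysts`")

# Inspect existing grants on the location
display(spark.sql("SHOW GRANTS ON EXTERNAL LOCATION datalake_location"))
```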
Real-life analogy: Unity Catalog with Access Connector is like a biometric building entry system. No keys, no badges, no passwords. You walk up, the system scans your fingerprint (managed identity), checks your access level (catalog permissions), and opens the door. If you are transferred to another department, your access updates automatically. No physical keys to manage or lose.
Mounting vs Direct Access
Mounting (dbutils.fs.mount)
Creates a persistent shortcut from a DBFS path to your storage:
dbutils.fs.mount(
source=f"abfss://synapse-workspace@naveenadlsgen2de.dfs.core.windows.net/",
mount_point="/mnt/datalake",
extra_configs={
f"fs.azure.account.key.naveenadlsgen2de.dfs.core.windows.net":
dbutils.secrets.get("keyvault-scope", "adls-storage-key")
}
)
# Now use the short path
df = spark.read.parquet("/mnt/datalake/sqldb/Customer/")
Direct Access (spark.conf.set)
Configures credentials per session without creating a persistent mount:
spark.conf.set(
f"fs.azure.account.key.naveenadlsgen2de.dfs.core.windows.net",
dbutils.secrets.get("keyvault-scope", "adls-storage-key")
)
# Use full path every time
df = spark.read.parquet("abfss://synapse-workspace@naveenadlsgen2de.dfs.core.windows.net/sqldb/Customer/")
Comparison
| Feature | Mounting | Direct Access | Unity Catalog |
|---|---|---|---|
| Persistence | Survives cluster restart | Session only | Permanent (catalog) |
| Path | Short (/mnt/datalake/) | Full ABFSS path | Full ABFSS path |
| Setup | Once per mount | Every notebook/session | Once per external location |
| Modern? | Legacy (but widely used) | Common | Best practice |
| Unity Catalog compatible | No | Partial | Full |
Recommendation: For new projects, use Unity Catalog external locations. For existing projects without Unity Catalog, use direct access with Key Vault secrets. Mounting still works but is considered legacy.
Secure Credential Management with Key Vault
Never Hardcode Credentials
# ❌ NEVER do this
access_key = "xYz123AbC456..." # Visible in notebook, committed to Git
# ✅ ALWAYS do this
access_key = dbutils.secrets.get("keyvault-scope", "adls-storage-key") # [REDACTED] in output
Setting Up Key Vault Secret Scope
1. Store your storage key/secret in Azure Key Vault:
   - Go to Key Vault → Secrets → + Generate/Import
   - Name: adls-storage-key
   - Value: paste the access key
   - Click Create
2. Create the secret scope in Databricks:
   - Navigate to https://<your-workspace-url>#secrets/createScope
   - Scope Name: keyvault-scope
   - DNS Name: your Key Vault URL (e.g., https://kv-dataplatform-dev.vault.azure.net/)
   - Resource ID: the Key Vault's full resource ID
   - Click Create
3. Grant Databricks access to Key Vault:
   - Go to Key Vault → Access Control (IAM)
   - Add role: Key Vault Secrets User
   - Member: search for AzureDatabricks (App ID: 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)
   - Review + assign
4. Use in notebooks:
storage_key = dbutils.secrets.get("keyvault-scope", "adls-storage-key")
spark.conf.set(
f"fs.azure.account.key.naveenadlsgen2de.dfs.core.windows.net",
storage_key
)
print("Connected securely!")
The Config Notebook Pattern (Production)
In production, create a reusable config notebook that sets up all storage connections:
# Notebook: /Config/Storage_Config
storage_account = "naveenadlsgen2de"
scope = "keyvault-scope"
# Get credentials from Key Vault
storage_key = dbutils.secrets.get(scope, "adls-storage-key")
# Configure access
spark.conf.set(
f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
storage_key
)
# Define paths as variables
BRONZE_PATH = f"abfss://synapse-workspace@{storage_account}.dfs.core.windows.net/bronze/"
SILVER_PATH = f"abfss://synapse-workspace@{storage_account}.dfs.core.windows.net/silver/"
GOLD_PATH = f"abfss://synapse-workspace@{storage_account}.dfs.core.windows.net/gold/"
print(f"Storage configured for: {storage_account}")
Then in every ETL notebook:
# Cell 1: Run the config notebook
%run /Config/Storage_Config
# Cell 2: Use the pre-configured paths
df = spark.read.parquet(f"{BRONZE_PATH}customers/")
df_clean = df.filter(df.status == "Active")
df_clean.write.format("delta").mode("overwrite").save(f"{SILVER_PATH}customers/")
Why this pattern works:
- Credentials are configured in ONE place
- Path constants are reusable across all notebooks
- Changing the storage account updates everywhere
- The config notebook is tested and trusted
Real-life analogy: The config notebook is like a Wi-Fi router configuration. You set up the password once on the router, and every device in the house connects through it. You do not type the Wi-Fi password on every device separately — the router handles it.
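One common extension, sketched here as an assumption rather than a prescription: parameterize the config notebook with a widget so the same code serves dev and prod. The environment names, the prod account, and the per-environment secret names below are hypothetical:

```python
# Notebook: /Config/Storage_Config (parameterized variant)
dbutils.widgets.text("env", "dev")           # pass "dev" or "prod" when running the notebook
env = dbutils.widgets.get("env")

storage_accounts = {
    "dev": "naveenadlsgen2de",               # account used throughout this post
    "prod": "naveenadlsgen2prod",            # hypothetical production account
}
storage_account = storage_accounts[env]

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get("keyvault-scope", f"adls-storage-key-{env}"),  # hypothetical per-env secrets
)

BRONZE_PATH = f"abfss://synapse-workspace@{storage_account}.dfs.core.windows.net/bronze/"
print(f"Storage configured for {env}: {storage_account}")
```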
Common Connection Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| “No credentials found for account” | spark.conf.set not called or wrong account name | Verify account name matches exactly (case-sensitive) |
| “403 Forbidden” | Storage firewall blocking Databricks IP | Add Databricks subnet to storage firewall, or enable “Allow trusted Azure services” |
| “Container does not exist” | Wrong container name or using wrong protocol | Verify container name. Use abfss:// for ADLS Gen2, wasbs:// for Blob |
| “Authentication failed” | Wrong access key or expired SAS token | Regenerate key/token from Azure Portal |
| “Secret does not exist” | Wrong scope or secret name in dbutils.secrets.get | Run dbutils.secrets.list("keyvault-scope") to verify the scope and secret names |
| “KeyVault access denied” | Databricks app not granted Key Vault Secrets User role | Add AzureDatabricks (ID: 2ff814a6...) as Key Vault Secrets User |
| “Hierarchical namespace not enabled” | Using abfss:// on regular Blob Storage | Use wasbs:// instead, or enable hierarchical namespace on the account |
| “Mount point already exists” | Trying to mount to a path that is already mounted | dbutils.fs.unmount("/mnt/datalake") first, then remount |
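When one of these errors appears, a few built-in dbutils calls usually narrow down the cause quickly. A quick diagnostic cell (the scope, container, and account names are the ones used earlier in this post):

```python
# 1. Do the secret scope and secret actually exist? (fixes "Secret does not exist")
print(dbutils.secrets.listScopes())
print(dbutils.secrets.list("keyvault-scope"))

# 2. What is already mounted? (fixes "Mount point already exists")
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)

# 3. Can the cluster list the container at all? (surfaces 403s and credential errors immediately)
display(dbutils.fs.ls("abfss://cont1@naveenblobde.dfs.core.windows.net/"))
```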
Which Method Should I Use?
Learning / Quick Testing?
→ Method 1: Access Key (simplest, set up in 30 seconds)
Temporary / Shared access to specific files?
→ Method 2: SAS Token (scoped, time-limited)
Production without Unity Catalog?
→ Method 3: Service Principal + Key Vault (RBAC, auditable, rotatable)
Enterprise / New Databricks projects?
→ Method 4: Unity Catalog + Access Connector (zero secrets, governed)
The evolution path:
1. Start with Access Key to learn and test
2. Move to Service Principal when building production pipelines
3. Migrate to Unity Catalog when the organization standardizes on Databricks governance
Interview Questions
Q: How do you connect Azure Databricks to ADLS Gen2?
A: Four methods: Access Key (simplest, dev only), SAS Token (scoped, temporary), Service Principal with OAuth (production — RBAC controlled), and Unity Catalog with Access Connector (enterprise — zero secrets). For production, use Service Principal with credentials stored in Azure Key Vault via a Databricks secret scope. For enterprise, use Unity Catalog.
Q: What is the difference between abfss:// and wasbs:// protocols?
A: abfss:// uses the DFS (Data Lake Storage) endpoint for ADLS Gen2 accounts with hierarchical namespace enabled. wasbs:// uses the Blob endpoint for regular Blob Storage accounts. Always use abfss:// for ADLS Gen2 — it supports directory-level operations and is optimized for big data workloads.
Q: Why should you never hardcode storage keys in notebooks?
A: Notebooks are committed to Git, shared with colleagues, and visible in version history. Hardcoded keys in notebooks are exposed to anyone with repository access. Use dbutils.secrets.get() to read credentials from Azure Key Vault — values are automatically redacted in notebook output.
Q: What is the config notebook pattern?
A: A dedicated notebook that configures storage credentials and defines path constants. Other notebooks call %run /Config/Storage_Config to inherit the configuration. This centralizes credential management, makes paths reusable, and ensures consistent setup across all ETL notebooks.
Q: What is mounting and when should you use it?
A: Mounting creates a persistent DBFS shortcut to an Azure Storage path. After mounting, you use short paths like /mnt/datalake/ instead of full ABFSS URLs. It survives cluster restarts but is considered legacy. Modern best practice is Unity Catalog external locations or direct spark.conf.set with Key Vault secrets.
Wrapping Up
Connecting Databricks to Azure Storage is the first thing you do and the most important thing to get right. The wrong method (hardcoded access keys) creates security risks. The right method (Service Principal + Key Vault or Unity Catalog) keeps your data secure and your pipelines production-ready.
Start with access keys for learning. Move to Service Principal for production. Aspire to Unity Catalog for enterprise governance. And ALWAYS use Key Vault — never hardcode credentials.
Related posts:
- Azure Databricks Introduction and dbutils
- Azure Blob Storage Guide
- ADLS Gen2 Complete Guide
- Azure Networking (Private Endpoints)
- Apache Spark and PySpark
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.