Connecting Azure Databricks to Blob Storage and ADLS Gen2: Every Method Explained
You just spun up a Databricks workspace, created a cluster, and opened a notebook. Now you try to read a Parquet file from your data lake and get: “No credentials found.” The cluster is running. The storage account exists. The file is there. But Databricks cannot see it.
This is the most common first-day frustration with Databricks. Unlike Synapse, which auto-connects to its default storage, Databricks needs you to explicitly configure HOW it authenticates with Azure Storage. There is no default connection.
Think of it like moving into a new apartment. The building exists, your apartment is ready, but the WiFi is not set up yet. You need to connect to the router (storage account), enter the password (credentials), and THEN you can browse the internet (read files). This post shows you every way to connect.
Table of Contents
- Why Databricks Needs Explicit Storage Configuration
- The Four Connection Methods
- Method 1: Access Key (Simplest — Dev Only)
- Method 2: SAS Token (Scoped Access)
- Method 3: Service Principal with OAuth (Production)
- Method 4: Unity Catalog with Access Connector (Modern Best Practice)
- Mounting vs Direct Access
- Mounting Storage with dbutils
- Reading and Writing Files After Connection
- Connecting to Blob Storage vs ADLS Gen2
- Secure Credential Management with Key Vault
- Setting Up Key Vault Secret Scope
- The Config Notebook Pattern (Production)
- Common Connection Errors and Fixes
- Which Method Should I Use?
- Interview Questions
- Wrapping Up
Why Databricks Needs Explicit Storage Configuration
In Synapse, the workspace is created WITH a default ADLS Gen2 account. A managed identity is auto-configured. Everything just works from day one.
Databricks is different. The workspace is compute-only — it does not own any storage. Your data lives in YOUR storage accounts, and you must tell Databricks how to authenticate.
Synapse: Workspace ──(built-in connection)──→ Default ADLS Gen2
Just works. No setup needed.
Databricks: Workspace ──(???)──→ Your Storage Account
You must configure the connection method and credentials.
Real-life analogy: Synapse is like a company laptop that comes pre-configured with VPN, email, and file shares. Databricks is like a personal laptop you bring to the office — powerful, but you need to set up WiFi, VPN, and file share access yourself before you can access company resources.
The Four Connection Methods
| Method | Security | Scope | Best For | Complexity |
|---|---|---|---|---|
| Access Key | Low (full account access) | Entire storage account | Dev/learning | Simplest |
| SAS Token | Medium (scoped, time-limited) | Specific container/file | Temporary access | Simple |
| Service Principal | High (RBAC-controlled) | Specific containers via role assignment | Production | Medium |
| Unity Catalog + Access Connector | Highest (centralized governance) | Governed via catalog | Enterprise production | Complex (one-time) |
Method 1: Access Key (Simplest — Dev Only)
What It Is
Every Azure Storage account has two access keys — master passwords that grant full read/write/delete access to EVERYTHING in the account. Using an access key in Databricks is like using the building master key to open every door.
How to Get the Access Key
- Go to Azure Portal → your storage account
- Click Access keys (under Security + networking)
- Click Show on Key 1
- Copy the key
Configure in Databricks Notebook
# Set the access key in Spark configuration
storage_account_name = "naveenblobde"
access_key = "your-access-key-here" # DON'T hardcode in production!
spark.conf.set(
f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
access_key
)
# Now you can read files
df = spark.read.parquet(
f"abfss://cont1@{storage_account_name}.dfs.core.windows.net/data/"
)
df.show(5)
For Blob Storage (wasbs:// protocol)
spark.conf.set(
f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net",
access_key
)
# Read using wasbs://
df = spark.read.csv(
f"wasbs://cont1@{storage_account_name}.blob.core.windows.net/data.csv",
header=True
)
abfss:// vs wasbs:// — Which to Use?
| Protocol | Storage Type | Namespace | When to Use |
|---|---|---|---|
| abfss:// | ADLS Gen2 (hierarchical namespace ON) | DFS endpoint | Always prefer this for ADLS Gen2 |
| wasbs:// | Blob Storage (hierarchical namespace OFF) | Blob endpoint | Only for regular Blob Storage |
| abfss:// | Can also work on Blob Storage | DFS endpoint | May require additional config |
Rule of thumb: If your storage account has hierarchical namespace enabled (ADLS Gen2), use abfss://. If it is regular Blob Storage, use wasbs://.
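If you switch between the two protocols often, a tiny helper keeps the URL construction in one place. This is purely illustrative (the function name and defaults are mine, not a Databricks or Azure API):

```python
def storage_url(account: str, container: str, path: str = "", hierarchical: bool = True) -> str:
    """Build an abfss:// URL for ADLS Gen2, or a wasbs:// URL for plain Blob Storage."""
    if hierarchical:
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path}"

# ADLS Gen2 (hierarchical namespace ON) -> DFS endpoint
print(storage_url("naveenblobde", "cont1", "data/"))

# Regular Blob Storage (hierarchical namespace OFF) -> Blob endpoint
print(storage_url("naveenblobde", "cont1", "data.csv", hierarchical=False))
```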
Why Access Keys Are Dangerous in Production
- Access keys grant full control over the entire storage account — read, write, delete, management
- If the key leaks (committed to Git, shared in Slack), anyone can access ALL your data
- You cannot restrict access to specific containers or folders
- No audit trail of who used the key
- Revoking the key breaks ALL connections using it
Real-life analogy: An access key is like the building master key. It opens every apartment, the mailroom, the storage room, and the office. If you lose it, everyone in the building is at risk. You would never give the master key to a delivery driver — you would give them a key to the specific apartment they need (Service Principal or SAS).
Method 2: SAS Token (Scoped Access)
What It Is
A Shared Access Signature (SAS) is a time-limited, permission-scoped token that grants access to specific containers or files without exposing the access key.
Generate a SAS Token
- Go to Azure Portal → Storage account → Containers → select container
- Click Shared access tokens (on the container, not the account)
- Configure:
- Permissions: Read, List (minimum needed)
- Start: Now
- Expiry: 24 hours (or your preferred duration)
- Allowed protocols: HTTPS only
- Click Generate SAS token and URL
- Copy the SAS token (starts with ?sv=)
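If you would rather script this than click through the portal, the azure-storage-blob SDK can mint the same kind of container-level SAS. A minimal sketch, assuming you have the account key available (ideally pulled from Key Vault); the account and container names are the ones used in this post:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

# Illustrative values; substitute your own account, container, and key source
sas_token = generate_container_sas(
    account_name="naveenblobde",
    container_name="cont1",
    account_key="<account-key>",                                # never hardcode in real notebooks
    permission=ContainerSasPermissions(read=True, list=True),   # Read + List only
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),    # 24-hour lifetime
)

# Portal-generated tokens start with '?'; the SDK returns the token without it
print("?" + sas_token)
```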
Configure in Databricks
storage_account_name = "naveenblobde"
container_name = "cont1"
sas_token = "?sv=2021-06-08&st=2026-04-20&se=2026-04-21&sr=c&sp=rl&sig=..."
spark.conf.set(
f"fs.azure.sas.{container_name}.{storage_account_name}.blob.core.windows.net",
sas_token
)
# Read files
df = spark.read.parquet(
f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net/data/"
)
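The configuration above targets the Blob endpoint via wasbs://. If the account is ADLS Gen2 and you want to use the same SAS token over abfss://, Databricks documents a fixed SAS token provider for the ABFS driver. A hedged sketch that reuses the variables from the cell above (verify the provider class name against your runtime's documentation):

```python
spark.conf.set(f"fs.azure.account.auth.type.{storage_account_name}.dfs.core.windows.net", "SAS")
spark.conf.set(f"fs.azure.sas.token.provider.type.{storage_account_name}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set(f"fs.azure.sas.fixed.token.{storage_account_name}.dfs.core.windows.net", sas_token)

# Then read over the DFS endpoint
df = spark.read.parquet(
    f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/data/"
)
```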
SAS Token Advantages
- Time-limited — auto-expires after the set duration
- Scoped — can be restricted to specific containers or files
- Revocable — change the account key and all SAS tokens become invalid
- No full account access — only grants the permissions you specify
SAS Token Limitations
- Token management is manual (must regenerate when expired)
- Not ideal for long-running production pipelines
- Still derived from the account key (if the key is compromised, SAS tokens can be forged)
Real-life analogy: A SAS token is like a hotel key card. It opens only YOUR room (specific container), only during your stay (time-limited), and only for entering (read permission). It does not open other rooms, and it stops working on checkout day.
Method 3: Service Principal with OAuth (Production)
What It Is
A Service Principal is an Azure AD identity specifically for applications. You create it, assign it specific roles on specific storage containers, and authenticate using its credentials (client ID + secret). This is the production-standard method.
Step 1: Create a Service Principal
# Azure CLI
az ad sp create-for-rbac --name "databricks-storage-sp" --role "Storage Blob Data Contributor" --scopes "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>"
Or via Azure Portal:
1. Azure AD → App registrations → + New registration
2. Name: databricks-storage-sp
3. Click Register
4. Note the Application (client) ID and Directory (tenant) ID
5. Go to Certificates & secrets → + New client secret
6. Copy the secret value (shown only once)
Step 2: Assign Storage Role
- Go to your Storage account → Access Control (IAM)
- Click + Add → Add role assignment
- Role: Storage Blob Data Contributor (or Reader for read-only)
- Members: search for databricks-storage-sp
- Click Review + assign
Step 3: Configure in Databricks
storage_account_name = "naveenadlsgen2de"
client_id = "your-app-client-id"
tenant_id = "your-tenant-id"
client_secret = dbutils.secrets.get("keyvault-scope", "sp-client-secret") # From Key Vault!
# Set Spark configs for OAuth
spark.conf.set(f"fs.azure.account.auth.type.{storage_account_name}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account_name}.dfs.core.windows.net",
"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account_name}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account_name}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account_name}.dfs.core.windows.net",
f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
# Now read files
df = spark.read.parquet(f"abfss://synapse-workspace@{storage_account_name}.dfs.core.windows.net/sqldb/Customer/")
df.show(5)
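Because these five settings always travel together, it helps to wrap them in a small function. This wrapper simply repackages the configs shown above (the function name is mine, not a Databricks API):

```python
def configure_oauth(storage_account: str, client_id: str, tenant_id: str, client_secret: str) -> None:
    """Apply the five OAuth Spark configs for one ADLS Gen2 account."""
    base = f"{storage_account}.dfs.core.windows.net"
    spark.conf.set(f"fs.azure.account.auth.type.{base}", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{base}",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{base}", client_id)
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{base}", client_secret)
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{base}",
                   f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# Same result as the cell above, one call per storage account
configure_oauth("naveenadlsgen2de", client_id, tenant_id, client_secret)
```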
Why Service Principal Is Better for Production
- Scoped access — assign roles on specific containers, not the entire account
- RBAC-controlled — use Azure AD roles (Reader, Contributor) for fine-grained permissions
- Audit trail — Azure AD logs every authentication
- Rotatable — rotate the client secret without disrupting other connections
- No master key exposure — the storage account access key is never used
Real-life analogy: A Service Principal is like an employee badge at a company. The badge has the employee’s name (client ID), their department (roles), and an access level (permissions). It only opens the doors they are authorized for. If they leave, you deactivate their badge without affecting anyone else’s access.
Method 4: Unity Catalog with Access Connector (Modern Best Practice)
What It Is
Unity Catalog is Databricks’ centralized governance layer. An Access Connector is a managed identity that Databricks uses to access storage on your behalf — no keys, no secrets, no Service Principal management.
Step 1: Create Access Connector
- Azure Portal → Create a resource → search Access Connector for Azure Databricks
- Name: databricks-access-connector
- Region: same as your workspace
- Click Create
Step 2: Assign Storage Role to Access Connector
- Go to your Storage account → Access Control (IAM)
- + Add → Add role assignment
- Role: Storage Blob Data Contributor
- Members: select Managed Identity → find databricks-access-connector
- Review + assign
Step 3: Create Storage Credential in Unity Catalog
-- In Databricks SQL or notebook
CREATE STORAGE CREDENTIAL access_connector_cred
WITH (AZURE_MANAGED_IDENTITY {
access_connector_id = "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Databricks/accessConnectors/databricks-access-connector"
});
Step 4: Create External Location
CREATE EXTERNAL LOCATION datalake_location
URL 'abfss://synapse-workspace@naveenadlsgen2de.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL access_connector_cred);
Step 5: Read Data
# No spark.conf.set needed — Unity Catalog handles everything
df = spark.read.parquet("abfss://synapse-workspace@naveenadlsgen2de.dfs.core.windows.net/sqldb/Customer/")
df.show(5)
Why Unity Catalog Is the Modern Best Practice
- Zero secrets — no keys, no passwords, no client secrets to manage
- Centralized governance — permissions managed in one place (Unity Catalog)
- Auditable — full lineage and access logs
- Sharable — tables created from external locations are governed by Unity Catalog policies
- Cross-workspace — multiple workspaces can share the same catalog and permissions
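Because permissions live in the catalog, giving a team access to the lake becomes a grant on the external location rather than another key or role assignment. A hedged sketch run from a notebook (the data-analysts group name is hypothetical):

```python
# Grant a (hypothetical) analysts group read access on the external location defined above
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION datalake_location TO `data-analysts`")

# Inspect existing grants on the location
display(spark.sql("SHOW GRANTS ON EXTERNAL LOCATION datalake_location"))
```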
Real-life analogy: Unity Catalog with Access Connector is like a biometric building entry system. No keys, no badges, no passwords. You walk up, the system scans your fingerprint (managed identity), checks your access level (catalog permissions), and opens the door. If you are transferred to another department, your access updates automatically. No physical keys to manage or lose.
Mounting vs Direct Access
Mounting (dbutils.fs.mount)
Creates a persistent shortcut from a DBFS path to your storage:
dbutils.fs.mount(
source=f"abfss://synapse-workspace@naveenadlsgen2de.dfs.core.windows.net/",
mount_point="/mnt/datalake",
extra_configs={
f"fs.azure.account.key.naveenadlsgen2de.dfs.core.windows.net":
dbutils.secrets.get("keyvault-scope", "adls-storage-key")
}
)
# Now use the short path
df = spark.read.parquet("/mnt/datalake/sqldb/Customer/")
Direct Access (spark.conf.set)
Configures credentials per session without creating a persistent mount:
spark.conf.set(
f"fs.azure.account.key.naveenadlsgen2de.dfs.core.windows.net",
dbutils.secrets.get("keyvault-scope", "adls-storage-key")
)
# Use full path every time
df = spark.read.parquet("abfss://synapse-workspace@naveenadlsgen2de.dfs.core.windows.net/sqldb/Customer/")
Comparison
| Feature | Mounting | Direct Access | Unity Catalog |
|---|---|---|---|
| Persistence | Survives cluster restart | Session only | Permanent (catalog) |
| Path | Short (/mnt/datalake/) | Full ABFSS path | Full ABFSS path |
| Setup | Once per mount | Every notebook/session | Once per external location |
| Modern? | Legacy (but widely used) | Common | Best practice |
| Unity Catalog compatible | No | Partial | Full |
Recommendation: For new projects, use Unity Catalog external locations. For existing projects without Unity Catalog, use direct access with Key Vault secrets. Mounting still works but is considered legacy.
Secure Credential Management with Key Vault
Never Hardcode Credentials
# ❌ NEVER do this
access_key = "xYz123AbC456..." # Visible in notebook, committed to Git
# ✅ ALWAYS do this
access_key = dbutils.secrets.get("keyvault-scope", "adls-storage-key") # [REDACTED] in output
Setting Up Key Vault Secret Scope
1. Store your storage key/secret in Azure Key Vault:
   - Go to Key Vault → Secrets → + Generate/Import
   - Name: adls-storage-key
   - Value: paste the access key
   - Click Create
2. Create the secret scope in Databricks:
   - Navigate to https://<your-workspace-url>#secrets/createScope
   - Scope Name: keyvault-scope
   - DNS Name: your Key Vault URL (e.g., https://kv-dataplatform-dev.vault.azure.net/)
   - Resource ID: the Key Vault's full resource ID
   - Click Create
3. Grant Databricks access to Key Vault:
   - Go to Key Vault → Access Control (IAM)
   - Add role: Key Vault Secrets User
   - Member: search for AzureDatabricks (App ID: 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)
   - Review + assign
4. Use in notebooks:
storage_key = dbutils.secrets.get("keyvault-scope", "adls-storage-key")
spark.conf.set(
f"fs.azure.account.key.naveenadlsgen2de.dfs.core.windows.net",
storage_key
)
print("Connected securely!")
The Config Notebook Pattern (Production)
In production, create a reusable config notebook that sets up all storage connections:
# Notebook: /Config/Storage_Config
storage_account = "naveenadlsgen2de"
scope = "keyvault-scope"
# Get credentials from Key Vault
storage_key = dbutils.secrets.get(scope, "adls-storage-key")
# Configure access
spark.conf.set(
f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
storage_key
)
# Define paths as variables
BRONZE_PATH = f"abfss://synapse-workspace@{storage_account}.dfs.core.windows.net/bronze/"
SILVER_PATH = f"abfss://synapse-workspace@{storage_account}.dfs.core.windows.net/silver/"
GOLD_PATH = f"abfss://synapse-workspace@{storage_account}.dfs.core.windows.net/gold/"
print(f"Storage configured for: {storage_account}")
Then in every ETL notebook:
# Cell 1: Run the config notebook
%run /Config/Storage_Config
# Cell 2: Use the pre-configured paths
df = spark.read.parquet(f"{BRONZE_PATH}customers/")
df_clean = df.filter(df.status == "Active")
df_clean.write.format("delta").mode("overwrite").save(f"{SILVER_PATH}customers/")
Why this pattern works:
- Credentials are configured in ONE place
- Path constants are reusable across all notebooks
- Changing the storage account updates everywhere
- The config notebook is tested and trusted
Real-life analogy: The config notebook is like a Wi-Fi router configuration. You set up the password once on the router, and every device in the house connects through it. You do not type the Wi-Fi password on every device separately — the router handles it.
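One common extension, sketched here as an assumption rather than a prescription: parameterize the config notebook with a widget so the same code serves dev and prod. The environment names, the prod account, and the per-environment secret names below are hypothetical:

```python
# Notebook: /Config/Storage_Config (parameterized variant)
dbutils.widgets.text("env", "dev")           # pass "dev" or "prod" when running the notebook
env = dbutils.widgets.get("env")

storage_accounts = {
    "dev": "naveenadlsgen2de",               # account used throughout this post
    "prod": "naveenadlsgen2prod",            # hypothetical production account
}
storage_account = storage_accounts[env]

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get("keyvault-scope", f"adls-storage-key-{env}"),  # hypothetical per-env secrets
)

BRONZE_PATH = f"abfss://synapse-workspace@{storage_account}.dfs.core.windows.net/bronze/"
print(f"Storage configured for {env}: {storage_account}")
```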
Common Connection Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| “No credentials found for account” | spark.conf.set not called or wrong account name | Verify account name matches exactly (case-sensitive) |
| “403 Forbidden” | Storage firewall blocking Databricks IP | Add Databricks subnet to storage firewall, or enable “Allow trusted Azure services” |
| “Container does not exist” | Wrong container name or using wrong protocol | Verify container name. Use abfss:// for ADLS Gen2, wasbs:// for Blob |
| “Authentication failed” | Wrong access key or expired SAS token | Regenerate key/token from Azure Portal |
| “Secret does not exist” | Wrong scope or secret name in dbutils.secrets.get | Run dbutils.secrets.list("keyvault-scope") to verify the scope and secret names |
| “KeyVault access denied” | Databricks app not granted Key Vault Secrets User role | Add AzureDatabricks (ID: 2ff814a6...) as Key Vault Secrets User |
| “Hierarchical namespace not enabled” | Using abfss:// on regular Blob Storage | Use wasbs:// instead, or enable hierarchical namespace on the account |
| “Mount point already exists” | Trying to mount to a path that is already mounted | dbutils.fs.unmount("/mnt/datalake") first, then remount |
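When one of these errors appears, a few built-in dbutils calls usually narrow down the cause quickly. A quick diagnostic cell (the scope, container, and account names are the ones used earlier in this post):

```python
# 1. Do the secret scope and secret actually exist? (fixes "Secret does not exist")
print(dbutils.secrets.listScopes())
print(dbutils.secrets.list("keyvault-scope"))

# 2. What is already mounted? (fixes "Mount point already exists")
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)

# 3. Can the cluster list the container at all? (surfaces 403s and credential errors immediately)
display(dbutils.fs.ls("abfss://cont1@naveenblobde.dfs.core.windows.net/"))
```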
Which Method Should I Use?
Learning / Quick Testing?
→ Method 1: Access Key (simplest, set up in 30 seconds)
Temporary / Shared access to specific files?
→ Method 2: SAS Token (scoped, time-limited)
Production without Unity Catalog?
→ Method 3: Service Principal + Key Vault (RBAC, auditable, rotatable)
Enterprise / New Databricks projects?
→ Method 4: Unity Catalog + Access Connector (zero secrets, governed)
The evolution path:
1. Start with Access Key to learn and test
2. Move to Service Principal when building production pipelines
3. Migrate to Unity Catalog when the organization standardizes on Databricks governance
Interview Questions
Q: How do you connect Azure Databricks to ADLS Gen2?
A: Four methods: Access Key (simplest, dev only), SAS Token (scoped, temporary), Service Principal with OAuth (production — RBAC controlled), and Unity Catalog with Access Connector (enterprise — zero secrets). For production, use Service Principal with credentials stored in Azure Key Vault via a Databricks secret scope. For enterprise, use Unity Catalog.
Q: What is the difference between abfss:// and wasbs:// protocols?
A: abfss:// uses the DFS (Data Lake Storage) endpoint for ADLS Gen2 accounts with hierarchical namespace enabled. wasbs:// uses the Blob endpoint for regular Blob Storage accounts. Always use abfss:// for ADLS Gen2 — it supports directory-level operations and is optimized for big data workloads.
Q: Why should you never hardcode storage keys in notebooks?
A: Notebooks are committed to Git, shared with colleagues, and visible in version history. Hardcoded keys in notebooks are exposed to anyone with repository access. Use dbutils.secrets.get() to read credentials from Azure Key Vault — values are automatically redacted in notebook output.
Q: What is the config notebook pattern?
A: A dedicated notebook that configures storage credentials and defines path constants. Other notebooks call %run /Config/Storage_Config to inherit the configuration. This centralizes credential management, makes paths reusable, and ensures consistent setup across all ETL notebooks.
Q: What is mounting and when should you use it?
A: Mounting creates a persistent DBFS shortcut to an Azure Storage path. After mounting, you use short paths like /mnt/datalake/ instead of full ABFSS URLs. It survives cluster restarts but is considered legacy. Modern best practice is Unity Catalog external locations or direct spark.conf.set with Key Vault secrets.
Wrapping Up
Connecting Databricks to Azure Storage is the first thing you do and the most important thing to get right. The wrong method (hardcoded access keys) creates security risks. The right method (Service Principal + Key Vault or Unity Catalog) keeps your data secure and your pipelines production-ready.
Start with access keys for learning. Move to Service Principal for production. Aspire to Unity Catalog for enterprise governance. And ALWAYS use Key Vault — never hardcode credentials.
Related posts:
- Azure Databricks Introduction and dbutils
- Azure Blob Storage Guide
- ADLS Gen2 Complete Guide
- Azure Networking (Private Endpoints)
- Apache Spark and PySpark
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.