Managed vs External Tables in Azure Databricks: Unity Catalog, External Locations, Data Persistence, and Every Operation Explained
You just created a Delta table in Databricks and everything works. But then you run DROP TABLE and your data disappears. Gone. Hundreds of thousands of rows, deleted because you dropped a table.
That is a managed table. Databricks owns the data. DROP TABLE = delete metadata AND data files.
Now imagine a different setup: you drop the table, but the data files are still sitting in your ADLS Gen2 storage account. You run CREATE TABLE pointing to the same path, and the data is back — instantly, with full Delta history.
That is an external table. YOU own the data. Databricks only owns the metadata. DROP TABLE = delete the catalog entry, but your files are untouched.
Think of it like renting vs owning a house. With a managed table (renting), when you end the lease (DROP TABLE), you leave and the house is demolished. With an external table (owning), when you remove your name from the directory (DROP TABLE), the house is still there — you just re-register and move back in.
This post walks through setting up external tables from scratch — Storage Credentials, External Locations, creating tables, proving data persistence after DROP, and understanding when to use managed vs external.
Table of Contents
- Managed vs External Tables: The Core Difference
- When to Use Managed vs External
- The Three Layers: Credential → Location → Table
- Prerequisites
- Step 1: Create a Storage Credential
- Step 2: Create an External Location
- Step 3: Create a Schema
- Step 4: Create an External Table
- Step 5: Insert Data
- Step 6: DROP Table — Prove Data Persists
- Step 7: Recreate Table — Data Comes Back
- Step 8: Delta Operations on External Tables
- How to Check If a Table Is Managed or External
- External Tables with Partitioning
- External Tables and VACUUM
- Granting Access to Other Users
- Common Errors and Fixes
- Interview Questions
- Wrapping Up
Managed vs External Tables: The Core Difference
| Feature | Managed Table | External Table |
|---|---|---|
| Data location | Databricks-managed storage (internal) | YOUR storage (ADLS Gen2, S3) |
| Created with | saveAsTable() or CREATE TABLE (no LOCATION) | CREATE TABLE ... LOCATION 'abfss://...' |
| DROP TABLE | Deletes metadata AND data files | Deletes metadata only, data files stay |
| Who owns the data? | Databricks | You |
| Data accessible outside Databricks? | No (locked in managed storage) | Yes (other tools can read ADLS directly) |
| Recreate after DROP? | Data lost forever | Point to same path → data is back |
| Best for | Curated tables, sandbox, internal analytics | Shared data, production data lakes, cross-tool access |
| Cost | Databricks storage charges | Your ADLS storage charges |
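The "Created with" row is the whole trick. A minimal sketch of the two CREATE statements (the demo schema and path are illustrative, and the external form assumes the External Location registered later in this post):

# Managed: CREATE TABLE with no LOCATION clause; Databricks owns the files
spark.sql("CREATE TABLE demo.events_managed (id INT, name STRING)")

# External: the identical statement plus LOCATION pointing at YOUR storage
spark.sql("""
    CREATE TABLE demo.events_external (id INT, name STRING)
    LOCATION 'abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/external_tables/events/'
""")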
The Visual Difference
Managed Table:
Databricks Catalog: "employees" → data at dbfs:/user/hive/warehouse/employees/
DROP TABLE → Catalog entry removed + data files DELETED
External Table:
Databricks Catalog: "employees" → data at abfss://raw-data@storageaccount/employees/
DROP TABLE → Catalog entry removed + data files STILL IN ADLS
Real-life analogy:
- Managed table = renting furniture from a store. When you return it (DROP), it is gone.
- External table = buying furniture and putting it in your own house. If the store closes (DROP from catalog), your furniture is still in your house.
When to Use Managed vs External
| Scenario | Use Managed | Use External |
|---|---|---|
| Sandbox / exploration / temp data | ✅ | |
| Production data lake (Bronze/Silver/Gold) | | ✅ |
| Data shared with Synapse/ADF/Power BI | | ✅ (they read ADLS directly) |
| Short-lived tables (dev/testing) | ✅ | |
| Compliance (data must stay in your storage) | | ✅ |
| Data used by multiple Databricks workspaces | | ✅ |
| Quick prototyping | ✅ | |
| Long-term production tables | | ✅ |
Rule of thumb: If the data matters and other tools need to access it, use external. If it is temporary or internal to Databricks, use managed.
The Three Layers: Credential → Location → Table
Unity Catalog enforces a three-layer security model for external tables:
Layer 1: Storage Credential
"Here is HOW to authenticate to the storage account"
(Access Connector managed identity)
|
v
Layer 2: External Location
"Here is WHERE in the storage account you can access"
(abfss://container@account.dfs.core.windows.net/)
|
v
Layer 3: External Table
"Here is the specific path for this table's data"
(abfss://container@account.dfs.core.windows.net/external_tables/employees/)
Why three layers?
Without this structure, any user could create an external table pointing to ANY path in ANY storage account — a massive security risk. The three layers ensure:
- Storage Credential: only authorized identities can access storage
- External Location: only approved paths are accessible
- External Table: users can only create tables under approved locations
Real-life analogy: Getting into a secure office building:
- Storage Credential = your employee badge (proves WHO you are)
- External Location = the floors your badge can access (WHERE you can go)
- External Table = the specific office you enter (your workspace within the allowed floors)
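Once Steps 1 and 2 below are in place, you can inspect each layer from a notebook. A quick sketch (the names match the ones created later in this post):

spark.sql("SHOW STORAGE CREDENTIALS").show(truncate=False)   # Layer 1: HOW to authenticate
spark.sql("SHOW EXTERNAL LOCATIONS").show(truncate=False)    # Layer 2: WHERE you may access
spark.sql("SHOW TABLES IN external_demo").show()             # Layer 3: WHAT is registered there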
Prerequisites
Before starting, you need:
- Azure Databricks workspace (Premium tier for Unity Catalog)
- ADLS Gen2 storage account with a container (e.g., raw-data)
- Access Connector for Azure Databricks (created in Azure Portal)
- Storage Blob Data Contributor role assigned to the Access Connector on your storage account
Creating the Access Connector (If Not Done)
- Azure Portal → Create a resource → search Access Connector for Azure Databricks
- Name: unity-catalog-access-connector
- Resource group: same as your Databricks workspace
- Click Create
Granting Storage Access to the Connector
- Go to your Storage account → Access Control (IAM)
- + Add → Add role assignment
- Role: Storage Blob Data Contributor
- Members → Managed identity → select Access Connector for Azure Databricks → select your connector
- Review + assign
Step 1: Create a Storage Credential
The Storage Credential tells Unity Catalog HOW to authenticate to your storage:
CREATE STORAGE CREDENTIAL IF NOT EXISTS adls_gen2_credential
WITH (AZURE_MANAGED_IDENTITY = (
ACCESS_CONNECTOR_ID = '/subscriptions/<your-sub-id>/resourceGroups/<your-rg>/providers/Microsoft.Databricks/accessConnectors/unity-catalog-access-connector'
));
Where to find the ACCESS_CONNECTOR_ID:
1. Azure Portal → Access Connector resource → Properties (or Overview)
2. Copy the Resource ID — it looks like: /subscriptions/abc123.../providers/Microsoft.Databricks/accessConnectors/unity-catalog-access-connector
Verify:
SHOW STORAGE CREDENTIALS;
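To confirm which Access Connector a credential points at, describe it. A quick sketch (run on a Unity Catalog-enabled cluster):

spark.sql("DESCRIBE STORAGE CREDENTIAL adls_gen2_credential").show(truncate=False)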
Step 2: Create an External Location
The External Location tells Unity Catalog WHERE in the storage account you can access:
CREATE EXTERNAL LOCATION IF NOT EXISTS adls_raw_data
URL 'abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL adls_gen2_credential);
This registers the entire raw-data container. Any external table pointing to a path under this container will now work.
Verify:
SHOW EXTERNAL LOCATIONS;
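A quick smoke test: if the External Location and its credential are wired up correctly, listing the container from a notebook succeeds without any spark.conf OAuth settings, because Unity Catalog authenticates through the Access Connector. A sketch:

# No service principal keys, no spark.conf setup: Unity Catalog handles auth
for f in dbutils.fs.ls("abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/"):
    print(f.path)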
Real-life analogy: The Storage Credential is like getting a master key card for a building. The External Location is like programming that key card to access specific floors. Without the External Location, the key card exists but opens nothing.
Step 3: Create a Schema
CREATE SCHEMA IF NOT EXISTS external_demo;
USE external_demo;
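A related option worth knowing: Unity Catalog schemas can also pin their MANAGED tables to your own storage via MANAGED LOCATION. A hedged sketch (the schema name and path are illustrative, and you need CREATE MANAGED STORAGE on the External Location covering the path); note that DROP still deletes data for managed tables, wherever they live:

# Managed tables in this schema land under YOUR container instead of the
# metastore root, but keep managed-table semantics (DROP deletes data)
spark.sql("""
    CREATE SCHEMA IF NOT EXISTS external_demo_managed
    MANAGED LOCATION 'abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/managed_schemas/'
""")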
Step 4: Create an External Table
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Create sample data
data = [
(1, "Naveen", "Data Engineering", "Mississauga", 95000, "2023-01-15"),
(2, "Shrey", "Data Science", "Toronto", 88000, "2023-03-20"),
(3, "Vrushab", "DevOps", "Vancouver", 92000, "2023-06-10"),
(4, "Vishnu", "Analytics", "Ottawa", 91000, "2023-09-01"),
(5, "Ravi", "Engineering", "Calgary", 85000, "2024-01-05"),
]
columns = ["emp_id", "name", "department", "city", "salary", "hire_date"]
df = spark.createDataFrame(data, columns) \
    .withColumn("hire_date", to_date(col("hire_date")))
# Write to ADLS Gen2 as Delta
external_path = "abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/external_tables/employees"
df.write.format("delta").mode("overwrite").save(external_path)
print(f"Data written to: {external_path}")
Now register it as an external table in Unity Catalog:
CREATE TABLE IF NOT EXISTS external_demo.employees
USING DELTA
LOCATION 'abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/external_tables/employees';
-- Verify
SELECT * FROM external_demo.employees ORDER BY emp_id;
The key difference from managed tables: The LOCATION clause. Without it, Databricks stores data in its managed storage. With it, data lives in YOUR ADLS Gen2.
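You can also collapse the write-then-register pattern above into one call: giving saveAsTable() an explicit path option creates the external table and writes the data together. A minimal sketch reusing external_path:

# One step: write Delta files to YOUR path AND register the external table
df.write.format("delta") \
    .mode("overwrite") \
    .option("path", external_path) \
    .saveAsTable("external_demo.employees")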
Step 5: Insert Data
INSERT INTO external_demo.employees VALUES
(6, 'Anita', 'Marketing', 'Montreal', 78000, '2024-03-15'),
(7, 'Deepak', 'Finance', 'Edmonton', 82000, '2024-05-01');
SELECT * FROM external_demo.employees ORDER BY emp_id;
-- 7 employees now
The new rows are written as new Parquet files in the SAME ADLS Gen2 path. The Delta transaction log tracks the addition.
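You can watch this in the transaction log itself: each committed write adds a numbered JSON file under _delta_log/. A quick sketch reusing external_path from Step 4:

# Commit files like 00000000000000000001.json appear after each INSERT
for f in sorted(dbutils.fs.ls(external_path + "/_delta_log/"), key=lambda f: f.name):
    print(f.name)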
Step 6: DROP Table — Prove Data Persists
This is the moment of truth:
-- Drop the table
DROP TABLE external_demo.employees;
-- Verify it is gone from the catalog
SHOW TABLES IN external_demo;
-- employees is NOT listed
But the data files are still in ADLS Gen2:
# Check ADLS directly — files are STILL THERE
files = dbutils.fs.ls("abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/external_tables/employees/")
for f in files:
print(f"{f.name} ({f.size} bytes)")
# You will see:
# _delta_log/
# part-00000-xxxx.snappy.parquet
# part-00001-xxxx.snappy.parquet
# ...
The data files and the Delta transaction log are completely untouched. DROP TABLE only removed the catalog entry — the metadata that says “there is a table called employees.” The actual Parquet files in your storage account are still there.
You can even read the data directly without a table:
# Read directly from the ADLS path — no table needed
df_direct = spark.read.format("delta").load(
"abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/external_tables/employees/"
)
df_direct.show()
print(f"Rows: {df_direct.count()}")
# All 7 employees are here!
Real-life analogy: Dropping an external table is like removing a bookmark from your browser. The website (data) still exists — you just removed the shortcut to it. You can re-bookmark it (recreate the table) anytime.
Step 7: Recreate Table — Data Comes Back
-- Recreate the table pointing to the same ADLS path
CREATE TABLE IF NOT EXISTS external_demo.employees
USING DELTA
LOCATION 'abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/external_tables/employees';
-- All data is back instantly!
SELECT * FROM external_demo.employees ORDER BY emp_id;
-- 7 employees — exactly as before DROP
-- Even the Delta history is preserved!
DESCRIBE HISTORY external_demo.employees;
-- Shows all operations: CREATE, INSERT, etc.
The Delta transaction log (_delta_log/) was preserved in ADLS, so Databricks reads it and restores the complete table — data AND history.
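The same verification works from Python even while the table is dropped, because DeltaTable.forPath reads the preserved log straight from storage. A sketch:

from delta.tables import DeltaTable

# History survives DROP because _delta_log/ lives in YOUR storage
DeltaTable.forPath(spark, external_path) \
    .history() \
    .select("version", "timestamp", "operation") \
    .show(truncate=False)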
Step 8: Delta Operations on External Tables
External tables support ALL the same Delta operations as managed tables:
UPDATE
UPDATE external_demo.employees
SET salary = 105000, city = 'Toronto'
WHERE emp_id = 1;
DELETE
DELETE FROM external_demo.employees
WHERE emp_id = 7;
MERGE
from delta.tables import DeltaTable
source_data = [(2, "Shrey", "ML Engineering", "Toronto", 95000, "2023-03-20"),
(8, "Priya", "QA", "Pune", 72000, "2024-06-01")]
df_source = spark.createDataFrame(source_data, columns).withColumn("hire_date", to_date(col("hire_date")))
target = DeltaTable.forName(spark, "external_demo.employees")
target.alias("t").merge(
df_source.alias("s"), "t.emp_id = s.emp_id"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
Time Travel
-- View all versions
DESCRIBE HISTORY external_demo.employees;
-- Query a previous version
SELECT * FROM external_demo.employees VERSION AS OF 0 ORDER BY emp_id;
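Time travel also works straight from the ADLS path, no table registration needed, since every version is tracked by the _delta_log in your storage. A sketch:

# Load version 0 directly from the path
df_v0 = spark.read.format("delta") \
    .option("versionAsOf", 0) \
    .load(external_path)
df_v0.show()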
OPTIMIZE and VACUUM
-- Compact small files
OPTIMIZE external_demo.employees;
-- Clean old files (respects retention period)
VACUUM external_demo.employees RETAIN 168 HOURS;
How to Check If a Table Is Managed or External
Method 1: DESCRIBE EXTENDED
DESCRIBE EXTENDED external_demo.employees;
-- Look for "Type" in the output:
-- Type = MANAGED → managed table
-- Type = EXTERNAL → external table
Method 2: Check the Location
DESCRIBE DETAIL external_demo.employees;
-- Look at the "location" column:
-- A path under an External Location you registered → external (YOUR storage)
-- dbfs:/user/hive/warehouse/... → managed (legacy hive_metastore)
-- Caveat: Unity Catalog managed tables ALSO live at abfss:// paths (the metastore
-- root storage), so in UC trust the Type field from DESCRIBE EXTENDED, not the prefix.
Method 3: Quick PySpark
# spark.catalog.getTable reports the table type directly; in Unity Catalog this
# beats guessing from the path prefix, since managed tables also use abfss://
tbl = spark.catalog.getTable("external_demo.employees")
detail = spark.sql("DESCRIBE DETAIL external_demo.employees").collect()[0]
print(f"Type: {tbl.tableType}")
print(f"Location: {detail['location']}")
External Tables with Partitioning
# Write partitioned data to ADLS
df.write.format("delta") \
    .mode("overwrite") \
    .partitionBy("department") \
    .save("abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/external_tables/employees_partitioned/")
# Register as external table
spark.sql('''
CREATE TABLE IF NOT EXISTS external_demo.employees_partitioned
USING DELTA
LOCATION 'abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/external_tables/employees_partitioned/'
''')
ADLS folder structure:
external_tables/employees_partitioned/
_delta_log/
department=Analytics/
part-00000.parquet
department=Data Engineering/
part-00000.parquet
department=DevOps/
part-00000.parquet
Partition pruning works exactly the same as managed tables.
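You can confirm the pruning in the physical plan. A sketch (exact plan text varies by Databricks runtime):

# Filtering on the partition column lets Delta skip entire department=... folders
pruned = spark.table("external_demo.employees_partitioned") \
    .filter(col("department") == "DevOps")
pruned.explain()   # the file scan should list department under PartitionFilters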
External Tables and VACUUM
VACUUM on external tables works identically to managed tables — it removes old Parquet files from YOUR ADLS storage beyond the retention period:
-- Default 7-day retention
VACUUM external_demo.employees RETAIN 168 HOURS;
Important difference: When you VACUUM an external table, the old files are deleted from YOUR storage account. Make sure your retention period covers your time travel needs before running VACUUM.
After VACUUM, time travel to versions whose files were deleted will fail — same behavior as managed tables.
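If you rely on long time-travel windows, widen the retention before you VACUUM. A hedged sketch using standard Delta table properties (30 days is an illustrative value):

spark.sql("""
    ALTER TABLE external_demo.employees SET TBLPROPERTIES (
        'delta.deletedFileRetentionDuration' = 'interval 30 days',
        'delta.logRetentionDuration' = 'interval 30 days'
    )
""")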
Granting Access to Other Users
With Unity Catalog, you control who can access external tables:
-- Grant SELECT to a user
GRANT SELECT ON TABLE external_demo.employees TO `analyst@company.com`;
-- Grant SELECT on the entire schema
GRANT SELECT ON SCHEMA external_demo TO `analyst@company.com`;
-- Grant usage on the external location (needed to CREATE external tables)
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION adls_raw_data TO `engineer@company.com`;
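To audit what has been granted, SHOW GRANTS works on tables and external locations alike. A quick sketch:

spark.sql("SHOW GRANTS ON TABLE external_demo.employees").show(truncate=False)
spark.sql("SHOW GRANTS ON EXTERNAL LOCATION adls_raw_data").show(truncate=False)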
Common Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| “No matching external location found” | No External Location registered for the ADLS path | Create External Location pointing to the container |
| “PARSE_SYNTAX_ERROR at STORAGE” | Cluster does not support Unity Catalog commands | Use a cluster with Unity Catalog enabled (shared or single-user access mode) |
| “ACCESS_DENIED” on External Location | Access Connector lacks Storage Blob Data Contributor role | Add the role on the storage account in Azure Portal |
| “Permission denied” creating External Location | Your Databricks user lacks metastore admin or CREATE EXTERNAL LOCATION | Ask workspace admin for the permission |
| “Table already exists” | LOCATION path already registered to another table | Drop the existing table first or use a different path |
| “Cannot create external table — schema does not exist” | Schema not created | Run CREATE SCHEMA IF NOT EXISTS external_demo |
| “Storage credential not found” | Typo in credential name | Run SHOW STORAGE CREDENTIALS to verify the exact name |
| Files not showing after DROP | You dropped a MANAGED table (not external) | Managed tables delete data on DROP. Data is lost. |
Interview Questions
Q: What is the difference between managed and external tables in Databricks?
A: Managed tables store data in Databricks-managed storage — DROP TABLE deletes both metadata and data. External tables store data in YOUR storage (ADLS Gen2) — DROP TABLE deletes only the catalog metadata, data files remain. Recreating an external table with the same LOCATION restores all data and history.

Q: What are the three layers needed for external tables in Unity Catalog?
A: Storage Credential (HOW to authenticate — Access Connector managed identity), External Location (WHERE to access — registered ADLS path), and External Table (WHAT data — specific path under the location). Each layer provides security governance.

Q: What happens when you DROP an external table?
A: Only the catalog entry (metadata) is deleted. The Delta data files and transaction log remain in ADLS Gen2. You can read the data directly from the path, or recreate the table with the same LOCATION to restore it with full history.

Q: When would you use an external table instead of a managed table?
A: When data needs to be accessed by tools outside Databricks (Synapse, ADF, Power BI), when compliance requires data in your own storage, when multiple Databricks workspaces share the same data, or when you want data to survive table drops and workspace changes.

Q: Do external tables support all Delta operations?
A: Yes. INSERT, UPDATE, DELETE, MERGE, time travel, OPTIMIZE, VACUUM, schema evolution — all work identically to managed tables. The only difference is where the data files physically reside.
Wrapping Up
External tables give you the best of both worlds: the convenience of SQL table access with the durability of data in YOUR storage. DROP a table by accident? No problem — the data is still in ADLS. Need to share data with Synapse or Power BI? They read ADLS directly. Need to migrate to a different Databricks workspace? Point the new workspace at the same ADLS path.
The setup is a one-time process: create an Access Connector, assign storage permissions, register a Storage Credential and External Location in Unity Catalog. After that, creating external tables is just adding LOCATION 'abfss://...' to your CREATE TABLE statement.
For production data lakes, external tables are the standard. Managed tables are for sandboxes and experiments. Know the difference. Choose wisely.
Related posts:
- Azure Databricks Introduction and dbutils
- Connecting to Blob/ADLS Gen2
- Delta Lake Deep Dive
- Databricks Secret Scopes
- ADLS Gen2 Complete Guide
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.