File Storage in Azure Databricks: Volumes, DBFS, /tmp/, External Locations, and Where Your Files Actually Live

You created a file in your Databricks notebook using open("/tmp/test.csv", "w"). It worked. You restarted the cluster. The file is gone. You tried open("/Volumes/workspace/default/myvol/test.csv", "a") and got Illegal seek. You tried dbutils.fs.rm("file:/tmp/test.csv") and got LocalFilesystemAccessDeniedException.

Welcome to the confusing world of Databricks file storage — where /tmp/, /dbfs/, /Volumes/, dbfs:/, abfss://, and file:/ all look similar but behave completely differently.

This post clears up every file path, storage location, and access method in Databricks. By the end, you will know exactly where to put files, how to access them, and which path prefix to use in which context.

Think of Databricks file storage like a building with multiple rooms. /tmp/ is a temporary locker that gets emptied every night (cluster restart). DBFS is a shared storage room that persists. Volumes are labeled filing cabinets organized by catalog and schema. External Locations are doors that open to your own warehouse next door (ADLS Gen2). Each room has its own key (path prefix) — use the wrong key and the door does not open.

The Five Storage Locations in Databricks
/tmp/ — Driver-Local Temporary Storage
DBFS — Databricks File System
Unity Catalog Volumes — The Modern Way
External Locations — Your Own ADLS Gen2
The Workspace FileStore
The Path Prefix Cheat Sheet
Creating and Reading Files in Each Location
Managed Volumes vs External Volumes
Creating a Volume (Step by Step)
The Append Mode Bug (Illegal Seek)
Python open() vs dbutils.fs vs spark.read
Uploading Files via UI
Which Storage for Which Use Case
Common Errors and Fixes
Interview Questions
Wrapping Up

The Five Storage Locations in Databricks

Location	Path	Persists?	Visible in UI?	Best For
/tmp/	`/tmp/file.csv`	No (lost on restart)	No	Quick scratch files
DBFS	`dbfs:/FileStore/file.csv`	Yes	Data tab (DBFS)	Legacy file storage
Volumes	`/Volumes/catalog/schema/vol/file.csv`	Yes	Catalog Explorer	Modern file storage
External Location	`abfss://container@account.dfs.core.windows.net/`	Yes (your storage)	No (browse in Azure)	Production data lake
Workspace FileStore	`/FileStore/file.csv`	Yes	No (access via URL)	Small files, images for notebooks

/tmp/ — Driver-Local Temporary Storage

What It Is

/tmp/ is the local filesystem of the driver VM — the actual machine running your notebook. It is NOT distributed, NOT shared between nodes, and NOT persistent.

# Write a file to /tmp/ (newlines must be escaped as \n inside the string)
with open("/tmp/quick_test.csv", "w") as f:
    f.write("id,name\n1,Naveen\n2,Shrey\n")

# Read it back
with open("/tmp/quick_test.csv", "r") as f:
    print(f.read())

The Catch

Cluster starts → /tmp/ is empty
You create /tmp/test.csv → file exists
Cluster restarts → /tmp/test.csv is GONE

When to Use

Quick one-off tests
Temporary intermediate files during a notebook run
Never for anything you need to keep

When NOT to Use

Storing data between runs
Sharing files with other notebooks
Anything production

Real-life analogy: /tmp/ is a whiteboard in a meeting room. Write whatever you need during the meeting. The janitor erases it overnight. It is never meant for permanent notes.
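If a scratch file turns out to be worth keeping, one option is to copy it somewhere persistent before the cluster goes away. The sketch below is a minimal illustration, not a Databricks API: persist_scratch_file is a hypothetical helper name, and the destination directory is an assumption (on Databricks it would typically be a Volume directory). Temporary directories stand in for both locations so the example runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def persist_scratch_file(scratch_path: str, persistent_dir: str) -> str:
    """Copy a driver-local scratch file into a directory that survives restarts.

    persistent_dir is an assumption: on Databricks it would typically be a
    Volume directory such as /Volumes/workspace/default/naveenvol/.
    """
    src = Path(scratch_path)
    dest_dir = Path(persistent_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copyfile(src, dest)  # one sequential write of the whole file
    return str(dest)


# Local demonstration: temporary directories stand in for /tmp/ and a Volume.
with tempfile.TemporaryDirectory() as scratch, tempfile.TemporaryDirectory() as keep:
    scratch_file = Path(scratch) / "quick_test.csv"
    scratch_file.write_text("id,name\n1,Naveen\n2,Shrey\n")
    saved = persist_scratch_file(str(scratch_file), keep)
    print(Path(saved).read_text())
```

In a notebook you would pass "/tmp/quick_test.csv" and your own Volume directory instead of the temporary directories.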

DBFS — Databricks File System

What It Is

DBFS is Databricks’ built-in distributed file system. Files persist across cluster restarts. It is backed by cloud storage (Azure Blob behind the scenes).

The Confusing Part: Two Path Styles

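The same DBFS file is addressed with a different path depending on which tool you use. The two prefixes to remember are dbfs:/ (for dbutils and Spark) and /dbfs/ (for Python open()):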

# Using dbutils (Databricks utility): pass the path bare or with the dbfs:/ prefix
dbutils.fs.put("/FileStore/my_files/test.csv", "id,name\n1,Naveen\n", overwrite=True)
dbutils.fs.ls("/FileStore/my_files/")
dbutils.fs.head("/FileStore/my_files/test.csv")

# Using Python open() — use /dbfs/ prefix
with open("/dbfs/FileStore/my_files/test.csv", "r") as f:
    print(f.read())

# Using Spark — no prefix needed
df = spark.read.csv("/FileStore/my_files/test.csv", header=True)
df.show()

The same file, three different paths:

Tool	Path
`dbutils.fs`	`/FileStore/my_files/test.csv`
Python `open()`	`/dbfs/FileStore/my_files/test.csv`
`spark.read`	`/FileStore/my_files/test.csv` or `dbfs:/FileStore/my_files/test.csv`
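To make the mapping in the table concrete, here is a small sketch that takes a DBFS path in any of the styles and returns the spelling each tool expects. dbfs_path_variants is a hypothetical helper written for this post; it only manipulates strings and does not touch any storage.

```python
def dbfs_path_variants(path: str) -> dict:
    """Return the spelling of one DBFS file path for each tool."""
    # Strip any existing prefix so the input can be in any of the styles.
    if path.startswith("dbfs:/"):
        bare = path[len("dbfs:"):]
    elif path.startswith("/dbfs/"):
        bare = path[len("/dbfs"):]
    else:
        bare = path
    if not bare.startswith("/"):
        bare = "/" + bare
    return {
        "dbutils.fs": "dbfs:" + bare,  # the bare path also works here
        "open()": "/dbfs" + bare,      # Python needs the /dbfs/ mount prefix
        "spark.read": "dbfs:" + bare,  # the bare path also works here
    }


for tool, spelled in dbfs_path_variants("/FileStore/my_files/test.csv").items():
    print(f"{tool:12} {spelled}")
```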

When to Use

Legacy workspaces without Unity Catalog
Databricks Community Edition (free tier — no Volumes available)
Storing small reference files

When NOT to Use

Unity Catalog workspaces (use Volumes instead)
Production data (use External Locations pointing to ADLS)
Large datasets

Real-life analogy: DBFS is like a shared network drive in an office. Everyone can access it, files persist, but it belongs to Databricks — you do not control the underlying storage.

Unity Catalog Volumes — The Modern Way

What It Is

Volumes are Unity Catalog’s managed file storage. They sit inside the catalog hierarchy: Catalog → Schema → Volume → Files. Think of them as organized folders governed by Unity Catalog permissions.

Catalog: workspace
  Schema: default
    Volume: naveenvol
      File: employees.csv
      File: pipeline_log.txt
      File: config.json

The Path

/Volumes/catalog_name/schema_name/volume_name/filename
/Volumes/workspace/default/naveenvol/employees.csv
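Because the path always follows the same catalog, schema, volume pattern, it can be convenient to build it from its parts instead of typing it by hand. This is a minimal sketch; volume_path is a hypothetical helper, not part of any Databricks library.

```python
import posixpath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build /Volumes/<catalog>/<schema>/<volume>/<parts...>."""
    for label, value in (("catalog", catalog), ("schema", schema), ("volume", volume)):
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty name without '/': {value!r}")
    # Drop stray slashes so a leading "/" in a part cannot reset the join.
    clean = [p.strip("/") for p in parts if p.strip("/")]
    return posixpath.join("/Volumes", catalog, schema, volume, *clean)


print(volume_path("workspace", "default", "naveenvol", "employees.csv"))
```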

How to Use

# Write (newlines must be escaped as \n inside each string)
with open("/Volumes/workspace/default/naveenvol/employees.csv", "w") as f:
    f.write("id,name,dept,salary\n")
    f.write("1001,Naveen,Data Engineering,95000\n")
    f.write("1002,Shrey,Data Science,88000\n")

# Read with Python
with open("/Volumes/workspace/default/naveenvol/employees.csv", "r") as f:
    print(f.read())

# Read with Spark
df = spark.read.csv("/Volumes/workspace/default/naveenvol/employees.csv", header=True)
df.show()

# List files
import os
files = os.listdir("/Volumes/workspace/default/naveenvol/")
print(f"Files: {files}")

# Delete
os.remove("/Volumes/workspace/default/naveenvol/employees.csv")

Visible in UI

Click Catalog in the sidebar
Navigate: workspace → default → naveenvol
Click Files tab — your files show up here

When to Use

All file operations on workspaces with Unity Catalog
Storing CSV/JSON files for notebook exercises
Landing files for Spark to read
Config files, lookup files, reference data
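For the config-file use case, a plain JSON round trip is usually enough. The sketch below uses only "w" and "r" modes, so it rewrites the whole file each time instead of appending. save_config and load_config are hypothetical helper names, and a temporary directory stands in for the Volume so the example runs outside Databricks.

```python
import json
import tempfile
from pathlib import Path


def save_config(path: str, config: dict) -> None:
    # "w" rewrites the whole file in one sequential pass.
    with open(path, "w") as f:
        json.dump(config, f, indent=2)


def load_config(path: str) -> dict:
    with open(path, "r") as f:
        return json.load(f)


# Local demonstration: a temporary directory stands in for the Volume.
with tempfile.TemporaryDirectory() as fake_volume:
    config_file = str(Path(fake_volume) / "config.json")
    save_config(config_file, {"env": "dev", "batch_size": 500})
    print(load_config(config_file))
```

On Databricks you would pass a path such as /Volumes/workspace/default/naveenvol/config.json instead.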

Real-life analogy: Volumes are like labeled filing cabinets in an office. Each cabinet (volume) belongs to a department (schema) in a building (catalog). Files are organized, labeled, findable, and access-controlled. Anyone with permission can browse the cabinet and see what is inside.

External Locations — Your Own ADLS Gen2

What It Is

External Locations let Databricks access YOUR storage account (ADLS Gen2). The files live in YOUR Azure subscription, not in Databricks-managed storage.

# Read from your ADLS Gen2 directly
df = spark.read.parquet("abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/bronze/customers/")

# Write to your ADLS Gen2
df.write.format("delta").save("abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/silver/customers/")
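The abfss:// paths above all follow the pattern abfss://container@account.dfs.core.windows.net/path. As a rough sketch, the two hypothetical helpers below build and take apart such a URI with plain string handling. They do not contact Azure or check that the container exists.

```python
ADLS_SUFFIX = ".dfs.core.windows.net"


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}{ADLS_SUFFIX}/{path.lstrip('/')}"


def parse_abfss_uri(uri: str) -> tuple:
    """Split an abfss:// URI into (container, account, path)."""
    prefix = "abfss://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an abfss:// URI: {uri!r}")
    authority, _, path = uri[len(prefix):].partition("/")
    container, at, host = authority.partition("@")
    if not at or not host.endswith(ADLS_SUFFIX):
        raise ValueError(f"unexpected authority in {uri!r}")
    return container, host[: -len(ADLS_SUFFIX)], path


uri = abfss_uri("raw-data", "naveenadlsgen2de", "bronze/customers/")
print(uri)
print(parse_abfss_uri(uri))
```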

Setup Required

Storage Credential — how to authenticate (Access Connector managed identity)
External Location — which ADLS path to allow access to
See our External Tables post for full setup

When to Use

Production data lake (Bronze/Silver/Gold layers)
Data shared across Databricks workspaces, Synapse, ADF, Power BI
Compliance requirements (data must stay in your storage)

Real-life analogy: External Locations are like having a key to the warehouse next door. The warehouse (ADLS) is yours — you own it, you control it. The key (External Location) lets Databricks open the door and access your inventory.

The Path Prefix Cheat Sheet

What You Are Doing	Path to Use	Example
Python `open()` to /tmp/	`/tmp/`	`/tmp/test.csv`
Python `open()` to DBFS	`/dbfs/`	`/dbfs/FileStore/test.csv`
Python `open()` to Volume	`/Volumes/`	`/Volumes/workspace/default/naveenvol/test.csv`
`dbutils.fs` to DBFS	No prefix or `dbfs:/`	`/FileStore/test.csv`
`dbutils.fs` to Volume	`/Volumes/`	`/Volumes/workspace/default/naveenvol/`
`dbutils.fs` to ADLS	`abfss://`	`abfss://container@account.dfs.core.windows.net/`
`spark.read` from DBFS	No prefix	`/FileStore/test.csv`
`spark.read` from Volume	`/Volumes/`	`/Volumes/workspace/default/naveenvol/test.csv`
`spark.read` from ADLS	`abfss://`	`abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/`

The rule: Python open() needs /dbfs/ for DBFS. Everything else uses the path directly. Volumes always use /Volumes/. ADLS always uses abfss://.
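One way to internalize the rule is to read the prefix first and decide which storage you are talking to. The sketch below is a simplified classifier written for this post (classify_path and PREFIX_RULES are hypothetical names). Note how a bare path like /FileStore/test.csv stays ambiguous, because dbutils.fs and Spark treat it as DBFS while Python open() treats it as a local path.

```python
# Ordered prefix rules; the first match wins.
PREFIX_RULES = [
    ("abfss://", "External Location (ADLS Gen2)"),
    ("/Volumes/", "Unity Catalog Volume"),
    ("dbfs:/", "DBFS"),
    ("/dbfs/", "DBFS (as seen by Python open())"),
    ("file:/", "driver-local disk"),
    ("/tmp/", "driver-local disk (as seen by Python open())"),
]


def classify_path(path: str) -> str:
    """Name the storage location a path prefix points at."""
    for prefix, label in PREFIX_RULES:
        if path.startswith(prefix):
            return label
    return "ambiguous: depends on the tool"


for example in [
    "/Volumes/workspace/default/naveenvol/test.csv",
    "/dbfs/FileStore/test.csv",
    "abfss://container@account.dfs.core.windows.net/",
    "/FileStore/test.csv",
]:
    print(f"{example} -> {classify_path(example)}")
```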

Managed Volumes vs External Volumes

Feature	Managed Volume	External Volume
Data stored in	Databricks-managed storage	YOUR ADLS Gen2
DROP VOLUME	Deletes data + metadata	Deletes metadata, data stays in ADLS
Create SQL	`CREATE VOLUME myvol`	`CREATE EXTERNAL VOLUME myvol LOCATION 'abfss://...'`
Best for	Notebooks, exercises, small files	Production, shared files
Visible in	Catalog Explorer	Catalog Explorer + Azure Portal

Creating an External Volume

-- Requires an External Location already set up
CREATE EXTERNAL VOLUME workspace.default.adls_volume
LOCATION 'abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/volumes/';

Files uploaded to this volume physically land in your ADLS Gen2 container.

Creating a Volume (Step by Step)

Via UI

Click Catalog in the sidebar
Select your catalog (e.g., workspace)
Select a schema (e.g., default)
Click + Create → Volume
Name: naveenvol
Type: Managed (or External with ADLS path)
Click Create

Via SQL

-- Managed volume
CREATE VOLUME IF NOT EXISTS workspace.default.naveenvol;

-- External volume (requires External Location)
CREATE EXTERNAL VOLUME IF NOT EXISTS workspace.default.adls_volume
LOCATION 'abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/volumes/';

Via Python

spark.sql("CREATE VOLUME IF NOT EXISTS workspace.default.naveenvol")
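If you create volumes from Python often, you may prefer to generate the statement from its parts. The sketch below only builds the SQL text in the two forms shown above. create_volume_sql is a hypothetical helper, and the identifier check is a deliberately strict assumption (letters, digits, and underscores only), not the full set of names Unity Catalog accepts.

```python
import re
from typing import Optional

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_volume_sql(catalog: str, schema: str, volume: str,
                      location: Optional[str] = None) -> str:
    """Return a CREATE VOLUME statement; external when a location is given."""
    for part in (catalog, schema, volume):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    name = f"{catalog}.{schema}.{volume}"
    if location is None:
        return f"CREATE VOLUME IF NOT EXISTS {name}"
    if "'" in location:
        raise ValueError("location must not contain a single quote")
    return f"CREATE EXTERNAL VOLUME IF NOT EXISTS {name} LOCATION '{location}'"


print(create_volume_sql("workspace", "default", "naveenvol"))
print(create_volume_sql(
    "workspace", "default", "adls_volume",
    "abfss://raw-data@naveenadlsgen2de.dfs.core.windows.net/volumes/",
))
```

In a notebook the returned string would be passed to spark.sql(...), exactly as in the one-liner above.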

The Append Mode Bug (Illegal Seek)

This error caught us during our practice session:

# This FAILS on Volumes
with open("/Volumes/workspace/default/naveenvol/log.txt", "a") as f:
    f.write("New line\n")
# OSError: [Errno 29] Illegal seek

Why It Fails

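Databricks Volumes are backed by cloud object storage, where a file is written as a whole object and cannot be modified in place. Opening a file in "a" (append) mode means positioning at the end of an existing file and extending it. The Volumes file interface supports only sequential writes of a whole file, not appends or random writes, so the call fails with Illegal seek.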

The Workaround

Read existing content, append new content, rewrite the entire file:

import os

vol_path = "/Volumes/workspace/default/naveenvol/log.txt"

# Read existing content (if file exists)
existing = ""
if os.path.exists(vol_path):
    with open(vol_path, "r") as f:
        existing = f.read()

# Append new content and write everything
with open(vol_path, "w") as f:
    f.write(existing)
    f.write("New line added!\n")

print("Appended successfully!")
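An alternative to rereading the file on every write is to append on the driver's local disk, where "a" mode works, and then publish the whole file in one pass. The sketch below is one possible pattern, not an official recommendation. append_and_publish is a hypothetical helper, and temporary directories stand in for /tmp/ and the Volume. Because the local copy disappears on restart, the helper seeds it from the published file when it is missing.

```python
import shutil
import tempfile
from pathlib import Path


def append_and_publish(local_log: str, published_log: str, lines: list) -> None:
    """Append lines to a local file, then rewrite the published copy in full."""
    local = Path(local_log)
    published = Path(published_log)
    # On a fresh cluster the local copy is gone, so seed it from the published file.
    if not local.exists() and published.exists():
        shutil.copyfile(published, local)
    # Append mode is fine on the driver's local disk.
    with open(local, "a") as f:
        for line in lines:
            f.write(line + "\n")
    # Publish with a single sequential write of the whole file.
    shutil.copyfile(local, published)


# Local demonstration: temporary directories stand in for /tmp/ and a Volume.
with tempfile.TemporaryDirectory() as tmp_dir, tempfile.TemporaryDirectory() as vol_dir:
    local_file = str(Path(tmp_dir) / "log.txt")
    volume_file = str(Path(vol_dir) / "log.txt")
    append_and_publish(local_file, volume_file, ["pipeline started"])
    append_and_publish(local_file, volume_file, ["pipeline finished"])
    print(Path(volume_file).read_text())
```

The trade-off is that anything appended locally but not yet published is lost if the cluster stops, so publish after each batch of lines you care about.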

File Modes That Work on Volumes

Mode	Works?	What It Does
`"r"`	✅ Yes	Read
`"w"`	✅ Yes	Write (overwrite)
`"a"`	❌ No	Append — `Illegal seek` error
`"rb"`	✅ Yes	Read binary
`"wb"`	✅ Yes	Write binary

Python open() vs dbutils.fs vs spark.read

Feature	Python `open()`	`dbutils.fs`	`spark.read`
Read small files	✅ Best	✅ `head()`	Overkill
Read large data	❌ Slow (single thread)	❌ Not for processing	✅ Best (distributed)
Write small files	✅ Best	✅ `put()`	Overkill
Write large data	❌ Slow	❌ Not for this	✅ Best (distributed)
Delete files	`os.remove()`	`dbutils.fs.rm()`	N/A
List files	`os.listdir()`	`dbutils.fs.ls()`	N/A
Works with /tmp/	✅	❌	❌
Works with DBFS	✅ (with `/dbfs/` prefix)	✅	✅
Works with Volumes	✅	✅	✅
Works with ADLS	❌	✅	✅

Rule of thumb:

Small files (config, logs, CSVs under 1 MB) → Python open()
File management (list, delete, move, copy) → dbutils.fs
Data processing (Parquet, Delta, large CSV) → spark.read / spark.write
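The rule of thumb can be written down as a tiny decision function. This is only an illustration of the guideline above: pick_tool is a hypothetical name, and the 1 MB cutoff is the rough threshold from this post, not a hard limit enforced by Databricks.

```python
SMALL_FILE_LIMIT_BYTES = 1 * 1024 * 1024  # "under 1 MB" from the rule of thumb

FILE_MANAGEMENT_TASKS = {"list", "delete", "move", "copy"}


def pick_tool(task: str, size_bytes: int = 0) -> str:
    """Suggest a tool for a task, following the rule of thumb."""
    if task in FILE_MANAGEMENT_TASKS:
        return "dbutils.fs"
    if task in {"read", "write"}:
        if size_bytes < SMALL_FILE_LIMIT_BYTES:
            return "Python open()"
        return "spark.read / spark.write"
    raise ValueError(f"unknown task: {task!r}")


print(pick_tool("read", size_bytes=20_000))
print(pick_tool("write", size_bytes=5_000_000_000))
print(pick_tool("delete"))
```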

Uploading Files via UI

Upload to Volume

Catalog → navigate to your volume
Click Upload to volume button
Drag and drop files
Files appear under the Files tab

Upload to DBFS

Click Data in the sidebar
Click DBFS tab
Navigate to /FileStore/
Click Upload → drag and drop

Upload to Table (Create Table UI)

Click Data in the sidebar
Click Create Table
Upload CSV → Databricks creates a managed table directly

Which Storage for Which Use Case

Use Case	Storage	Path
Quick scratch file during development	`/tmp/`	`/tmp/test.csv`
Notebook exercise files	Volume	`/Volumes/workspace/default/naveenvol/`
Config files, lookup data	Volume	`/Volumes/catalog/schema/config_vol/`
Production data lake (Bronze/Silver/Gold)	External Location (ADLS)	`abfss://container@account.dfs.core.windows.net/`
Legacy workspace without Unity Catalog	DBFS	`/FileStore/my_files/`
Files shared across workspaces	External Location (ADLS)	`abfss://`
Small images for notebooks	Workspace FileStore	`/FileStore/images/`

Common Errors and Fixes

Error	Cause	Fix
`FileNotFoundError: /tmp/test.csv`	Cluster restarted, `/tmp/` was cleared	Use Volumes instead for persistent files
`OSError: [Errno 29] Illegal seek`	Append mode `"a"` on Volumes	Read existing → append in memory → write all with `"w"`
`LocalFilesystemAccessDeniedException`	`dbutils.fs.rm("file:/tmp/...")` blocked	Use `os.remove("/tmp/...")` or ignore (cleared on restart)
`No such file or directory: /FileStore/...`	Python `open()` called on a DBFS path without the `/dbfs/` prefix	Add `/dbfs/` prefix: `/dbfs/FileStore/test.csv`
`Path does not exist: /Volumes/...`	Volume not created or wrong catalog/schema	Verify in Catalog Explorer, check catalog and schema names
`PERMISSION_DENIED creating volume`	Lack CREATE VOLUME privilege	Ask workspace admin for permission on the schema
`Cannot access non /Workspace local filesystem`	Databricks blocking local filesystem access entirely	Use Volumes instead of `/tmp/`
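For the LocalFilesystemAccessDeniedException row, the fix is to delete driver-local files with plain Python instead of dbutils.fs. A minimal sketch, assuming the path may arrive with a file: prefix that has to be stripped first. remove_local_file is a hypothetical helper, and the demonstration uses a temporary file so it runs anywhere.

```python
import os
import tempfile


def remove_local_file(path: str) -> bool:
    """Delete a driver-local file with os.remove; return False if it is already gone."""
    # Accept "file:/tmp/test.csv" as well as "/tmp/test.csv".
    local_path = path[len("file:"):] if path.startswith("file:") else path
    try:
        os.remove(local_path)
        return True
    except FileNotFoundError:
        return False


# Local demonstration with a temporary file.
fd, temp_name = tempfile.mkstemp(suffix=".csv")
os.close(fd)
print(remove_local_file("file:" + temp_name))  # file existed and was removed
print(remove_local_file(temp_name))            # already gone
```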

Interview Questions

Q: What are the storage options in Databricks? A: Five options: /tmp/ (driver-local, non-persistent), DBFS (Databricks-managed, persistent), Unity Catalog Volumes (modern governed storage), External Locations (your ADLS Gen2), and Workspace FileStore (legacy small file storage). For production, use External Locations for the data lake and Volumes for config/reference files.

Q: What is the difference between a Managed Volume and an External Volume? A: A Managed Volume stores data in Databricks-managed storage — DROP VOLUME deletes the data. An External Volume points to your ADLS Gen2 — DROP VOLUME removes only the metadata, data files remain in your storage. External Volumes require an External Location to be set up first.

Q: Why does append mode fail on Volumes? A: Volumes use cloud object storage (Azure Blob) underneath, which does not support native file append or seek operations. The workaround is to read the existing file content, append the new content in memory, and write the entire file using write mode ("w").

Q: When should you use Python open() vs spark.read? A: Use Python open() for small files (configs, logs, CSVs under 1 MB). Use spark.read for data processing (Parquet, Delta, large CSVs). Python open() runs on the driver only (single thread). Spark distributes the read across the cluster (parallel).

Wrapping Up

File storage in Databricks is confusing because there are five different locations with different path prefixes, persistence rules, and access methods. But once you understand the map, it is simple:

Development scratch files → /tmp/ (temporary) or Volumes (persistent)
Notebook data files → Volumes (/Volumes/catalog/schema/vol/)
Production data lake → External Locations (abfss://)
Legacy workspaces → DBFS (/dbfs/FileStore/)

Use Volumes for everyday file operations. Use External Locations for production data. Forget /tmp/ exists unless you need a 5-second scratch file. And always remember: append mode does not work on Volumes — read, modify, rewrite.

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
