OneLake Deep Dive: Architecture, ADLS Gen2 Compatibility, OneLake File Explorer, Multi-Cloud Shortcuts, Storage Billing, and the Foundation of Microsoft Fabric
Every Fabric item — Lakehouse, Warehouse, KQL Database, Semantic Model — stores its data in ONE place: OneLake. It is the unified storage layer underneath all of Fabric. Understanding OneLake is like understanding the foundation of a building — everything above it depends on it being solid, organized, and accessible.
This post goes beyond the basics. We cover OneLake’s architecture, its ADLS Gen2 API compatibility (meaning you can access OneLake with existing ADLS tools), the OneLake File Explorer (browse from Windows), multi-cloud shortcuts (access S3 and GCS without copying), storage billing, and the patterns that make OneLake the most important Fabric component.
Think of OneLake as a single massive filing cabinet shared across the entire company. Every department (workspace) gets drawers (lakehouses, warehouses). Every drawer has folders (tables, files). The filing cabinet has ONE address — not 50 different storage accounts scattered across Azure. Anyone with the right key (permissions) can access any drawer from anywhere (including non-Microsoft tools via ADLS Gen2 APIs).
Table of Contents
- What Is OneLake?
- OneLake vs ADLS Gen2 vs S3
- The OneLake Hierarchy
- OneLake Architecture
- One Tenant = One OneLake
- Namespaces and Paths
- ADLS Gen2 API Compatibility
- Accessing OneLake from External Tools
- Azure Storage Explorer
- AzCopy
- ADLS Gen2 SDK (Python)
- Databricks (External)
- OneLake File Explorer (Windows Desktop)
- Installing and Using
- Sync Local Files with OneLake
- Multi-Cloud Shortcuts
- How Shortcuts Access S3 and GCS
- Shortcut Caching (Reduce Egress)
- OneLake Data Hub
- Discovering Data Across Workspaces
- Storage Billing and Optimization
- What Counts as Storage
- BCDR (Disaster Recovery) Replication
- Soft Delete and Recovery
- Storage Optimization Tips
- OneLake Security
- Workspace Roles + Data Access Roles
- Firewall and Private Endpoints
- Real-World OneLake Patterns
- Pattern 1: Centralized Data Lake
- Pattern 2: Hub-and-Spoke
- Pattern 3: Multi-Cloud Unified Lake
- Common Mistakes
- Interview Questions
- Wrapping Up
What Is OneLake?
OneLake is Fabric’s built-in, unified storage layer — a single data lake for your entire organization. Every Fabric workspace, every lakehouse, every warehouse writes its data to OneLake. There is nothing to provision, no storage accounts to create, no access keys to manage.
Traditional approach:
Team A: ADLS Gen2 account → storageA.dfs.core.windows.net
Team B: ADLS Gen2 account → storageB.dfs.core.windows.net
Team C: ADLS Gen2 account → storageC.dfs.core.windows.net
→ 3 storage accounts, 3 sets of credentials, 3 different access controls
OneLake approach:
Team A: OneLake → workspace_A/lakehouse_A/Tables/...
Team B: OneLake → workspace_B/lakehouse_B/Tables/...
Team C: OneLake → workspace_C/lakehouse_C/Tables/...
→ 1 storage layer, 1 identity system (Microsoft Entra ID, formerly Azure AD), 1 governance model
OneLake vs ADLS Gen2 vs S3
| Feature | OneLake | ADLS Gen2 | Amazon S3 |
|---|---|---|---|
| Provisioning | Automatic (built into Fabric) | Manual (create storage account) | Manual (create bucket) |
| Authentication | Microsoft Entra ID, formerly Azure AD (automatic in Fabric) | Access key, SAS, managed identity, service principal | IAM, access key |
| Organization | Tenant → Workspace → Item → Tables/Files | Account → Container → Folders | Bucket → Prefix |
| Format | Delta Lake (default for tables) | Any | Any |
| Multi-workload | All Fabric workloads read/write natively | Needs connectors | Needs connectors |
| Governance | Purview integrated, sensitivity labels | Purview integration available | AWS Macie |
| Shortcuts | Internal + external (ADLS, S3, GCS) | N/A | N/A |
| Billing | ~$0.023/GB/month | ~$0.020-0.046/GB/month | ~$0.023/GB/month |
The OneLake Hierarchy
OneLake (one per tenant)
└── Workspace: DataEng_Prod
    ├── Lakehouse: bronze_lakehouse
    │   ├── Tables/
    │   │   ├── raw_customers/ (Delta files)
    │   │   └── raw_orders/ (Delta files)
    │   └── Files/
    │       └── uploads/ (raw CSV, JSON)
    │
    ├── Lakehouse: silver_lakehouse
    │   └── Tables/
    │       ├── customers_clean/
    │       └── orders_validated/
    │
    ├── Warehouse: gold_warehouse
    │   └── gold/
    │       ├── dim_customer/ (Parquet, managed)
    │       └── fact_sales/ (Parquet, managed)
    │
    └── KQL Database: iot_analytics
        └── sensor_readings/ (columnar store)
Every item writes to OneLake. The physical path:
onelake.dfs.fabric.microsoft.com
/{workspace_id}/{item_id}/Tables/{table_name}/
/{workspace_id}/{item_id}/Files/{folder_name}/
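The path layout above can be sketched as a small helper. This is a minimal sketch, not an official API: the function names are hypothetical. It assumes the two addressing styles that appear in this post, GUIDs (`{workspace_id}/{item_id}`) and friendly names where the item carries a type suffix such as `.Lakehouse`, and it assumes names contain no characters that need URL encoding.

```python
from typing import Optional

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def _item_segment(item: str, item_type: Optional[str]) -> str:
    """Name-based paths carry a type suffix (e.g. 'sales.Lakehouse'); GUID paths do not."""
    return f"{item}.{item_type}" if item_type else item


def onelake_https_url(workspace: str, item: str, path: str,
                      item_type: Optional[str] = None) -> str:
    """Build the HTTPS form used by AzCopy, Storage Explorer and the SDK."""
    segment = _item_segment(item, item_type)
    return f"https://{ONELAKE_HOST}/{workspace}/{segment}/{path.lstrip('/')}"


def onelake_abfss_uri(workspace: str, item: str, path: str,
                      item_type: Optional[str] = None) -> str:
    """Build the abfss:// form used by Spark; the workspace sits where the container would."""
    segment = _item_segment(item, item_type)
    return f"abfss://{workspace}@{ONELAKE_HOST}/{segment}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(onelake_https_url("DataEng_Prod", "bronze_lakehouse",
                            "Tables/raw_customers", item_type="Lakehouse"))
    print(onelake_abfss_uri("DataEng_Prod", "bronze_lakehouse",
                            "Tables/raw_customers", item_type="Lakehouse"))
```

Passing GUIDs for `workspace` and `item` and leaving `item_type` unset yields the `{workspace_id}/{item_id}` form shown above.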
ADLS Gen2 API Compatibility
OneLake implements the ADLS Gen2 REST API. This means most tools that work with ADLS Gen2 also work with OneLake once you point them at the OneLake endpoint:
ADLS Gen2 endpoint: https://storageaccount.dfs.core.windows.net/container/path
OneLake endpoint: https://onelake.dfs.fabric.microsoft.com/workspace/item/path
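Same API, different endpoint. In most cases, switching the URL is the only change. The exceptions are operations that manage the account or containers themselves, because in OneLake the workspaces and items are created and governed through Fabric.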
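To make the endpoint mapping concrete, here is a minimal sketch that parses both URL shapes with the standard library. The `LakePath` type and `parse_dfs_url` function are hypothetical names introduced for illustration. The point it shows is that the first path segment is the container in ADLS Gen2 and the workspace in OneLake.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class LakePath(NamedTuple):
    service: str    # "adls" or "onelake"
    account: str    # storage account name, or "onelake" for OneLake
    top_level: str  # ADLS container, or OneLake workspace
    path: str       # everything after the first path segment


def parse_dfs_url(url: str) -> LakePath:
    """Split a DFS endpoint URL into service, account, top-level segment and path."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host == "onelake.dfs.fabric.microsoft.com":
        service = "onelake"
    elif host.endswith(".dfs.core.windows.net"):
        service = "adls"
    else:
        raise ValueError(f"Not a recognised DFS endpoint: {host}")
    # First segment is the container (ADLS) or the workspace (OneLake)
    top_level, _, rest = parsed.path.lstrip("/").partition("/")
    return LakePath(service, host.split(".")[0], top_level, rest)


if __name__ == "__main__":
    print(parse_dfs_url("https://storageaccount.dfs.core.windows.net/container/path"))
    print(parse_dfs_url("https://onelake.dfs.fabric.microsoft.com/workspace/item/path"))
```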
Accessing OneLake from External Tools
Azure Storage Explorer
- Open Azure Storage Explorer
- Click Connect → ADLS Gen2 or OneLake
- URL: https://onelake.dfs.fabric.microsoft.com/
- Sign in with Microsoft Entra ID (formerly Azure AD)
- Browse workspaces → items → tables/files
AzCopy
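# Sign in first (interactive Microsoft Entra ID login)
azcopy login
# Copy a file TO OneLake (the fabric.microsoft.com domain must be marked as trusted)
azcopy copy "local_file.csv" "https://onelake.dfs.fabric.microsoft.com/workspace_name/lakehouse_name.Lakehouse/Files/uploads/local_file.csv" --trusted-microsoft-suffixes "fabric.microsoft.com"
# Copy FROM OneLake
azcopy copy "https://onelake.dfs.fabric.microsoft.com/workspace_name/lakehouse_name.Lakehouse/Files/data.csv" "local_copy.csv" --trusted-microsoft-suffixes "fabric.microsoft.com"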
ADLS Gen2 SDK (Python)
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Connect to OneLake using the SAME ADLS Gen2 SDK
credential = DefaultAzureCredential()
service_client = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=credential,
)

# List files in a lakehouse (the workspace plays the role of the file system)
file_system_client = service_client.get_file_system_client("workspace_name")
paths = file_system_client.get_paths(path="lakehouse_name.Lakehouse/Files/")
for path in paths:
    print(path.name)
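Writing works through the same client. The sketch below is an illustration rather than a complete implementation: `upload_to_lakehouse_files` is a hypothetical helper, and it assumes `file_system_client` is the workspace-level client created in the listing example above. It relies on the SDK's `get_file_client` and `upload_data` methods.

```python
def upload_to_lakehouse_files(file_system_client, lakehouse: str,
                              relative_path: str, data: bytes) -> str:
    """Upload bytes under <lakehouse>.Lakehouse/Files/ and return the path written.

    file_system_client: the workspace-level client from the listing example,
    i.e. service_client.get_file_system_client("workspace_name").
    """
    target = f"{lakehouse}.Lakehouse/Files/{relative_path.lstrip('/')}"
    file_client = file_system_client.get_file_client(target)
    # overwrite=True replaces an existing file instead of raising an error
    file_client.upload_data(data, overwrite=True)
    return target


# Example call (requires valid credentials and an existing workspace):
# upload_to_lakehouse_files(file_system_client, "lakehouse_name",
#                           "uploads/local_file.csv", b"id,name\n1,Ada\n")
```

Keeping the client as a parameter means the helper can be exercised with a stub object in unit tests, without touching a real tenant.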
Databricks (External)
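# Access OneLake from a Databricks notebook (outside Fabric).
# Assumes the cluster is already configured to authenticate to OneLake
# with Microsoft Entra ID (for example, through a service principal).
df = spark.read.format("delta").load(
    "abfss://workspace_name@onelake.dfs.fabric.microsoft.com/lakehouse_name.Lakehouse/Tables/customers"
)
# Uses the ADLS Gen2 protocol (abfss://), so Databricks addresses OneLake like any ADLS account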
OneLake File Explorer (Windows Desktop)
A Windows app that syncs OneLake data to your local file system:
Installing and Using
- Download the OneLake file explorer installer from Microsoft (search “OneLake file explorer”)
- Install → sign in with Microsoft Entra ID (formerly Azure AD)
- OneLake appears as a drive in File Explorer (like OneDrive)
- Browse: OneLake → workspace → lakehouse → Files/Tables
- Copy files to/from OneLake by dragging and dropping
File Explorer:
OneLake - Contoso
└── DataEng_Prod
    ├── bronze_lakehouse
    │   ├── Files
    │   │   └── uploads (drag CSV here to upload!)
    │   └── Tables
    │       ├── raw_customers
    │       └── raw_orders
    └── silver_lakehouse
        └── Tables
            └── customers_clean
Use for: Quick file uploads, browsing table structures, downloading small files for local analysis.
Multi-Cloud Shortcuts
OneLake shortcuts make external data (S3, GCS) appear as if it is local:
OneLake Lakehouse:
  Tables/
    local_customers/ (actual Delta files in OneLake)
    aws_events/ ← SHORTCUT to s3://company-events/processed/
    gcp_analytics/ ← SHORTCUT to gs://analytics-bucket/reports/

One notebook query:
  SELECT *
  FROM local_customers c
  JOIN aws_events e ON c.id = e.customer_id
  JOIN gcp_analytics g ON c.id = g.customer_id
Three clouds. One query. No copies to build or maintain: the data stays at its source and is read in place at query time.
Shortcut Caching
For cross-cloud shortcuts, enable caching to avoid repeated egress fees:
First read: OneLake → S3 (egress fee) → data returned and cached in OneLake
Second read: OneLake → cache (no egress fee, no cross-cloud round trip)
Enable: Workspace settings → OneLake → Cache setting → On
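The first-read/second-read behaviour can be illustrated with a toy read-through cache. This is a simplified model for intuition only, not how OneLake implements caching: the `ReadThroughCache` class is hypothetical, and it ignores retention periods and any file size limits.

```python
class ReadThroughCache:
    """Toy model: serve from the cache when possible, otherwise fetch and remember."""

    def __init__(self, remote: dict):
        self.remote = remote    # stands in for the S3 or GCS bucket
        self.cache = {}         # stands in for the shortcut cache
        self.egress_bytes = 0   # bytes pulled across clouds so far

    def read(self, key: str) -> bytes:
        if key in self.cache:
            return self.cache[key]       # cache hit: no cross-cloud transfer
        data = self.remote[key]          # cache miss: fetch from the source
        self.egress_bytes += len(data)   # this transfer is what incurs egress
        self.cache[key] = data
        return data


if __name__ == "__main__":
    bucket = {"processed/events.parquet": b"x" * 1000}
    lake = ReadThroughCache(bucket)
    lake.read("processed/events.parquet")
    lake.read("processed/events.parquet")
    print(lake.egress_bytes)
```

Two reads of the same 1,000-byte object produce 1,000 bytes of egress rather than 2,000, which is the saving the cache setting is meant to deliver.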
OneLake Data Hub
The Data Hub is a searchable catalog of all data items across all workspaces:
- Click OneLake data hub in the left sidebar
- Browse or search for items: “customers,” “sales,” “dim_product”
- See: item name, workspace, type, owner, endorsement status
- Click to explore or create a shortcut to the item
Use for: Discovering data created by other teams without asking “where is the customer table?”
Storage Billing and Optimization
What Counts as Storage
| Stored Item | Billed? | Location |
|---|---|---|
| Lakehouse Tables (Delta) | ✅ Yes | OneLake |
| Lakehouse Files (CSV, JSON) | ✅ Yes | OneLake |
| Warehouse tables | ✅ Yes | OneLake |
| KQL Database data | ✅ Yes | OneLake |
| Shortcut target data | ❌ No (billed at source) | S3, GCS, ADLS |
| Shortcut cache | ✅ Yes | OneLake |
| Delta log files | ✅ Yes | OneLake |
| Old Delta versions (before VACUUM) | ✅ Yes | OneLake |
| Mirrored database replicas | ✅ Yes | OneLake |
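A rough cost estimate follows directly from the table. The sketch below is illustrative only: it assumes the approximate rate of $0.023 per GB per month quoted earlier in this post (actual rates vary by region), and the sizes in the example are made-up numbers.

```python
PRICE_PER_GB_MONTH = 0.023  # approximate rate from the comparison table; varies by region


def monthly_storage_cost(items, price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Sum the billable gigabytes and convert to an estimated monthly cost.

    items: iterable of (label, size_gb, billed_in_onelake) tuples.
    """
    billable_gb = sum(size_gb for _, size_gb, billed in items if billed)
    return round(billable_gb * price_per_gb, 2)


# Hypothetical workspace: sizes are invented for illustration
example_items = [
    ("lakehouse tables (current Delta files)", 500, True),
    ("old Delta versions awaiting VACUUM", 1500, True),
    ("shortcut target in S3", 4000, False),  # billed by AWS, not OneLake
    ("shortcut cache", 200, True),
]

if __name__ == "__main__":
    print(monthly_storage_cost(example_items))
```

In this example the old Delta versions account for most of the billable 2,200 GB, which is why the VACUUM advice later in the post matters.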
Soft Delete and Recovery
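OneLake retains deleted data for a recovery period (7 days by default at the time of writing):
Delete a table → data soft-deleted → recoverable for retention period
After retention → permanently deleted → storage freed
Soft-deleted data still counts toward billed storage until the retention period ends.
Retention: confirm in the current Fabric documentation whether the window is configurable for your tenant before relying on the setting path Workspace settings → OneLake → Soft delete retention.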
Storage Optimization Tips
- Run VACUUM on Delta tables: old versions consume storage. VACUUM table RETAIN 168 HOURS removes data files that are no longer referenced by the table and are older than 7 days.
- Use shortcuts instead of copies: if data exists in ADLS, create a shortcut instead of copying to OneLake.
- Delete staging data after processing: staging tables (bronze) that have been transformed to silver do not need to persist forever.
- Compress before uploading: Parquet/Delta are already compressed. CSVs are not, so convert them to Delta after landing.
- Monitor storage: Fabric Admin Portal shows storage usage per workspace.
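The VACUUM tip can be scripted. The helper below is a minimal sketch with a hypothetical name, `vacuum_statement`. It only builds the SQL text, converting days to the hours that the `RETAIN` clause expects, and it refuses windows shorter than 7 days because Delta Lake's default safety check rejects them.

```python
def vacuum_statement(table: str, retain_days: int = 7) -> str:
    """Build a Delta Lake VACUUM statement with the retention expressed in hours."""
    if retain_days < 7:
        # Delta's default retention check blocks windows under 168 hours
        raise ValueError("retain_days below 7 is blocked by Delta's default safety check")
    return f"VACUUM {table} RETAIN {retain_days * 24} HOURS"


if __name__ == "__main__":
    for name in ["raw_customers", "raw_orders"]:
        print(vacuum_statement(name))
    # In a Fabric notebook, where `spark` is predefined, each statement could be run with:
    # spark.sql(vacuum_statement("raw_customers"))
```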
Real-World OneLake Patterns
Pattern 1: Centralized Data Lake
OneLake
└── Workspace: Central_DataLake
    ├── bronze_lakehouse (all raw data lands here)
    ├── silver_lakehouse (all cleaned data)
    ├── gold_warehouse (star schema for all teams)
    └── All teams access via Viewer role or shortcuts
Pattern 2: Hub-and-Spoke
OneLake
├── Workspace: DataEng_Hub (shared data)
│   ├── silver_lakehouse (master clean data)
│   └── gold_warehouse (shared star schema)
│
├── Workspace: Sales_Spoke (sales team)
│   └── sales_lakehouse
│       Tables/customers ← SHORTCUT to Hub silver
│
├── Workspace: Marketing_Spoke (marketing team)
│   └── marketing_lakehouse
│       Tables/customers ← SHORTCUT to Hub silver
│
└── Each spoke reads from Hub via shortcuts, with no data duplication
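Spoke shortcuts like these can be created in the Fabric UI or scripted. The sketch below only builds the request URL and JSON body for one internal shortcut. The endpoint path and payload shape reflect my understanding of the Fabric REST API's create-shortcut operation and should be treated as an assumption to verify against the current API reference. The function name and the IDs in the example are hypothetical, and no request is sent.

```python
def internal_shortcut_request(spoke_workspace_id: str, spoke_lakehouse_id: str,
                              hub_workspace_id: str, hub_item_id: str,
                              table: str):
    """Return (url, body) for a shortcut in the spoke's Tables/ that points at the hub table."""
    url = (
        "https://api.fabric.microsoft.com/v1"
        f"/workspaces/{spoke_workspace_id}/items/{spoke_lakehouse_id}/shortcuts"
    )
    body = {
        "path": "Tables",   # where the shortcut appears in the spoke lakehouse
        "name": table,      # shortcut name, e.g. "customers"
        "target": {
            "oneLake": {    # assumed target type for OneLake-to-OneLake shortcuts
                "workspaceId": hub_workspace_id,
                "itemId": hub_item_id,
                "path": f"Tables/{table}",
            }
        },
    }
    return url, body


if __name__ == "__main__":
    # Placeholder IDs; real calls use workspace and item GUIDs plus a bearer token
    url, body = internal_shortcut_request(
        "sales-spoke-ws-id", "sales-lakehouse-id",
        "hub-ws-id", "hub-silver-lakehouse-id", "customers",
    )
    print(url)
    print(body)
```

Generating the request separately from sending it makes it easy to review every spoke's shortcuts before applying them.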
Pattern 3: Multi-Cloud Unified Lake
OneLake
└── Workspace: Unified_Analytics
    ├── lakehouse
    │   Tables/
    │     azure_crm ← SHORTCUT to ADLS Gen2
    │     aws_events ← SHORTCUT to Amazon S3
    │     gcp_logs ← SHORTCUT to Google Cloud Storage
    │     local_dims ← Local Delta tables
    │
    └── All queryable together in one notebook or SQL endpoint
Common Mistakes
- Creating separate storage accounts alongside OneLake: OneLake IS your storage. Do not create ADLS Gen2 accounts for Fabric data. Use OneLake directly.
- Not using shortcuts for shared data: copying data between workspaces wastes storage and creates staleness. Use internal shortcuts.
- Not running VACUUM: old Delta versions accumulate. A table with 1 GB of current data can have 10 GB of old versions. VACUUM regularly.
- Not exploring the Data Hub: teams duplicate work because they do not know other teams’ data exists. The Data Hub makes all data discoverable.
- Ignoring ADLS Gen2 compatibility: existing ADLS tools (Storage Explorer, AzCopy, Python SDK) work with OneLake. Do not build custom connectors when standard tools work.
Interview Questions
Q: What is OneLake and how does it differ from ADLS Gen2?
A: OneLake is Fabric’s built-in, unified storage layer: one per tenant, provisioned automatically, authenticated with Microsoft Entra ID (formerly Azure AD), and integrated with all Fabric workloads. ADLS Gen2 is a standalone Azure service requiring manual provisioning and configuration. OneLake is built on ADLS Gen2 technology and implements the same REST API, so existing ADLS tools work with OneLake.
Q: How can you access OneLake from outside Fabric?
A: OneLake implements the ADLS Gen2 API at onelake.dfs.fabric.microsoft.com. Access via Azure Storage Explorer, AzCopy, ADLS Gen2 Python SDK, Databricks (abfss:// protocol), or OneLake File Explorer (Windows desktop app). Any tool that supports ADLS Gen2 works with OneLake by changing the endpoint URL.
Q: What are OneLake shortcuts and why are they important?
A: Shortcuts are pointers to data in other locations (other workspaces, ADLS Gen2, S3, GCS) that appear as local tables in your Lakehouse. They eliminate data duplication: multiple teams access the same data through shortcuts without copying. Cross-cloud shortcuts enable multi-cloud analytics from a single query.
Wrapping Up
OneLake is the invisible foundation that makes Fabric work. Every Lakehouse, Warehouse, KQL Database, and Semantic Model writes to OneLake. ADLS Gen2 API compatibility means existing tools work seamlessly. Shortcuts unify data across workspaces and clouds. And the Data Hub makes everything discoverable.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.