OneLake Deep Dive: Architecture, ADLS Gen2 Compatibility, OneLake File Explorer, Multi-Cloud Shortcuts, Storage Billing, and the Foundation of Microsoft Fabric


Every Fabric item — Lakehouse, Warehouse, KQL Database, Semantic Model — stores its data in ONE place: OneLake. It is the unified storage layer underneath all of Fabric. Understanding OneLake is like understanding the foundation of a building — everything above it depends on it being solid, organized, and accessible.

This post goes beyond the basics. We cover OneLake’s architecture, its ADLS Gen2 API compatibility (meaning you can access OneLake with existing ADLS tools), the OneLake File Explorer (browse from Windows), multi-cloud shortcuts (access S3 and GCS without copying), storage billing, and the patterns that make OneLake the most important Fabric component.

Think of OneLake as a single massive filing cabinet shared across the entire company. Every department (workspace) gets drawers (lakehouses, warehouses). Every drawer has folders (tables, files). The filing cabinet has ONE address — not 50 different storage accounts scattered across Azure. Anyone with the right key (permissions) can access any drawer from anywhere (including non-Microsoft tools via ADLS Gen2 APIs).

Table of Contents

  • What Is OneLake?
  • OneLake vs ADLS Gen2 vs S3
  • The OneLake Hierarchy
  • OneLake Architecture
  • One Tenant = One OneLake
  • Namespaces and Paths
  • ADLS Gen2 API Compatibility
  • Accessing OneLake from External Tools
  • Azure Storage Explorer
  • AzCopy
  • ADLS Gen2 SDK (Python)
  • Databricks (External)
  • OneLake File Explorer (Windows Desktop)
  • Installing and Using
  • Sync Local Files with OneLake
  • Multi-Cloud Shortcuts
  • How Shortcuts Access S3 and GCS
  • Shortcut Caching (Reduce Egress)
  • OneLake Data Hub
  • Discovering Data Across Workspaces
  • Storage Billing and Optimization
  • What Counts as Storage
  • BCDR (Disaster Recovery) Replication
  • Soft Delete and Recovery
  • Storage Optimization Tips
  • OneLake Security
  • Workspace Roles + Data Access Roles
  • Firewall and Private Endpoints
  • Real-World OneLake Patterns
  • Pattern 1: Centralized Data Lake
  • Pattern 2: Hub-and-Spoke
  • Pattern 3: Multi-Cloud Unified Lake
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

What Is OneLake?

OneLake is Fabric’s built-in, unified storage layer — a single data lake for your entire organization. Every Fabric workspace, every lakehouse, every warehouse writes its data to OneLake. There is nothing to provision, no storage accounts to create, no access keys to manage.

Traditional approach:
  Team A: ADLS Gen2 account → storageA.dfs.core.windows.net
  Team B: ADLS Gen2 account → storageB.dfs.core.windows.net
  Team C: ADLS Gen2 account → storageC.dfs.core.windows.net
  → 3 storage accounts, 3 sets of credentials, 3 different access controls

OneLake approach:
  Team A: OneLake → workspace_A/lakehouse_A/Tables/...
  Team B: OneLake → workspace_B/lakehouse_B/Tables/...
  Team C: OneLake → workspace_C/lakehouse_C/Tables/...
  → 1 storage layer, 1 set of credentials (Microsoft Entra ID, formerly Azure AD), 1 governance model

OneLake vs ADLS Gen2 vs S3

Feature | OneLake | ADLS Gen2 | Amazon S3
Provisioning | Automatic (built into Fabric) | Manual (create storage account) | Manual (create bucket)
Authentication | Azure AD (automatic in Fabric) | Access key, SAS, MI, SP | IAM, access key
Organization | Tenant → Workspace → Item → Tables/Files | Account → Container → Folders | Bucket → Prefix
Format | Delta Lake (default for tables) | Any | Any
Multi-workload | All Fabric workloads read/write natively | Needs connectors | Needs connectors
Governance | Purview integrated, sensitivity labels | Purview integration available | AWS Macie
Shortcuts | Internal + external (ADLS, S3, GCS) | N/A | N/A
Billing | ~$0.023/GB/month | ~$0.020-0.046/GB/month | ~$0.023/GB/month

The OneLake Hierarchy

OneLake (one per tenant)
  └── Workspace: DataEng_Prod
        ├── Lakehouse: bronze_lakehouse
        │     ├── Tables/
        │     │     ├── raw_customers/ (Delta files)
        │     │     └── raw_orders/ (Delta files)
        │     └── Files/
        │           └── uploads/ (raw CSV, JSON)
        │
        ├── Lakehouse: silver_lakehouse
        │     └── Tables/
        │           ├── customers_clean/
        │           └── orders_validated/
        │
        ├── Warehouse: gold_warehouse
        │     └── gold/
        │           ├── dim_customer/ (Parquet, managed)
        │           └── fact_sales/ (Parquet, managed)
        │
        └── KQL Database: iot_analytics
              └── sensor_readings/ (columnar store)

Every item writes to OneLake. The physical path:

onelake.dfs.fabric.microsoft.com
  /{workspace_id}/{item_id}/Tables/{table_name}/
  /{workspace_id}/{item_id}/Files/{folder_name}/
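
The path layout above can be sketched as a small helper. This is a minimal illustration, not an official utility: the function name is hypothetical, and it assumes you address the item either by GUID or by name with its type suffix (for example lakehouse_name.Lakehouse, as in the AzCopy examples later in this post).

```python
from urllib.parse import quote

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_https_url(workspace: str, item: str, relative_path: str) -> str:
    """Build an https:// URL for a path inside a OneLake item.

    workspace and item may be GUIDs or names; when using names, the item
    is assumed to carry its type suffix (e.g. "bronze_lakehouse.Lakehouse").
    """
    rel = relative_path.strip("/")
    # quote() percent-encodes spaces and other unsafe characters but keeps "/"
    return f"https://{ONELAKE_HOST}/{quote(workspace)}/{quote(item)}/{quote(rel)}"


print(onelake_https_url("DataEng_Prod", "bronze_lakehouse.Lakehouse", "Tables/raw_customers/"))
```

Keeping path construction in one place makes it easier to switch between names and GUIDs later without touching every caller.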

ADLS Gen2 API Compatibility

OneLake implements the ADLS Gen2 REST API. This means most tools that work with ADLS Gen2 also work with OneLake, usually with nothing more than an endpoint change (a few account-level and management operations are not supported):

ADLS Gen2 endpoint: https://storageaccount.dfs.core.windows.net/container/path
OneLake endpoint:   https://onelake.dfs.fabric.microsoft.com/workspace/item/path

Same API, different endpoint. In most cases, switching the URL is the only change you need.

Accessing OneLake from External Tools

Azure Storage Explorer

  1. Open Azure Storage Explorer
  2. Click Connect → ADLS Gen2 or OneLake
  3. URL: https://onelake.dfs.fabric.microsoft.com/
  4. Sign in with Azure AD
  5. Browse workspaces → items → tables/files

AzCopy

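# Sign in first with your Microsoft Entra (Azure AD) account
azcopy login

# Copy a file TO OneLake
# --trusted-microsoft-suffixes lets AzCopy send your login token to the fabric.microsoft.com domain
azcopy copy "local_file.csv" "https://onelake.dfs.fabric.microsoft.com/workspace_name/lakehouse_name.Lakehouse/Files/uploads/local_file.csv" --trusted-microsoft-suffixes "fabric.microsoft.com"

# Copy FROM OneLake
azcopy copy "https://onelake.dfs.fabric.microsoft.com/workspace_name/lakehouse_name.Lakehouse/Files/data.csv" "local_copy.csv" --trusted-microsoft-suffixes "fabric.microsoft.com"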

ADLS Gen2 SDK (Python)

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Connect to OneLake using the SAME ADLS Gen2 SDK
credential = DefaultAzureCredential()
service_client = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=credential
)

# List files in a lakehouse
file_system_client = service_client.get_file_system_client("workspace_name")
paths = file_system_client.get_paths(path="lakehouse_name.Lakehouse/Files/")
for path in paths:
    print(path.name)
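
Uploading works the same way as listing. The sketch below is one possible helper, not the only way to do it: upload_bytes is a hypothetical name, and it expects the file_system_client object created in the snippet above. It relies on get_file_client and upload_data from the azure-storage-file-datalake package.

```python
def upload_bytes(file_system_client, remote_path: str, data: bytes) -> None:
    """Upload raw bytes to a path inside a OneLake workspace.

    file_system_client: a FileSystemClient for the workspace.
    remote_path: path relative to the workspace, for example
                 "lakehouse_name.Lakehouse/Files/uploads/local_file.csv".
    """
    file_client = file_system_client.get_file_client(remote_path)
    # overwrite=True replaces the file if it already exists
    file_client.upload_data(data, overwrite=True)


# Example usage (needs a real file_system_client, so it is left commented out):
# with open("local_file.csv", "rb") as f:
#     upload_bytes(file_system_client,
#                  "lakehouse_name.Lakehouse/Files/uploads/local_file.csv",
#                  f.read())
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object and avoids creating a new credential on every call.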

Databricks (External)

# Access OneLake from a Databricks notebook (outside Fabric)
df = spark.read.format("delta").load(
    "abfss://workspace_name@onelake.dfs.fabric.microsoft.com/lakehouse_name.Lakehouse/Tables/customers"
)
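# Uses the ADLS Gen2 protocol (abfss://), so Databricks treats OneLake like any ADLS account.
# The cluster still needs to authenticate with an identity that has access to the Fabric workspace.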
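
Notice that the abfss:// URI is just a rearrangement of the https:// URL: the workspace moves in front of the host. A minimal sketch of that conversion follows; https_to_abfss is a hypothetical helper name, and it assumes the URL has the workspace/item/path shape shown earlier.

```python
from urllib.parse import urlparse


def https_to_abfss(url: str) -> str:
    """Convert a OneLake https:// URL into the equivalent abfss:// URI."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Expected an https:// URL, got: {url!r}")
    # The first path segment is the workspace; everything after it is item + path
    workspace, _, rest = parsed.path.lstrip("/").partition("/")
    if not workspace or not rest:
        raise ValueError(f"URL must contain a workspace and an item path: {url!r}")
    return f"abfss://{workspace}@{parsed.netloc}/{rest}"


print(https_to_abfss(
    "https://onelake.dfs.fabric.microsoft.com/workspace_name/lakehouse_name.Lakehouse/Tables/customers"
))
```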

OneLake File Explorer (Windows Desktop)

A Windows app that syncs OneLake data to your local file system:

Installing and Using

  1. Download from Microsoft Store → search “OneLake File Explorer”
  2. Install → sign in with Azure AD
  3. OneLake appears as a drive in File Explorer (like OneDrive)
  4. Browse: OneLake → workspace → lakehouse → Files/Tables
  5. Copy files to/from OneLake by dragging and dropping

File Explorer:
  OneLake - Contoso
    └── DataEng_Prod
          ├── bronze_lakehouse
          │     ├── Files
          │     │     └── uploads (drag CSV here to upload!)
          │     └── Tables
          │           ├── raw_customers
          │           └── raw_orders
          └── silver_lakehouse
                └── Tables
                      └── customers_clean

Use for: Quick file uploads, browsing table structures, downloading small files for local analysis.

Multi-Cloud Shortcuts

OneLake shortcuts make external data (S3, GCS) appear as if it is local:

OneLake Lakehouse:
  Tables/
    local_customers/ (actual Delta files in OneLake)
    aws_events/      ← SHORTCUT to s3://company-events/processed/
    gcp_analytics/   ← SHORTCUT to gs://analytics-bucket/reports/

One notebook query:
  SELECT * FROM local_customers c
  JOIN aws_events e ON c.id = e.customer_id
  JOIN gcp_analytics g ON c.id = g.customer_id

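Three clouds. One query. No data copied into OneLake (the S3 and GCS data is still read across clouds at query time, which is why caching matters).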
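
Shortcuts are usually created in the Fabric UI, but they can also be scripted. The sketch below only builds the request body for an S3 shortcut. Treat it as an assumption to verify against the Fabric REST API reference: the field names (path, name, target, amazonS3, location, subpath, connectionId) reflect my understanding of the Create Shortcut API, the bucket URL and region are hypothetical, and the connection ID is a placeholder for a Fabric connection you would create beforehand.

```python
import json


def s3_shortcut_payload(name: str, path: str, bucket_url: str,
                        subpath: str, connection_id: str) -> dict:
    """Build a request body for creating an Amazon S3 shortcut.

    Assumed endpoint (verify against the official docs):
      POST https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{itemId}/shortcuts
    """
    return {
        "path": path,   # where the shortcut appears inside the lakehouse
        "name": name,   # the shortcut (table or folder) name
        "target": {
            "amazonS3": {
                "location": bucket_url,         # hypothetical bucket URL
                "subpath": subpath,             # folder inside the bucket
                "connectionId": connection_id,  # placeholder connection ID
            }
        },
    }


payload = s3_shortcut_payload(
    name="aws_events",
    path="Tables",
    bucket_url="https://company-events.s3.us-east-1.amazonaws.com",
    subpath="/processed",
    connection_id="<your-connection-id>",
)
print(json.dumps(payload, indent=2))
```

Sending the payload needs a bearer token for the Fabric API; that part is left out here so the example stays runnable without credentials.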

Shortcut Caching

For cross-cloud shortcuts, enable caching to avoid repeated egress fees:

First read:  OneLake → S3 (egress fee) → data + cached locally
Second read: OneLake → local cache (no egress fee) → instant

Enable: Workspace settings → OneLake → Cache setting → On
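
The first-read/second-read behavior above is a classic read-through cache. The toy model below is only meant to show why the second read avoids egress; it is not how OneLake implements caching, and the class and counter names are made up for illustration.

```python
class ReadThroughCache:
    """Toy read-through cache: fetch remotely once, then serve locally."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # function that reads from the remote cloud
        self._cache = {}
        self.remote_reads = 0  # each remote read stands in for one egress charge

    def read(self, key):
        if key not in self._cache:
            # Cache miss: go to the remote store and keep a local copy
            self._cache[key] = self._fetch_remote(key)
            self.remote_reads += 1
        return self._cache[key]


cache = ReadThroughCache(lambda key: f"contents of {key}")
first = cache.read("processed/events.parquet")   # remote read (egress)
second = cache.read("processed/events.parquet")  # served from cache
print(cache.remote_reads)
```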

OneLake Data Hub

The Data Hub is a searchable catalog of all data items across all workspaces:

  1. Click OneLake data hub in the left sidebar
  2. Browse or search for items: “customers,” “sales,” “dim_product”
  3. See: item name, workspace, type, owner, endorsement status
  4. Click to explore or create a shortcut to the item

Use for: Discovering data created by other teams without asking “where is the customer table?”

Storage Billing and Optimization

What Counts as Storage

Stored Item | Billed? | Location
Lakehouse Tables (Delta) | ✅ Yes | OneLake
Lakehouse Files (CSV, JSON) | ✅ Yes | OneLake
Warehouse tables | ✅ Yes | OneLake
KQL Database data | ✅ Yes | OneLake
Shortcut target data | ❌ No (billed at source) | S3, GCS, ADLS
Shortcut cache | ✅ Yes | OneLake
Delta log files | ✅ Yes | OneLake
Old Delta versions (before VACUUM) | ✅ Yes | OneLake
Mirrored database replicas | ✅ Yes | OneLake
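
To get a feel for the bill, you can add up the billed categories and multiply by a per-GB rate. This is a rough sketch only: it uses the approximate ~$0.023/GB/month figure from the comparison table earlier, the sizes are invented, and real pricing varies by region and over time.

```python
APPROX_PRICE_PER_GB_MONTH = 0.023  # approximate figure from the comparison table


def monthly_storage_cost(billed_sizes_gb: dict,
                         price_per_gb: float = APPROX_PRICE_PER_GB_MONTH) -> float:
    """Estimate monthly OneLake storage cost from billed categories (in GB).

    Shortcut target data is left out on purpose: it is billed at the source.
    """
    total_gb = sum(billed_sizes_gb.values())
    return round(total_gb * price_per_gb, 2)


# Hypothetical sizes for one workspace
sizes = {
    "current Delta data": 100,
    "old Delta versions (not yet vacuumed)": 400,
    "lakehouse Files": 450,
    "shortcut cache": 50,
}
estimate = monthly_storage_cost(sizes)
print(f"Estimated monthly cost: ${estimate}")
```

In this made-up example the old Delta versions are four times the size of the current data, which is the kind of imbalance VACUUM is meant to fix.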

Soft Delete and Recovery

OneLake retains deleted data for a recovery period:

Delete a table → data soft-deleted → recoverable for retention period
After retention → permanently deleted → storage freed

Configure retention: Workspace settings → OneLake → Soft delete retention
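
The recovery window is simple date arithmetic. The sketch below assumes a 7-day retention purely for illustration; check what your tenant actually uses. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_recoverable(deleted_at: datetime, now: datetime, retention_days: int) -> bool:
    """Return True while the deleted data is still inside the retention window."""
    return now < deleted_at + timedelta(days=retention_days)


deleted_at = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)

# 3 days after deletion, with an assumed 7-day retention: still recoverable
print(is_recoverable(deleted_at, datetime(2024, 3, 4, 9, 0, tzinfo=timezone.utc), 7))

# 8 days after deletion: past the window, permanently deleted
print(is_recoverable(deleted_at, datetime(2024, 3, 9, 9, 0, tzinfo=timezone.utc), 7))
```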

Storage Optimization Tips

  1. Run VACUUM on Delta tables: old versions consume storage. VACUUM table RETAIN 168 HOURS removes data files that are no longer referenced by the current table version and are older than 7 days.
  2. Use shortcuts instead of copies: if data already exists in ADLS, create a shortcut instead of copying it to OneLake.
  3. Delete staging data after processing: staging tables (bronze) that have been transformed to silver do not need to persist forever.
  4. Prefer compressed formats: Parquet and Delta are already compressed. CSVs are not, so convert them to Delta after landing.
  5. Monitor storage: the Fabric Admin Portal shows storage usage per workspace.
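
Tip 1 is easy to automate from a notebook. The sketch below builds the same VACUUM statement for a list of tables and runs it through a Spark session. The helper names are hypothetical, and it assumes the tables are Delta tables visible to the session by name. Note that Delta's default safety check rejects retention below 168 hours, so the helper does not go lower by default.

```python
def vacuum_statement(table_name: str, retain_days: int = 7) -> str:
    """Build a VACUUM statement that keeps retain_days of history."""
    if retain_days < 0:
        raise ValueError("retain_days must be non-negative")
    return f"VACUUM {table_name} RETAIN {retain_days * 24} HOURS"


def vacuum_tables(spark, table_names, retain_days: int = 7) -> None:
    """Run VACUUM on each table using the given Spark session."""
    for name in table_names:
        spark.sql(vacuum_statement(name, retain_days))


print(vacuum_statement("raw_orders"))

# In a Fabric notebook, where `spark` is already defined:
# vacuum_tables(spark, ["raw_customers", "raw_orders"])
```

Scheduling this in a weekly notebook run is one way to keep old versions from piling up unnoticed.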

Real-World OneLake Patterns

Pattern 1: Centralized Data Lake

OneLake
  └── Workspace: Central_DataLake
        ├── bronze_lakehouse (all raw data lands here)
        ├── silver_lakehouse (all cleaned data)
        ├── gold_warehouse (star schema for all teams)
        └── All teams access via Viewer role or shortcuts

Pattern 2: Hub-and-Spoke

OneLake
  ├── Workspace: DataEng_Hub (shared data)
  │     ├── silver_lakehouse (master clean data)
  │     └── gold_warehouse (shared star schema)
  │
  ├── Workspace: Sales_Spoke (sales team)
  │     └── sales_lakehouse
  │           Tables/customers ← SHORTCUT to Hub silver
  │
  ├── Workspace: Marketing_Spoke (marketing team)
  │     └── marketing_lakehouse
  │           Tables/customers ← SHORTCUT to Hub silver
  │
  └── Each spoke reads from Hub via shortcuts — no data duplication

Pattern 3: Multi-Cloud Unified Lake

OneLake
  └── Workspace: Unified_Analytics
        ├── lakehouse
        │     Tables/
        │       azure_crm   ← SHORTCUT to ADLS Gen2
        │       aws_events  ← SHORTCUT to Amazon S3
        │       gcp_logs    ← SHORTCUT to Google Cloud Storage
        │       local_dims  ← Local Delta tables
        │
        └── All queryable together in one notebook or SQL endpoint

Common Mistakes

  1. Creating separate storage accounts alongside OneLake — OneLake IS your storage. Do not create ADLS Gen2 accounts for Fabric data. Use OneLake directly.

  2. Not using shortcuts for shared data — copying data between workspaces wastes storage and creates staleness. Use internal shortcuts.

  3. Not running VACUUM — old Delta versions accumulate. A table with 1 GB of current data can have 10 GB of old versions. VACUUM regularly.

  4. Not exploring the Data Hub — teams duplicate work because they do not know other teams’ data exists. The Data Hub makes all data discoverable.

  5. Ignoring ADLS Gen2 compatibility — existing ADLS tools (Storage Explorer, AzCopy, Python SDK) work with OneLake. Do not build custom connectors when standard tools work.

Interview Questions

Q: What is OneLake and how does it differ from ADLS Gen2?
A: OneLake is Fabric’s built-in, unified storage layer: one per tenant, automatic provisioning, Azure AD authentication, integrated with all Fabric workloads. ADLS Gen2 is a standalone Azure service requiring manual provisioning and configuration. OneLake is built on ADLS Gen2 technology and implements the same REST API, so existing ADLS tools work with OneLake.

Q: How can you access OneLake from outside Fabric?
A: OneLake implements the ADLS Gen2 API at onelake.dfs.fabric.microsoft.com. Access it via Azure Storage Explorer, AzCopy, the ADLS Gen2 Python SDK, Databricks (abfss:// protocol), or OneLake File Explorer (Windows desktop app). Most tools that support ADLS Gen2 work with OneLake once you change the endpoint URL.

Q: What are OneLake shortcuts and why are they important?
A: Shortcuts are pointers to data in other locations (other workspaces, ADLS Gen2, S3, GCS) that appear as local tables in your Lakehouse. They eliminate data duplication, because multiple teams access the same data through shortcuts without copying it. Cross-cloud shortcuts enable multi-cloud analytics from a single query.

Wrapping Up

OneLake is the invisible foundation that makes Fabric work. Every Lakehouse, Warehouse, KQL Database, and Semantic Model writes to OneLake. ADLS Gen2 API compatibility means existing tools work seamlessly. Shortcuts unify data across workspaces and clouds. And the Data Hub makes everything discoverable.



Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
