Delta Lake Table Properties: Every TBLPROPERTIES Setting, Retention, Change Data Feed, Column Mapping, Auto-Optimize, and Managing Delta Tables Like a Production Engineer

You know how to CREATE Delta tables, MERGE into them, OPTIMIZE them, and VACUUM them. But there is a configuration layer that controls HOW these operations behave — Table Properties (TBLPROPERTIES). These settings control how long deleted files are retained, whether change tracking is enabled, how schema evolution works, and whether writes are automatically optimized.

Without the right table properties, VACUUM might delete files you still need for time travel. Schema evolution might fail when you try to rename a column. Change Data Feed might not be available when your Materialized Lake View needs it. These silent misconfigurations cause production issues that are hard to diagnose.

Think of table properties like the settings on a washing machine. The machine (Delta table) works without changing settings — but the defaults might not be right for your load. Delicates (sensitive data with long retention) need different settings than towels (staging data you can discard quickly). TBLPROPERTIES are the dials that customize each table’s behavior.

What Are Table Properties?
Viewing Current Properties
Setting and Removing Properties
File Retention Properties
deletedFileRetentionDuration
logRetentionDuration
How Retention Interacts with VACUUM
Change Data Feed (CDF)
Enabling CDF
Reading the Change Feed
CDF for MLVs and Streaming
Column Mapping Properties
Enabling Column Rename and Drop
Column Mapping Modes
Auto-Optimization Properties
autoOptimize.optimizeWrite
autoOptimize.autoCompact
delta.targetFileSize
Protocol Version Properties
minReaderVersion and minWriterVersion
Feature-Based Protocol
Checkpoint Properties
checkpoint.writeStatsAsJson
checkpoint.writeStatsAsStruct
Data Skipping Properties
dataSkippingNumIndexedCols
dataSkippingStatsColumns
Merge-Specific Properties
schema.autoMerge.enabled
All Table Properties Reference (Quick Lookup)
Real-World Configurations
Configuration 1: Gold Dimension Table (SCD Type 2)
Configuration 2: Bronze Staging Table (Short-Lived)
Configuration 3: Silver Cleaned Table (Standard)
Configuration 4: Streaming Target Table
Configuration 5: Large Fact Table (Billions of Rows)
Table Properties in Fabric vs Databricks
Common Mistakes
Interview Questions
Wrapping Up

What Are Table Properties?

Table properties are key-value metadata settings stored WITH the Delta table. They control the behavior of Delta operations (VACUUM, time travel, schema evolution, optimization) at the TABLE level — independent of Spark session settings.

-- Spark session setting (applies to ALL tables in this session):
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")

-- Table property (applies to THIS TABLE only, permanently):
ALTER TABLE gold.fact_sales SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true')

Session settings are temporary (gone when notebook ends). Table properties are permanent (stored in the Delta log, apply to every writer).

Viewing Current Properties

%%sql
-- View ALL properties of a table
SHOW TBLPROPERTIES gold.dim_customer;

-- View a specific property
SHOW TBLPROPERTIES gold.dim_customer ('delta.deletedFileRetentionDuration');

-- Describe extended (shows properties + location + format + stats)
DESCRIBE EXTENDED gold.dim_customer;

-- Describe detail (file count, size, partitioning)
DESCRIBE DETAIL gold.dim_customer;

# PySpark
props = spark.sql("SHOW TBLPROPERTIES gold.dim_customer").collect()
for row in props:
    print(f"  {row['key']:50s} = {row['value']}")

Setting and Removing Properties

%%sql
-- Set a single property
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 30 days');

-- Set multiple properties at once
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 30 days',
    'delta.logRetentionDuration' = 'interval 60 days',
    'delta.enableChangeDataFeed' = 'true',
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact' = 'true'
);

-- Remove (unset) a property (reverts to default)
ALTER TABLE gold.dim_customer
UNSET TBLPROPERTIES ('delta.autoOptimize.autoCompact');

-- Remove IF EXISTS (no error if property does not exist)
ALTER TABLE gold.dim_customer
UNSET TBLPROPERTIES IF EXISTS ('delta.some.nonexistent.property');

File Retention Properties

deletedFileRetentionDuration

Controls how long VACUUM keeps deleted data files. These are the old Parquet files from UPDATE, DELETE, and MERGE operations.

-- Default: 7 days (interval 168 hours)
-- Meaning: VACUUM will NOT delete files newer than 7 days

-- Set to 30 days (keep more history for time travel)
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 30 days');

-- Set to 1 day (staging table — discard quickly, save storage)
ALTER TABLE staging.stg_orders
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 1 day');

-- After setting, schedule VACUUM:
VACUUM gold.dim_customer;  -- Uses the table's retention setting

Without this property: VACUUM defaults to 7 days. If you need time travel beyond 7 days, you MUST increase this.

logRetentionDuration

Controls how long Delta transaction LOG files (JSON files in _delta_log/) are kept. These enable DESCRIBE HISTORY and time travel.

-- Default: 30 days
-- Set to 90 days (keep 90 days of version history)
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 90 days');

-- Set to 7 days (staging — minimal history needed)
ALTER TABLE staging.stg_orders
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days');

How Retention Interacts with VACUUM

deletedFileRetentionDuration = 30 days
logRetentionDuration = 90 days

Day 1:   UPDATE changes 1000 rows → old files marked for deletion
Day 15:  Time travel to Day 1 → ✅ WORKS (files still exist, within 30 days)
Day 31:  VACUUM runs → old files DELETED (past 30-day retention)
Day 32:  Time travel to Day 1 → ❌ FAILS (files were vacuumed)
Day 32:  DESCRIBE HISTORY → ✅ Shows Day 1 entry (log kept for 90 days)
Day 91:  DESCRIBE HISTORY → Day 1 entry removed from log

The rule: deletedFileRetentionDuration controls DATA file retention (for time travel queries). logRetentionDuration controls LOG file retention (for DESCRIBE HISTORY). Set data retention ≤ log retention.

Real-life analogy: deletedFileRetentionDuration is how long you keep old documents in a filing cabinet before shredding. logRetentionDuration is how long you keep the logbook of what was filed and shredded. You can see the logbook entry (“File X was shredded on Day 31”) even after the file itself is gone.

Change Data Feed (CDF)

Enabling CDF

Change Data Feed tracks INSERT, UPDATE, and DELETE changes to a table, making them queryable:

-- Enable CDF on a table
ALTER TABLE silver.customers_clean
SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

-- Enable at table creation time
CREATE TABLE silver.orders_clean (
    order_id INT,
    customer_id INT,
    amount DECIMAL(10,2),
    order_date DATE
) USING DELTA
TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

Reading the Change Feed

# Read all changes since a specific version
changes = spark.read.format("delta")     .option("readChangeFeed", "true")     .option("startingVersion", 5)     .table("silver.customers_clean")

changes.show()
# Columns include: _change_type (insert, update_preimage, update_postimage, delete),
#                   _commit_version, _commit_timestamp

# Read changes between timestamps
changes = spark.read.format("delta")     .option("readChangeFeed", "true")     .option("startingTimestamp", "2026-06-01")     .option("endingTimestamp", "2026-06-05")     .table("silver.customers_clean")

# Filter by change type
inserts = changes.filter(col("_change_type") == "insert")
updates = changes.filter(col("_change_type") == "update_postimage")
deletes = changes.filter(col("_change_type") == "delete")

CDF for MLVs and Streaming

-- MLVs use CDF for efficient incremental refresh
-- Without CDF: MLV compares full snapshots (slow)
-- With CDF: MLV reads only the change feed (fast)

-- Enable on ALL source tables for MLVs:
ALTER TABLE bronze.raw_customers SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
ALTER TABLE bronze.raw_orders SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

# Streaming with CDF (process only changes)
df_changes = spark.readStream.format("delta")     .option("readChangeFeed", "true")     .table("silver.customers_clean")
# Only new INSERTs, UPDATEs, DELETEs flow through — not the entire table

Column Mapping Properties

Enabling Column Rename and Drop

By default, Delta Lake does NOT support renaming or dropping columns. Column Mapping enables this:

-- Enable column mapping (required for rename/drop)
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name',
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
);

-- Now you can RENAME columns
ALTER TABLE gold.dim_customer RENAME COLUMN first_name TO FirstName;

-- And DROP columns
ALTER TABLE gold.dim_customer DROP COLUMN middle_name;

Column Mapping Modes

Mode	What It Does	When to Use
none (default)	Columns matched by position (ordinal)	Standard tables, no rename/drop needed
name	Columns matched by name	When you need to rename or drop columns
id	Columns matched by internal ID	Advanced — used internally

Important: Once you enable column mapping (name mode), you CANNOT go back to none. It is a one-way change. Also requires upgrading the protocol version (minReaderVersion=2, minWriterVersion=5).

Auto-Optimization Properties

autoOptimize.optimizeWrite

Automatically optimizes file sizes during writes — fewer, larger files:

ALTER TABLE gold.fact_sales
SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true');

-- Every write to this table now produces optimized file sizes
-- Without: writes might create many small files (1MB each)
-- With: writes create fewer, larger files (128MB-256MB each)

autoOptimize.autoCompact

Automatically compacts small files after writes:

ALTER TABLE gold.fact_sales
SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true');

-- After each write, Delta checks if there are too many small files
-- If yes, automatically runs a mini-OPTIMIZE in the background

delta.targetFileSize

Control the target file size for OPTIMIZE:

-- Default: 256MB (Fabric) or 1GB (Databricks)
ALTER TABLE gold.fact_sales
SET TBLPROPERTIES ('delta.targetFileSize' = '128mb');
-- OPTIMIZE will target 128MB files instead of the default

Guidance: 128MB-256MB for tables read by Direct Lake. 256MB-1GB for large tables read by Spark.

Protocol Version Properties

Protocol versions control which Delta features are available:

minReaderVersion and minWriterVersion

-- Check current protocol
DESCRIBE DETAIL gold.dim_customer;
-- Shows: minReaderVersion, minWriterVersion

-- Upgrade protocol (IRREVERSIBLE!)
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES (
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
);

-- Check what version a table is using:
SHOW TBLPROPERTIES gold.dim_customer ('delta.minReaderVersion');
SHOW TBLPROPERTIES gold.dim_customer ('delta.minWriterVersion');

Feature-Based Protocol

Each Delta feature requires a minimum protocol version. You must upgrade the protocol BEFORE using the feature:

Feature	Required Reader	Required Writer
Basic Delta (read, write, MERGE)	1	2
Column mapping (rename, drop)	2	5
Change Data Feed	1	4
Identity columns	1	6
Deletion vectors	3	7

Warning: Protocol upgrades are IRREVERSIBLE. Older Spark versions that do not support the new protocol cannot read the table. Only upgrade when you need the specific feature. A table at reader=1/writer=2 can be read by any Delta Lake version. A table at reader=3/writer=7 requires Delta Lake 3.0+ to read.

Checkpoint Properties

Checkpoint files are periodic snapshots of the Delta transaction log. Instead of reading thousands of small JSON log files, Spark reads one checkpoint file — much faster. These properties control what statistics are written in checkpoint files:

checkpoint.writeStatsAsJson

Controls whether column statistics (min, max, null count per column per file) are written in JSON format in checkpoint files. Default: true.

-- Check current setting
SHOW TBLPROPERTIES gold.fact_sales ('delta.checkpoint.writeStatsAsJson');

-- These stats enable DATA SKIPPING:
-- Query: SELECT * FROM fact_sales WHERE date_key = 20260609
-- Spark reads checkpoint → sees file_001 has date_key min=20260101, max=20260331
-- Spark SKIPS file_001 entirely (no matching rows possible)
-- Result: reads 1 file instead of 1000 files → 1000x faster

-- Leave enabled (default) for production tables
ALTER TABLE gold.fact_sales
SET TBLPROPERTIES ('delta.checkpoint.writeStatsAsJson' = 'true');

checkpoint.writeStatsAsStruct

Controls whether statistics are ALSO written in a structured (Parquet) format in checkpoint files. Default: true.

-- Structured stats are FASTER to read than JSON stats
-- because Parquet is columnar — Spark reads only the stats columns it needs

-- Both should be enabled for optimal query performance:
ALTER TABLE gold.fact_sales
SET TBLPROPERTIES (
    'delta.checkpoint.writeStatsAsJson' = 'true',
    'delta.checkpoint.writeStatsAsStruct' = 'true'
);

-- Only disable if checkpoint writes are too slow on very wide tables (200+ columns)
-- and you've confirmed data skipping is not needed

Data Skipping Properties

Data skipping uses the min/max statistics stored in checkpoint files to skip entire Parquet files that cannot contain matching rows. These properties control which columns have statistics collected:

dataSkippingNumIndexedCols

Controls how many columns have statistics collected. Default: 32 (first 32 columns in the table schema).

-- Default: stats collected on first 32 columns
SHOW TBLPROPERTIES gold.fact_sales ('delta.dataSkippingNumIndexedCols');

-- For a WIDE table (100+ columns), collecting stats on all columns adds write overhead
-- Reduce to cover only columns used in WHERE clauses and JOINs:
ALTER TABLE gold.fact_sales
SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '10');

-- For a NARROW table (< 20 columns), default 32 covers everything — no change needed

-- Rule of thumb:
-- Set this to the number of columns you commonly filter on
-- date_key, customer_key, product_key, status = 4 columns
-- dataSkippingNumIndexedCols = 10 (gives headroom for new filter columns)

dataSkippingStatsColumns

Explicitly specify WHICH columns should have statistics collected, instead of relying on column position. Use when your filter columns are NOT in the first 32 columns:

-- If your filter columns are at position 50+ in a wide table,
-- dataSkippingNumIndexedCols = 32 won't cover them
-- Use dataSkippingStatsColumns to explicitly include them:

-- Note: This property is available in newer Delta Lake versions
-- In Fabric, the combination of dataSkippingNumIndexedCols + OPTIMIZE ZORDER
-- achieves the same effect for most scenarios:
OPTIMIZE gold.fact_sales ZORDER BY (date_key, customer_key);
-- Z-ORDER physically co-locates data by these columns → enables efficient skipping

Merge-Specific Properties

schema.autoMerge.enabled

Controls whether new columns in the source DataFrame are automatically added to the target table during writes and MERGE operations:

-- Enable automatic schema merge on write
ALTER TABLE silver.customers_clean
SET TBLPROPERTIES ('delta.schema.autoMerge.enabled' = 'true');

-- Now writes with new columns automatically add them to the schema:
-- Source has columns: [id, name, email, phone]  (phone is new!)
-- Target has columns: [id, name, email]
-- With autoMerge: phone column added automatically → write succeeds
-- Without autoMerge: write FAILS with AnalysisException (schema mismatch)

-- Best for: Bronze tables receiving data from external sources
-- External APIs, vendor feeds, and third-party systems change schema without warning
-- autoMerge absorbs these changes gracefully

-- NOT recommended for: Gold tables (schema should be tightly controlled)
-- A surprise new column in a Gold dimension table could break downstream reports

# PySpark equivalent (session-level, applies to all writes in this session):
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

# Table property is better — persists permanently, per-table control:
# ALTER TABLE ... SET TBLPROPERTIES ('delta.schema.autoMerge.enabled' = 'true')

All Table Properties Reference (Quick Lookup)

Property	Default	What It Controls
`delta.deletedFileRetentionDuration`	`interval 7 days`	How long VACUUM keeps old data files
`delta.logRetentionDuration`	`interval 30 days`	How long transaction log entries are kept
`delta.enableChangeDataFeed`	`false`	Track INSERT/UPDATE/DELETE changes
`delta.columnMapping.mode`	`none`	Enable column rename/drop (`name` mode)
`delta.autoOptimize.optimizeWrite`	`false`	Optimize file sizes during writes
`delta.autoOptimize.autoCompact`	`false`	Auto-compact small files after writes
`delta.targetFileSize`	`256mb`	Target file size for OPTIMIZE
`delta.schema.autoMerge.enabled`	`false`	Auto-add new columns on write
`delta.minReaderVersion`	`1`	Minimum reader protocol version
`delta.minWriterVersion`	`2`	Minimum writer protocol version
`delta.dataSkippingNumIndexedCols`	`32`	Number of columns with data skipping stats
`delta.checkpoint.writeStatsAsJson`	`true`	Write column stats in checkpoint (JSON)
`delta.checkpoint.writeStatsAsStruct`	`true`	Write column stats in checkpoint (Parquet)

Real-World Configurations

Configuration 1: Gold Dimension Table (SCD Type 2)

ALTER TABLE gold.dim_customer SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 30 days',  -- 30 days time travel
    'delta.logRetentionDuration' = 'interval 90 days',           -- 90 days version history
    'delta.enableChangeDataFeed' = 'true',                       -- Track changes for MLVs
    'delta.autoOptimize.optimizeWrite' = 'true',                 -- Clean writes
    'delta.autoOptimize.autoCompact' = 'true',                   -- Auto-compact
    'delta.columnMapping.mode' = 'name',                         -- Allow schema evolution
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
);

Configuration 2: Bronze Staging Table (Short-Lived)

ALTER TABLE bronze.raw_orders SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 1 day',    -- Minimal retention
    'delta.logRetentionDuration' = 'interval 7 days',            -- Minimal history
    'delta.autoOptimize.optimizeWrite' = 'true',                 -- Clean writes
    'delta.schema.autoMerge.enabled' = 'true'                    -- Source schema may change
);
-- VACUUM bronze.raw_orders; -- Schedule daily to save storage

Configuration 3: Silver Cleaned Table (Standard)

ALTER TABLE silver.customers_clean SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 14 days',  -- 2 weeks time travel
    'delta.logRetentionDuration' = 'interval 30 days',           -- Standard history
    'delta.enableChangeDataFeed' = 'true',                       -- Enable for downstream MLVs
    'delta.autoOptimize.optimizeWrite' = 'true',                 -- Clean writes
    'delta.autoOptimize.autoCompact' = 'true'                    -- Auto-compact
);

Configuration 4: Streaming Target Table

ALTER TABLE silver.realtime_events SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 7 days',
    'delta.logRetentionDuration' = 'interval 14 days',
    'delta.enableChangeDataFeed' = 'true',                       -- For downstream consumers
    'delta.autoOptimize.optimizeWrite' = 'true',                 -- Critical for streaming (many small writes)
    'delta.autoOptimize.autoCompact' = 'true',                   -- Compact the many small files from streaming
    'delta.targetFileSize' = '128mb'                              -- Smaller target for faster compaction
);

Configuration 5: Large Fact Table (Billions of Rows)

ALTER TABLE gold.fact_sales SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 7 days',   -- Standard (storage is expensive at scale)
    'delta.logRetentionDuration' = 'interval 30 days',
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.targetFileSize' = '512mb',                            -- Larger files for fewer file listings
    'delta.dataSkippingNumIndexedCols' = '10'                    -- Limit stats to key columns (faster writes)
);
-- Schedule weekly: OPTIMIZE gold.fact_sales ZORDER BY (date_key, customer_key);
-- Schedule weekly: VACUUM gold.fact_sales;

Table Properties in Fabric vs Databricks

Property	Fabric	Databricks	Difference
`delta.autoOptimize.optimizeWrite`	Supported	Supported	Same
`delta.autoOptimize.autoCompact`	Supported	Supported	Same
`delta.enableChangeDataFeed`	Supported	Supported	Same
`delta.columnMapping.mode`	Supported	Supported	Same
V-Order (`spark.sql.parquet.vorder.enabled`)	Fabric only	Not available	Fabric-specific Spark config
Liquid Clustering	Not available	Databricks only	Databricks replacement for Z-ORDER
Deletion Vectors	Supported (preview)	Supported (GA)	Databricks more mature
Predictive Optimization	Not available	Databricks only	Auto-OPTIMIZE/VACUUM in Databricks

Common Mistakes

Not setting retention before VACUUM — running VACUUM with default 7-day retention deletes files you might need for time travel. Set deletedFileRetentionDuration FIRST, then schedule VACUUM.
Enabling CDF after the fact — CDF only tracks changes AFTER it is enabled. It does not retroactively capture historical changes. Enable CDF on source tables BEFORE building MLVs or streaming consumers.
Protocol upgrade without understanding — upgrading minReaderVersion/minWriterVersion is IRREVERSIBLE. Older tools that do not support the new protocol cannot read the table. Only upgrade when you need a specific feature (column mapping, CDF).
Not enabling autoOptimize for streaming tables — streaming writes create many tiny files (one per micro-batch). Without autoOptimize + autoCompact, table performance degrades rapidly. Always enable both for streaming targets.
Same retention for all tables — Gold dimension tables need 30-90 days for audit trails. Bronze staging tables need 1-7 days. Set retention per table based on its purpose.
Forgetting schema.autoMerge for evolving sources — external data sources (APIs, vendor feeds) change schema frequently. Without autoMerge, writes fail when a new column appears. Enable on Bronze tables that receive external data.

Interview Questions

Q: What is the difference between deletedFileRetentionDuration and logRetentionDuration? A: deletedFileRetentionDuration controls how long VACUUM keeps old data files (Parquet) — these enable time travel queries to read historical data. logRetentionDuration controls how long transaction log entries (JSON in _delta_log) are kept — these enable DESCRIBE HISTORY. Data file retention should be ≤ log retention. Default: 7 days for files, 30 days for logs.

Q: What is Change Data Feed (CDF) and when should you enable it? A: CDF tracks every INSERT, UPDATE, and DELETE as queryable change records. Enable it on tables that serve as sources for Materialized Lake Views (MLVs use CDF for efficient incremental refresh), streaming consumers (readStream with readChangeFeed), and downstream incremental pipelines. CDF only captures changes AFTER enablement — it is not retroactive.

Q: What is column mapping and when do you need it? A: Column mapping (delta.columnMapping.mode = 'name') changes how Delta maps DataFrame columns to Parquet file columns — by name instead of position. Required for renaming columns (ALTER TABLE RENAME COLUMN) and dropping columns (ALTER TABLE DROP COLUMN). Requires protocol upgrade (minReaderVersion=2, minWriterVersion=5). The upgrade is irreversible.

Q: How do autoOptimize.optimizeWrite and autoOptimize.autoCompact work together? A: optimizeWrite produces optimally-sized files during writes (fewer, larger files). autoCompact runs a background mini-OPTIMIZE after writes to compact any remaining small files. Together they maintain optimal file sizes without manual OPTIMIZE scheduling. Essential for streaming tables that produce many small files per micro-batch.

Q: How should you configure a Bronze table differently from a Gold table? A: Bronze: short retention (1-7 days), autoMerge enabled (schema may change), minimal history. Gold: longer retention (30-90 days), CDF enabled (for MLVs and audit), autoOptimize enabled, column mapping enabled (for schema evolution). Bronze is disposable staging; Gold is the curated, governed output.

Wrapping Up

Table properties are the fine-tuning dials of Delta Lake. The defaults work for basic use, but production tables need intentional configuration — shorter retention for staging (save storage), longer retention for Gold (audit compliance), CDF for downstream consumers, autoOptimize for streaming targets, and column mapping for evolving schemas.

Set these properties ONCE per table when creating it, and you prevent a category of production issues — failed time travel, schema mismatch errors, bloated storage, and slow reads from fragmented files.

← Previous: Spark Structured Streaming Fabric (22/38) Next: Materialized Lake Views (MLVs) →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Table of Contents