Delta Lake Table Properties: Every TBLPROPERTIES Setting, Retention, Change Data Feed, Column Mapping, Auto-Optimize, and Managing Delta Tables Like a Production Engineer

Delta Lake Table Properties: Every TBLPROPERTIES Setting, Retention, Change Data Feed, Column Mapping, Auto-Optimize, and Managing Delta Tables Like a Production Engineer

You know how to CREATE Delta tables, MERGE into them, OPTIMIZE them, and VACUUM them. But there is a configuration layer that controls HOW these operations behave — Table Properties (TBLPROPERTIES). These settings control how long deleted files are retained, whether change tracking is enabled, how schema evolution works, and whether writes are automatically optimized.

Without the right table properties, VACUUM might delete files you still need for time travel. Schema evolution might fail when you try to rename a column. Change Data Feed might not be available when your Materialized Lake View needs it. These silent misconfigurations cause production issues that are hard to diagnose.

Think of table properties like the settings on a washing machine. The machine (Delta table) works without changing settings — but the defaults might not be right for your load. Delicates (sensitive data with long retention) need different settings than towels (staging data you can discard quickly). TBLPROPERTIES are the dials that customize each table’s behavior.

Table of Contents

  • What Are Table Properties?
  • Viewing Current Properties
  • Setting and Removing Properties
  • File Retention Properties
  • deletedFileRetentionDuration
  • logRetentionDuration
  • How Retention Interacts with VACUUM
  • Change Data Feed (CDF)
  • Enabling CDF
  • Reading the Change Feed
  • CDF for MLVs and Streaming
  • Column Mapping Properties
  • Enabling Column Rename and Drop
  • Column Mapping Modes
  • Auto-Optimization Properties
  • autoOptimize.optimizeWrite
  • autoOptimize.autoCompact
  • delta.targetFileSize
  • Protocol Version Properties
  • minReaderVersion and minWriterVersion
  • Feature-Based Protocol
  • Checkpoint Properties
  • checkpoint.writeStatsAsJson
  • checkpoint.writeStatsAsStruct
  • Data Skipping Properties
  • dataSkippingNumIndexedCols
  • dataSkippingStatsColumns
  • Merge-Specific Properties
  • schema.autoMerge.enabled
  • All Table Properties Reference (Quick Lookup)
  • Real-World Configurations
  • Configuration 1: Gold Dimension Table (SCD Type 2)
  • Configuration 2: Bronze Staging Table (Short-Lived)
  • Configuration 3: Silver Cleaned Table (Standard)
  • Configuration 4: Streaming Target Table
  • Configuration 5: Large Fact Table (Billions of Rows)
  • Table Properties in Fabric vs Databricks
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

What Are Table Properties?

Table properties are key-value metadata settings stored WITH the Delta table. They control the behavior of Delta operations (VACUUM, time travel, schema evolution, optimization) at the TABLE level — independent of Spark session settings.

-- Spark session setting (applies to ALL tables in this session):
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")

-- Table property (applies to THIS TABLE only, permanently):
ALTER TABLE gold.fact_sales SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true')

Session settings are temporary (gone when notebook ends). Table properties are permanent (stored in the Delta log, apply to every writer).

Viewing Current Properties

%%sql
-- View ALL properties of a table
SHOW TBLPROPERTIES gold.dim_customer;

-- View a specific property
SHOW TBLPROPERTIES gold.dim_customer ('delta.deletedFileRetentionDuration');

-- Describe extended (shows properties + location + format + stats)
DESCRIBE EXTENDED gold.dim_customer;

-- Describe detail (file count, size, partitioning)
DESCRIBE DETAIL gold.dim_customer;
# PySpark
props = spark.sql("SHOW TBLPROPERTIES gold.dim_customer").collect()
for row in props:
    print(f"  {row['key']:50s} = {row['value']}")

Setting and Removing Properties

%%sql
-- Set a single property
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 30 days');

-- Set multiple properties at once
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 30 days',
    'delta.logRetentionDuration' = 'interval 60 days',
    'delta.enableChangeDataFeed' = 'true',
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact' = 'true'
);

-- Remove (unset) a property (reverts to default)
ALTER TABLE gold.dim_customer
UNSET TBLPROPERTIES ('delta.autoOptimize.autoCompact');

-- Remove IF EXISTS (no error if property does not exist)
ALTER TABLE gold.dim_customer
UNSET TBLPROPERTIES IF EXISTS ('delta.some.nonexistent.property');

File Retention Properties

deletedFileRetentionDuration

Controls how long VACUUM keeps deleted data files. These are the old Parquet files from UPDATE, DELETE, and MERGE operations.

-- Default: 7 days (interval 168 hours)
-- Meaning: VACUUM will NOT delete files newer than 7 days

-- Set to 30 days (keep more history for time travel)
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 30 days');

-- Set to 1 day (staging table — discard quickly, save storage)
ALTER TABLE staging.stg_orders
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 1 day');

-- After setting, schedule VACUUM:
VACUUM gold.dim_customer;  -- Uses the table's retention setting

Without this property: VACUUM defaults to 7 days. If you need time travel beyond 7 days, you MUST increase this.

logRetentionDuration

Controls how long Delta transaction LOG files (JSON files in _delta_log/) are kept. These enable DESCRIBE HISTORY and time travel.

-- Default: 30 days
-- Set to 90 days (keep 90 days of version history)
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 90 days');

-- Set to 7 days (staging — minimal history needed)
ALTER TABLE staging.stg_orders
SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days');

How Retention Interacts with VACUUM

deletedFileRetentionDuration = 30 days
logRetentionDuration = 90 days

Day 1:   UPDATE changes 1000 rows → old files marked for deletion
Day 15:  Time travel to Day 1 → ✅ WORKS (files still exist, within 30 days)
Day 31:  VACUUM runs → old files DELETED (past 30-day retention)
Day 32:  Time travel to Day 1 → ❌ FAILS (files were vacuumed)
Day 32:  DESCRIBE HISTORY → ✅ Shows Day 1 entry (log kept for 90 days)
Day 91:  DESCRIBE HISTORY → Day 1 entry removed from log

The rule: deletedFileRetentionDuration controls DATA file retention (for time travel queries). logRetentionDuration controls LOG file retention (for DESCRIBE HISTORY). Set data retention ≤ log retention.

Real-life analogy: deletedFileRetentionDuration is how long you keep old documents in a filing cabinet before shredding. logRetentionDuration is how long you keep the logbook of what was filed and shredded. You can see the logbook entry (“File X was shredded on Day 31”) even after the file itself is gone.

Change Data Feed (CDF)

Enabling CDF

Change Data Feed tracks INSERT, UPDATE, and DELETE changes to a table, making them queryable:

-- Enable CDF on a table
ALTER TABLE silver.customers_clean
SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

-- Enable at table creation time
CREATE TABLE silver.orders_clean (
    order_id INT,
    customer_id INT,
    amount DECIMAL(10,2),
    order_date DATE
) USING DELTA
TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

Reading the Change Feed

# Read all changes since a specific version
changes = spark.read.format("delta")     .option("readChangeFeed", "true")     .option("startingVersion", 5)     .table("silver.customers_clean")

changes.show()
# Columns include: _change_type (insert, update_preimage, update_postimage, delete),
#                   _commit_version, _commit_timestamp

# Read changes between timestamps
changes = spark.read.format("delta")     .option("readChangeFeed", "true")     .option("startingTimestamp", "2026-06-01")     .option("endingTimestamp", "2026-06-05")     .table("silver.customers_clean")

# Filter by change type
inserts = changes.filter(col("_change_type") == "insert")
updates = changes.filter(col("_change_type") == "update_postimage")
deletes = changes.filter(col("_change_type") == "delete")

CDF for MLVs and Streaming

-- MLVs use CDF for efficient incremental refresh
-- Without CDF: MLV compares full snapshots (slow)
-- With CDF: MLV reads only the change feed (fast)

-- Enable on ALL source tables for MLVs:
ALTER TABLE bronze.raw_customers SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
ALTER TABLE bronze.raw_orders SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
# Streaming with CDF (process only changes)
df_changes = spark.readStream.format("delta")     .option("readChangeFeed", "true")     .table("silver.customers_clean")
# Only new INSERTs, UPDATEs, DELETEs flow through — not the entire table

Column Mapping Properties

Enabling Column Rename and Drop

By default, Delta Lake does NOT support renaming or dropping columns. Column Mapping enables this:

-- Enable column mapping (required for rename/drop)
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name',
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
);

-- Now you can RENAME columns
ALTER TABLE gold.dim_customer RENAME COLUMN first_name TO FirstName;

-- And DROP columns
ALTER TABLE gold.dim_customer DROP COLUMN middle_name;

Column Mapping Modes

Mode What It Does When to Use
none (default) Columns matched by position (ordinal) Standard tables, no rename/drop needed
name Columns matched by name When you need to rename or drop columns
id Columns matched by internal ID Advanced — used internally

Important: Once you enable column mapping (name mode), you CANNOT go back to none. It is a one-way change. Also requires upgrading the protocol version (minReaderVersion=2, minWriterVersion=5).

Auto-Optimization Properties

autoOptimize.optimizeWrite

Automatically optimizes file sizes during writes — fewer, larger files:

ALTER TABLE gold.fact_sales
SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true');

-- Every write to this table now produces optimized file sizes
-- Without: writes might create many small files (1MB each)
-- With: writes create fewer, larger files (128MB-256MB each)

autoOptimize.autoCompact

Automatically compacts small files after writes:

ALTER TABLE gold.fact_sales
SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true');

-- After each write, Delta checks if there are too many small files
-- If yes, automatically runs a mini-OPTIMIZE in the background

delta.targetFileSize

Control the target file size for OPTIMIZE:

-- Default: 256MB (Fabric) or 1GB (Databricks)
ALTER TABLE gold.fact_sales
SET TBLPROPERTIES ('delta.targetFileSize' = '128mb');
-- OPTIMIZE will target 128MB files instead of the default

Guidance: 128MB-256MB for tables read by Direct Lake. 256MB-1GB for large tables read by Spark.

Protocol Version Properties

Protocol versions control which Delta features are available:

-- Check current protocol
DESCRIBE DETAIL gold.dim_customer;
-- Shows: minReaderVersion, minWriterVersion

-- Upgrade protocol (IRREVERSIBLE!)
ALTER TABLE gold.dim_customer
SET TBLPROPERTIES (
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
);
Feature Required Reader Required Writer
Basic Delta (read, write, MERGE) 1 2
Column mapping (rename, drop) 2 5
Change Data Feed 1 4
Identity columns 1 6
Deletion vectors 3 7

Warning: Protocol upgrades are IRREVERSIBLE. Older Spark versions that do not support the new protocol cannot read the table. Only upgrade when you need the specific feature.

Schema Auto-Merge Properties

-- Enable automatic schema merge on write
ALTER TABLE silver.customers_clean
SET TBLPROPERTIES ('delta.schema.autoMerge.enabled' = 'true');

-- Now writes with new columns automatically add them to the schema:
-- Source has columns: [id, name, email, phone]  (phone is new!)
-- Target has columns: [id, name, email]
-- With autoMerge: phone column added automatically
-- Without autoMerge: write FAILS with schema mismatch error

All Table Properties Reference (Quick Lookup)

Property Default What It Controls
delta.deletedFileRetentionDuration interval 7 days How long VACUUM keeps old data files
delta.logRetentionDuration interval 30 days How long transaction log entries are kept
delta.enableChangeDataFeed false Track INSERT/UPDATE/DELETE changes
delta.columnMapping.mode none Enable column rename/drop (name mode)
delta.autoOptimize.optimizeWrite false Optimize file sizes during writes
delta.autoOptimize.autoCompact false Auto-compact small files after writes
delta.targetFileSize 256mb Target file size for OPTIMIZE
delta.schema.autoMerge.enabled false Auto-add new columns on write
delta.minReaderVersion 1 Minimum reader protocol version
delta.minWriterVersion 2 Minimum writer protocol version
delta.dataSkippingNumIndexedCols 32 Number of columns with data skipping stats
delta.checkpoint.writeStatsAsJson true Write column stats in checkpoint
delta.checkpoint.writeStatsAsStruct true Write structured stats in checkpoint

Real-World Configurations

Configuration 1: Gold Dimension Table (SCD Type 2)

ALTER TABLE gold.dim_customer SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 30 days',  -- 30 days time travel
    'delta.logRetentionDuration' = 'interval 90 days',           -- 90 days version history
    'delta.enableChangeDataFeed' = 'true',                       -- Track changes for MLVs
    'delta.autoOptimize.optimizeWrite' = 'true',                 -- Clean writes
    'delta.autoOptimize.autoCompact' = 'true',                   -- Auto-compact
    'delta.columnMapping.mode' = 'name',                         -- Allow schema evolution
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
);

Configuration 2: Bronze Staging Table (Short-Lived)

ALTER TABLE bronze.raw_orders SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 1 day',    -- Minimal retention
    'delta.logRetentionDuration' = 'interval 7 days',            -- Minimal history
    'delta.autoOptimize.optimizeWrite' = 'true',                 -- Clean writes
    'delta.schema.autoMerge.enabled' = 'true'                    -- Source schema may change
);
-- VACUUM bronze.raw_orders; -- Schedule daily to save storage

Configuration 3: Silver Cleaned Table (Standard)

ALTER TABLE silver.customers_clean SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 14 days',  -- 2 weeks time travel
    'delta.logRetentionDuration' = 'interval 30 days',           -- Standard history
    'delta.enableChangeDataFeed' = 'true',                       -- Enable for downstream MLVs
    'delta.autoOptimize.optimizeWrite' = 'true',                 -- Clean writes
    'delta.autoOptimize.autoCompact' = 'true'                    -- Auto-compact
);

Configuration 4: Streaming Target Table

ALTER TABLE silver.realtime_events SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 7 days',
    'delta.logRetentionDuration' = 'interval 14 days',
    'delta.enableChangeDataFeed' = 'true',                       -- For downstream consumers
    'delta.autoOptimize.optimizeWrite' = 'true',                 -- Critical for streaming (many small writes)
    'delta.autoOptimize.autoCompact' = 'true',                   -- Compact the many small files from streaming
    'delta.targetFileSize' = '128mb'                              -- Smaller target for faster compaction
);

Configuration 5: Large Fact Table (Billions of Rows)

ALTER TABLE gold.fact_sales SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 7 days',   -- Standard (storage is expensive at scale)
    'delta.logRetentionDuration' = 'interval 30 days',
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.targetFileSize' = '512mb',                            -- Larger files for fewer file listings
    'delta.dataSkippingNumIndexedCols' = '10'                    -- Limit stats to key columns (faster writes)
);
-- Schedule weekly: OPTIMIZE gold.fact_sales ZORDER BY (date_key, customer_key);
-- Schedule weekly: VACUUM gold.fact_sales;

Table Properties in Fabric vs Databricks

Property Fabric Databricks Difference
delta.autoOptimize.optimizeWrite Supported Supported Same
delta.autoOptimize.autoCompact Supported Supported Same
delta.enableChangeDataFeed Supported Supported Same
delta.columnMapping.mode Supported Supported Same
V-Order (spark.sql.parquet.vorder.enabled) Fabric only Not available Fabric-specific Spark config
Liquid Clustering Not available Databricks only Databricks replacement for Z-ORDER
Deletion Vectors Supported (preview) Supported (GA) Databricks more mature
Predictive Optimization Not available Databricks only Auto-OPTIMIZE/VACUUM in Databricks

Common Mistakes

  1. Not setting retention before VACUUM — running VACUUM with default 7-day retention deletes files you might need for time travel. Set deletedFileRetentionDuration FIRST, then schedule VACUUM.

  2. Enabling CDF after the fact — CDF only tracks changes AFTER it is enabled. It does not retroactively capture historical changes. Enable CDF on source tables BEFORE building MLVs or streaming consumers.

  3. Protocol upgrade without understanding — upgrading minReaderVersion/minWriterVersion is IRREVERSIBLE. Older tools that do not support the new protocol cannot read the table. Only upgrade when you need a specific feature (column mapping, CDF).

  4. Not enabling autoOptimize for streaming tables — streaming writes create many tiny files (one per micro-batch). Without autoOptimize + autoCompact, table performance degrades rapidly. Always enable both for streaming targets.

  5. Same retention for all tables — Gold dimension tables need 30-90 days for audit trails. Bronze staging tables need 1-7 days. Set retention per table based on its purpose.

  6. Forgetting schema.autoMerge for evolving sources — external data sources (APIs, vendor feeds) change schema frequently. Without autoMerge, writes fail when a new column appears. Enable on Bronze tables that receive external data.

Interview Questions

Q: What is the difference between deletedFileRetentionDuration and logRetentionDuration? A: deletedFileRetentionDuration controls how long VACUUM keeps old data files (Parquet) — these enable time travel queries to read historical data. logRetentionDuration controls how long transaction log entries (JSON in _delta_log) are kept — these enable DESCRIBE HISTORY. Data file retention should be ≤ log retention. Default: 7 days for files, 30 days for logs.

Q: What is Change Data Feed (CDF) and when should you enable it? A: CDF tracks every INSERT, UPDATE, and DELETE as queryable change records. Enable it on tables that serve as sources for Materialized Lake Views (MLVs use CDF for efficient incremental refresh), streaming consumers (readStream with readChangeFeed), and downstream incremental pipelines. CDF only captures changes AFTER enablement — it is not retroactive.

Q: What is column mapping and when do you need it? A: Column mapping (delta.columnMapping.mode = 'name') changes how Delta maps DataFrame columns to Parquet file columns — by name instead of position. Required for renaming columns (ALTER TABLE RENAME COLUMN) and dropping columns (ALTER TABLE DROP COLUMN). Requires protocol upgrade (minReaderVersion=2, minWriterVersion=5). The upgrade is irreversible.

Q: How do autoOptimize.optimizeWrite and autoOptimize.autoCompact work together? A: optimizeWrite produces optimally-sized files during writes (fewer, larger files). autoCompact runs a background mini-OPTIMIZE after writes to compact any remaining small files. Together they maintain optimal file sizes without manual OPTIMIZE scheduling. Essential for streaming tables that produce many small files per micro-batch.

Q: How should you configure a Bronze table differently from a Gold table? A: Bronze: short retention (1-7 days), autoMerge enabled (schema may change), minimal history. Gold: longer retention (30-90 days), CDF enabled (for MLVs and audit), autoOptimize enabled, column mapping enabled (for schema evolution). Bronze is disposable staging; Gold is the curated, governed output.

Wrapping Up

Table properties are the fine-tuning dials of Delta Lake. The defaults work for basic use, but production tables need intentional configuration — shorter retention for staging (save storage), longer retention for Gold (audit compliance), CDF for downstream consumers, autoOptimize for streaming targets, and column mapping for evolving schemas.

Set these properties ONCE per table when creating it, and you prevent a category of production issues — failed time travel, schema mismatch errors, bloated storage, and slow reads from fragmented files.

Related posts:Delta Lake Deep DiveDelta Lake OptimizationMaterialized Lake ViewsFabric Lakehouse GuideSpark Structured StreamingOptimization Guide


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link