Incremental Data Loading in Azure Synapse: The Delta Copy Pattern with Watermark Tables

Full load works fine when your table has 1,000 rows. But what happens when it has 10 million? Or 100 million?

At 10 million rows, a full copy takes about 20 minutes per table. At 100 million, it’s closer to 2 hours. Now multiply that by 100 tables — you’re looking at 200+ hours of daily pipeline runtime. That’s more than 8 days. Obviously, that doesn’t fit in a 24-hour window.

The solution is incremental loading (also called delta load or change data capture). Instead of copying every row every day, you copy only the rows that changed since the last run. A table with 100 million rows might only have 50,000 new or updated rows per day — that’s a 2,000x reduction in data movement.
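
A quick back-of-the-envelope check of those figures (illustration only):

```python
# Back-of-the-envelope check of the claims above.
full_load_hours = 100 * 2         # 100 tables, ~2 hours each at 100M rows
assert full_load_hours == 200     # well past a 24-hour window

reduction = 100_000_000 / 50_000  # full table vs. a typical daily delta
assert reduction == 2_000
```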

In this guide, I’ll walk you through building an incremental load pipeline in Azure Synapse Analytics using the watermark pattern — a config table that tracks the last loaded value for each source table, and a pipeline that uses that value to extract only the delta.

This builds on my previous posts about metadata-driven pipelines in ADF and Synapse pipelines with audit logging. If you’re new to the Lookup → ForEach → Copy pattern, start with those first.

Table of Contents

  • The Problem: Why Full Load Doesn’t Scale
  • Full Load vs Incremental Load
  • How the Watermark Pattern Works
  • Two Types of Watermarks: ID-Based and Date-Based
  • Step 1: Create the Config Table (Metadata + Watermark)
  • Step 2: Create Source Tables and Sample Data
  • Step 3: Create the Stored Procedure for Watermark Updates
  • Step 4: Create Datasets
  • Step 5: Build the Pipeline
  • Step 6: Understanding the Delta Query Expression
  • Step 7: Configure the Watermark Update
  • Step 8: Add Date-Partitioned Output Folders
  • Step 9: First Run — Initial Full Load
  • Step 10: Test Incremental — Add New Data and Re-Run
  • The dataset() vs item() Gotcha (Revisited)
  • Common Errors and Fixes
  • Full Load vs Incremental at Scale — The Numbers
  • Production Best Practices
  • Interview Questions
  • Wrapping Up

The Problem: Why Full Load Doesn’t Scale

Let me paint the picture with real numbers:

| Scenario | Rows per Table | Time per Table | Tables | Total Daily Time |
|---|---|---|---|---|
| Small | 10,000 | 30 seconds | 10 | 5 minutes |
| Medium | 1,000,000 | 5 minutes | 50 | 4 hours |
| Large | 10,000,000 | 20 minutes | 100 | 33 hours |
| Enterprise | 100,000,000 | 2 hours | 100 | 200+ hours |

At the “Large” tier, you’re already over 24 hours — your pipeline literally can’t finish before the next day’s run starts. And that’s before accounting for retries, failures, and other pipelines competing for resources.

Beyond time, full load has other costs:

  • Storage: writing 100 million identical rows every day consumes terabytes of duplicate data
  • Database pressure: full table scans on the source database impact production queries
  • Network: transferring the same unchanged data wastes bandwidth
  • Cost: more data movement means higher ADF/Synapse pipeline costs

Full Load vs Incremental Load

| Aspect | Full Load | Incremental Load |
|---|---|---|
| What it copies | Every row, every time | Only changed rows (inserts + updates) |
| Rows per run (100M table) | 100,000,000 | ~50,000 (0.05%) |
| Time per table | ~2 hours | ~1-5 minutes |
| Storage growth | Massive (full copy daily) | Minimal (only deltas) |
| Source DB impact | Heavy (full scan) | Light (indexed WHERE clause) |
| Complexity | Simple | Moderate (needs config table + watermark logic) |
| When to use | Small tables, reference data, initial loads | Large transactional tables, daily/hourly pipelines |

Rule of thumb: Use full load for small reference tables (countries, categories, status codes) and initial loads. Use incremental load for everything else in production.

How the Watermark Pattern Works

The watermark pattern uses a config table that stores the “last loaded value” for each source table. Here’s the flow:

1. Read config table → get LastLoadedValue for each table
2. Query source table → get MAX value of the delta column
3. Copy only rows WHERE DeltaColumn > LastLoadedValue
                    AND DeltaColumn <= CurrentMaxValue
4. After successful copy → update LastLoadedValue = CurrentMaxValue
5. Next run → only picks up rows newer than the updated watermark

Why the upper bound (<= CurrentMaxValue)? Without it, rows inserted between the copy and the watermark update would be missed. The upper bound creates a clean boundary: everything up to MaxValue is copied, and the watermark is set to MaxValue. Any rows inserted after the query runs will have values > MaxValue and will be picked up in the next run.
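
The five steps above can be simulated in a few lines of plain Python (an illustration only; `run_incremental` is a hypothetical helper, not pipeline code):

```python
# Hypothetical in-memory model of one watermark cycle for an ID-based table.
def run_incremental(rows, watermark):
    """Copy rows in (watermark, max] and return (copied_rows, new_watermark)."""
    if not rows:
        return [], watermark                             # empty source: hold
    current_max = max(r["EMPID"] for r in rows)          # step 2: MAX lookup
    copied = [r for r in rows
              if watermark < r["EMPID"] <= current_max]  # step 3: bounded delta
    return copied, current_max                           # step 4: advance

rows = [{"EMPID": i} for i in (100, 101, 102, 103)]
copied, wm = run_incremental(rows, watermark=0)   # first run: full history
assert len(copied) == 4 and wm == 103

rows.append({"EMPID": 104})                       # a new row lands
copied, wm = run_incremental(rows, watermark=wm)  # next run: delta only
assert [r["EMPID"] for r in copied] == [104] and wm == 104
```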

Two Types of Watermarks: ID-Based and Date-Based

Option A: ID-Based Watermark (Auto-Increment Column)

  • The watermark tracks the last loaded ID (e.g., EMPID)
  • Works for INSERT-only tables where IDs always increase
  • Example: Last loaded EMPID = 103. Next run picks up rows WHERE EMPID > 103
  • Limitation: Doesn’t detect UPDATES — if you update EMPID 101’s city, the watermark won’t pick it up because 101 < 103

Option B: Date/Timestamp-Based Watermark

  • The watermark tracks the latest timestamp loaded
  • Works for both INSERTS and UPDATES (if the table has a modified_date or rcd_upd_ts column)
  • Example: Last loaded date = 2026-01-01. Next run picks up WHERE ORDERDATE > '2026-01-01'
  • Advantage: When a row is updated, its timestamp changes to the current time, making it “newer” than the watermark

Which to use? If your source table has a modified_date or updated_at column, use date-based — it catches both inserts and updates. If it only has an auto-increment ID, use ID-based (but be aware that updates won’t be captured).
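
To make the trade-off concrete, here is a small hypothetical Python simulation: an UPDATE bumps a row's timestamp but not its ID, so only the date-based filter sees it.

```python
from datetime import datetime

# Hypothetical rows carrying both an auto-increment ID and a modified timestamp.
rows = [
    {"EMPID": 101, "CITY": "delhi", "modified": datetime(2026, 1, 1)},
    {"EMPID": 103, "CITY": "noida", "modified": datetime(2026, 1, 2)},
]

# An UPDATE to EMPID 101 bumps its timestamp, not its ID.
rows[0]["CITY"] = "mumbai"
rows[0]["modified"] = datetime(2026, 1, 5)

id_watermark = 103                      # last loaded EMPID
date_watermark = datetime(2026, 1, 2)   # last loaded timestamp

id_delta = [r for r in rows if r["EMPID"] > id_watermark]
date_delta = [r for r in rows if r["modified"] > date_watermark]

assert id_delta == []                             # ID watermark misses the update
assert [r["EMPID"] for r in date_delta] == [101]  # date watermark catches it
```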

Step 1: Create the Config Table (Metadata + Watermark)

The config table extends the metadata table from the previous tutorials by adding two new columns:

CREATE TABLE configtable (
    id               INT IDENTITY(1, 1),
    TableName        VARCHAR(100),
    SchemaName       VARCHAR(50),
    FolderName       VARCHAR(100),
    LastLoadedValue  VARCHAR(100),
    DeltaColumnName  VARCHAR(100)
);
  • LastLoadedValue — The watermark value from the last successful run. Starts at '0' for ID-based or '1900-01-01 00:00:00' for date-based.
  • DeltaColumnName — The name of the column used as the watermark (e.g., EMPID or ORDERDATE).

Why VARCHAR for LastLoadedValue? Because it needs to store both IDs (integers) and dates (timestamps) in the same column. We CAST everything to VARCHAR in the pipeline expressions for consistency.

Insert the config rows:

-- ID-based watermark (EMPLOYEE table, starts at 0)
INSERT INTO configtable (TableName, SchemaName, FolderName, LastLoadedValue, DeltaColumnName)
VALUES ('EMPLOYEE', 'T1', 'T1/EMPLOYEE', '0', 'EMPID');

-- Date-based watermark (ORDERS table, starts at 1900-01-01)
INSERT INTO configtable (TableName, SchemaName, FolderName, LastLoadedValue, DeltaColumnName)
VALUES ('ORDERS', 'T1', 'T1/ORDERS', '1900-01-01 00:00:00', 'ORDERDATE');

-- Verify
SELECT * FROM configtable;

Step 2: Create Source Tables and Sample Data

-- Create a schema to keep things organized
CREATE SCHEMA T1;

-- ID-based table (watermark on EMPID)
CREATE TABLE T1.EMPLOYEE (
    EMPID    BIGINT,
    EMPNAME  VARCHAR(100),
    EMPCITY  VARCHAR(100)
);

INSERT INTO T1.EMPLOYEE VALUES (100, 'Naveen', 'hyd');
INSERT INTO T1.EMPLOYEE VALUES (101, 'Shrey', 'delhi');
INSERT INTO T1.EMPLOYEE VALUES (102, 'Vrushab', 'pune');
INSERT INTO T1.EMPLOYEE VALUES (103, 'Vishnu', 'noida');

-- Date-based table (watermark on ORDERDATE)
CREATE TABLE T1.ORDERS (
    ORDERID     INT,
    ORDERDATE   DATETIME,
    ORDERNUMBER VARCHAR(100)
);

INSERT INTO T1.ORDERS VALUES (1, '2026-01-01 10:30:00', 'B1026');
INSERT INTO T1.ORDERS VALUES (2, '2026-10-01 10:40:00', 'B1046');
INSERT INTO T1.ORDERS VALUES (3, '2026-01-01 11:30:00', 'B1063');

Step 3: Create the Stored Procedure for Watermark Updates

This stored procedure updates the watermark after each successful copy:

CREATE PROCEDURE usp_update_config
    @tablename     VARCHAR(100),
    @LASTLOADVALUE VARCHAR(100)
AS
    UPDATE configtable
    SET LastLoadedValue = @LASTLOADVALUE
    WHERE TableName = @tablename
    AND SchemaName = 'T1';

Why update after the copy, not before? If the copy fails, the watermark stays at the old value. The next run will retry the same data range. If you updated the watermark before the copy and the copy failed, you’d lose those rows forever.
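
The failure semantics can be sketched in Python (hypothetical names, standing in for the pipeline's success-path dependency):

```python
# Hypothetical sketch: the watermark advances only on the Success path.
def finish_run(copy_succeeded, old_watermark, max_value):
    """Return the watermark after a run; a failed copy leaves it untouched."""
    if copy_succeeded:
        return max_value      # Update_Watermark ran (green arrow)
    return old_watermark      # next run retries the same (old, max] range

wm = finish_run(copy_succeeded=False, old_watermark="0", max_value="103")
assert wm == "0"              # failure: nothing lost, range will be retried
wm = finish_run(copy_succeeded=True, old_watermark=wm, max_value="103")
assert wm == "103"            # success: boundary advances
```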

Step 4: Create Datasets

You need three datasets:

A. DS_ConfigTable (Lookup Source)

  • Type: Azure SQL Database
  • Table: dbo.configtable
  • No parameters needed

B. DS_SourceTable_Incremental (Parameterized Source)

  • Type: Azure SQL Database
  • Leave table blank
  • Parameters: SchemaName (String), TableName (String)
  • Connection tab (Enter manually):
     • Schema: @dataset().SchemaName
     • Table: @dataset().TableName

C. DS_ADLS_Sink_Incremental (Parameterized Sink with Date Folders)

  • Type: ADLS Gen2 → Parquet
  • Parameters: FolderName (String)
  • Connection tab:
     • Container: synapse-workspace (or your container name — can be hardcoded or parameterized)
     • Directory: @dataset().FolderName

Step 5: Build the Pipeline

Create a new pipeline: PL_Incremental_DeltaLoad

The pipeline has a more complex structure inside the ForEach than our previous pipelines — three activities in sequence:

Lookup_Config
      │ (Success)
      ▼
ForEach_Table
      │
      ├── Lookup_MaxValue
      │       │ (Success)
      │       ▼
      ├── Copy_DeltaData
      │       │ (Success)
      │       ▼
      └── Update_Watermark (Stored Procedure)

A. Lookup_Config

  • Settings tab:
     • Source dataset: DS_ConfigTable
     • First row only: UNCHECKED

B. ForEach_Table

  • Settings tab:
     • Items: @activity('Lookup_Config').output.value
     • Sequential: unchecked

C. Inside ForEach — Lookup_MaxValue

This Lookup queries the source table to find the current maximum value of the delta column:

  1. Drag a Lookup activity inside the ForEach
  2. Name: Lookup_MaxValue
  3. Settings:
     • Source dataset: DS_SourceTable_Incremental
     • SchemaName: @item().SchemaName
     • TableName: @item().TableName
     • Use query: Query
     • Query (dynamic content):
@concat('SELECT CAST(MAX(', item().DeltaColumnName, ') AS VARCHAR(100)) AS MaxValue FROM ', item().SchemaName, '.', item().TableName)

Why CAST to VARCHAR? Because the config table stores LastLoadedValue as VARCHAR. We need MaxValue in the same format so the stored procedure can update it consistently, regardless of whether it's an integer (EMPID) or a datetime (ORDERDATE). One caution: VARCHAR comparison is lexicographic, not numeric. As strings, '99' > '103', so a purely numeric watermark can silently miss rows once IDs cross a digit boundary. If that can happen in your data, zero-pad the values or compare the column natively (e.g., WHERE EMPID > CAST('103' AS BIGINT)), which also lets the database use an index on the delta column.

First row only: Leave this CHECKED (default) — we only need one value (the MAX).

D. Inside ForEach — Copy_DeltaData

  1. Drag a Copy data activity
  2. Name: Copy_DeltaData
  3. Draw green arrow from Lookup_MaxValue → Copy_DeltaData

Source tab:

  • Dataset: DS_SourceTable_Incremental
  • SchemaName: @item().SchemaName
  • TableName: @item().TableName
  • Use query: Query
  • Query (dynamic content) — this is the key expression:

@concat('SELECT * FROM ', item().SchemaName, '.', item().TableName, ' WHERE CAST(', item().DeltaColumnName, ' AS VARCHAR(100)) > ''', item().LastLoadedValue, ''' AND CAST(', item().DeltaColumnName, ' AS VARCHAR(100)) <= ''', activity('Lookup_MaxValue').output.firstRow.MaxValue, '''')

Sink tab:

  • Dataset: DS_ADLS_Sink_Incremental
  • FolderName: @item().FolderName

Step 6: Understanding the Delta Query Expression

The Copy activity’s source query is the most complex expression in this pipeline. Let’s break it down:

@concat(
    'SELECT * FROM ', item().SchemaName, '.', item().TableName,
    ' WHERE CAST(', item().DeltaColumnName, ' AS VARCHAR(100)) > ''',
    item().LastLoadedValue,
    ''' AND CAST(', item().DeltaColumnName, ' AS VARCHAR(100)) <= ''',
    activity('Lookup_MaxValue').output.firstRow.MaxValue,
    ''''
)

For the EMPLOYEE table (first run), this resolves to:

SELECT * FROM T1.EMPLOYEE
WHERE CAST(EMPID AS VARCHAR(100)) > '0'
  AND CAST(EMPID AS VARCHAR(100)) <= '103'

This selects all four employees (EMPID 100 through 103): the initial load, since the watermark starts at 0.

For the ORDERS table (first run), this resolves to:

SELECT * FROM T1.ORDERS
WHERE CAST(ORDERDATE AS VARCHAR(100)) > '1900-01-01 00:00:00'
  AND CAST(ORDERDATE AS VARCHAR(100)) <= '2026-10-01 10:40:00'

This selects all orders since 1900 (effectively all rows for the initial load).

On the second run (after watermark is updated):

-- Only new employees (EMPID > 103)
SELECT * FROM T1.EMPLOYEE
WHERE CAST(EMPID AS VARCHAR(100)) > '103'
  AND CAST(EMPID AS VARCHAR(100)) <= '105'

-- Only new orders (ORDERDATE later than the stored watermark)
SELECT * FROM T1.ORDERS
WHERE CAST(ORDERDATE AS VARCHAR(100)) > 'Oct  1 2026 10:40AM'
  AND CAST(ORDERDATE AS VARCHAR(100)) <= 'Nov 15 2026  9:00AM'

Now only 2 employees and 1 order are copied — not the entire tables.

A word of caution on dates: CAST with the default style renders a DATETIME as 'Mon dd yyyy hh:miAM', and those strings do not sort chronologically ('Nov ...' sorts before 'Oct ...' alphabetically). For a reliable date-based watermark, use CONVERT(VARCHAR(23), ORDERDATE, 121) in both the MAX lookup and the delta query so the values come out ISO-formatted ('2026-11-15 09:00:00.000') and string order matches date order.

Why the single-quote escaping (''')? In ADF expressions, a single quote inside a string is escaped with another single quote. So ''' means: end the string, insert a literal single quote, start a new string. This produces the SQL string delimiters around the watermark values.
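
If it helps to see the assembly another way, the same string construction can be mirrored with a Python f-string (illustration only; the pipeline itself uses the @concat expression above, and `delta_query` is a hypothetical helper):

```python
# Illustration: build the same SQL the @concat expression produces,
# so the resolved quoting is easy to inspect.
def delta_query(schema, table, col, last_value, max_value):
    return (f"SELECT * FROM {schema}.{table}"
            f" WHERE CAST({col} AS VARCHAR(100)) > '{last_value}'"
            f" AND CAST({col} AS VARCHAR(100)) <= '{max_value}'")

sql = delta_query("T1", "EMPLOYEE", "EMPID", "0", "103")
assert sql == ("SELECT * FROM T1.EMPLOYEE"
               " WHERE CAST(EMPID AS VARCHAR(100)) > '0'"
               " AND CAST(EMPID AS VARCHAR(100)) <= '103'")
```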

Step 7: Configure the Watermark Update

After Copy_DeltaData succeeds, update the watermark:

  1. Drag a Stored Procedure activity
  2. Name: Update_Watermark
  3. Draw green arrow (Success) from Copy_DeltaData → Update_Watermark
  4. Settings:
     • Linked service: LS_AzureSqlDB
     • Stored procedure: usp_update_config
     • Parameters:

| Parameter | Type | Value |
|---|---|---|
| tablename | String | @item().TableName |
| LASTLOADVALUE | String | @activity('Lookup_MaxValue').output.firstRow.MaxValue |

Critical: We update the watermark to MaxValue (from the Lookup), NOT to the Copy activity’s output. This is because MaxValue was determined before the copy ran — it’s a clean, consistent boundary.

Step 8: Add Date-Partitioned Output Folders

Instead of overwriting the same folder each run, partition by date and time. In the Copy activity’s Sink tab, change the FolderName to:

@concat(item().FolderName, '/', formatDateTime(utcnow(), 'yyyy'), '/', formatDateTime(utcnow(), 'MM'), '/', formatDateTime(utcnow(), 'dd'), '/', formatDateTime(utcnow(), 'HH'), '/', formatDateTime(utcnow(), 'mm'))

This creates a nested folder structure:

synapse-workspace/
└── T1/
    └── EMPLOYEE/
        └── 2026/
            └── 04/
                └── 05/
                    └── 14/
                        └── 30/
                            └── part-00000.snappy.parquet

Why this matters: Each pipeline run creates files in a unique folder based on the execution time. You never overwrite previous runs. You can easily query specific date ranges. And cleanup is straightforward — just delete old date folders.

Important: Put this concat expression in the pipeline’s Copy activity Sink tab (where you pass the FolderName value), NOT in the dataset’s Connection tab. The dataset should only have @dataset().FolderName. The pipeline builds the full path and passes it to the dataset.
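
A quick way to preview the folder a given run produces: this Python snippet mirrors the formatDateTime calls above (illustration only; a fixed timestamp stands in for utcnow(), and `partition_path` is a hypothetical helper):

```python
from datetime import datetime, timezone

# Mirror of @concat(item().FolderName, '/', formatDateTime(utcnow(), 'yyyy'), ...)
def partition_path(folder_name, now):
    return "/".join([folder_name,
                     now.strftime("%Y"), now.strftime("%m"),
                     now.strftime("%d"), now.strftime("%H"),
                     now.strftime("%M")])

run_time = datetime(2026, 4, 5, 14, 30, tzinfo=timezone.utc)
path = partition_path("T1/EMPLOYEE", run_time)
assert path == "T1/EMPLOYEE/2026/04/05/14/30"
```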

Step 9: First Run — Initial Full Load

  1. Click Validate → fix any errors
  2. Click Publish all
  3. Click Debug

After the first run completes, verify:

-- Check the config table — watermarks should be updated
SELECT * FROM configtable;

-- Expected:
-- EMPLOYEE: LastLoadedValue = '103'
-- ORDERS:   LastLoadedValue = 'Oct  1 2026 10:40AM' (or similar format)

Check ADLS Gen2 — you should see Parquet files in the date-partitioned folders under T1/EMPLOYEE/ and T1/ORDERS/.

Step 10: Test Incremental — Add New Data and Re-Run

Add new rows to the source tables:

INSERT INTO T1.EMPLOYEE VALUES (104, 'Ravi', 'bangalore');
INSERT INTO T1.EMPLOYEE VALUES (105, 'Priya', 'chennai');

INSERT INTO T1.ORDERS VALUES (4, '2026-11-15 09:00:00', 'B2001');

Run the pipeline again (Debug).

What should happen:

  • Only EMPID 104 and 105 are copied (not all 6 rows)
  • Only ORDERID 4 is copied (not all 4 orders)
  • New Parquet files appear in a new date-partitioned folder

Verify the updated watermarks:

SELECT * FROM configtable;

-- EMPLOYEE: LastLoadedValue = '105' (was '103')
-- ORDERS:   LastLoadedValue = 'Nov 15 2026  9:00AM' (was 'Oct  1 2026 10:40AM')

This confirms incremental loading is working correctly. The pipeline only copied the delta.

The dataset() vs item() Gotcha (Revisited)

A common error in this pipeline is using @dataset() where you should use @item(), or vice versa:

Error: "The template function 'dataset' is not defined or not valid"

This happens when you put @dataset().FolderName in the pipeline’s Copy activity. Remember the rule:

| Where | Use This | Example |
|---|---|---|
| Inside a Dataset (Connection tab) | @dataset() | @dataset().FolderName |
| Inside a Pipeline (activity settings) | @item() | @item().FolderName |
| Inside a Pipeline (referencing another activity) | @activity() | @activity('Lookup_MaxValue').output.firstRow.MaxValue |

The dataset defines the template. The pipeline fills in the values. Never mix them.

Common Errors and Fixes

| Error | Cause | Fix |
|---|---|---|
| "dataset function is not defined" | Used @dataset() in the pipeline | Change to @item() in the pipeline; use @dataset() only in the dataset |
| "Lookup activity returned no results" | Source table is empty or watermark is ahead of all data | Check your LastLoadedValue isn't greater than the MAX in the source |
| Case-sensitivity error on @item().tablename | Column name doesn't match exactly | Use @item().TableName matching your SQL column exactly |
| Single quote parsing error in WHERE clause | Missing or extra quotes in the @concat() expression | Count your quotes: ''' = end string + literal quote + start string |
| Update_Watermark runs after failed Copy | Green arrow (Success) not properly set | Verify the dependency condition is "Succeeded" not "Completed" |

Full Load vs Incremental at Scale — The Numbers

| Metric | Full Load (Daily) | Incremental Load (Daily) |
|---|---|---|
| Rows copied (100M table) | 100,000,000 | ~50,000 (changes only) |
| Time per table | ~2 hours | ~1-5 minutes |
| Time for 100 tables | 200+ hours (impossible) | ~2-8 hours (feasible) |
| Daily storage growth | 100M rows x 100 tables | 50K rows x 100 tables |
| Source DB pressure | Heavy (full table scans) | Light (indexed WHERE reads) |
| Implementation effort | Low | Moderate (config table + stored proc) |

Bottom line: Full load is simple but doesn’t scale. Incremental load is essential for large transactional tables in production. The watermark pattern (tracking LastLoadedValue in a config table) is the most common and reliable approach in ADF and Synapse.

Production Best Practices

1. Always Update Watermark AFTER Successful Copy

If you update the watermark before the copy and the copy fails, you’ll lose those rows. The stored procedure should only run on the success path (green arrow from Copy).

2. Handle Empty Deltas Gracefully

If no new data exists since the last watermark, Lookup_MaxValue returns a NULL MaxValue, and passing that straight to the stored procedure would wipe out the watermark. Guard the LASTLOADVALUE parameter so the watermark never regresses:

@if(equals(activity('Lookup_MaxValue').output.firstRow.MaxValue, null), item().LastLoadedValue, activity('Lookup_MaxValue').output.firstRow.MaxValue)

Note that this keeps the watermark unchanged but does not skip the Copy activity itself; the copy simply moves zero rows. To skip the copy entirely, wrap it in an If Condition activity that checks whether MaxValue differs from LastLoadedValue.
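
The guard's behavior, sketched as a hypothetical Python helper:

```python
# Sketch of the NULL-guard for the watermark parameter.
def next_watermark(max_value, last_loaded):
    # Mirrors @if(equals(MaxValue, null), LastLoadedValue, MaxValue)
    return last_loaded if max_value is None else max_value

assert next_watermark(None, "103") == "103"   # empty delta: watermark holds
assert next_watermark("105", "103") == "105"  # new data: watermark advances
```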

3. Index the Delta Column

Make sure your delta column (EMPID, ORDERDATE) has an index on the source database. Without an index, the WHERE clause in the delta query does a full table scan — defeating the purpose of incremental loading. Also note that wrapping the column itself in CAST(...) in the WHERE clause prevents an index seek; comparing the column natively and casting the watermark value instead keeps the query sargable.

CREATE INDEX IX_EMPLOYEE_EMPID ON T1.EMPLOYEE (EMPID);
CREATE INDEX IX_ORDERS_ORDERDATE ON T1.ORDERS (ORDERDATE);

4. Combine with Audit Logging

Add the audit logging from the previous post to track how many delta rows were copied per table per run. This gives you a complete picture of your pipeline’s behavior over time.

5. Monitor Watermark Drift

Set up alerts if a watermark hasn’t moved in X days — it might indicate the source table isn’t getting updates, or the pipeline is silently failing.

Interview Questions

Q: What is the difference between full load and incremental load? A: Full load copies every row every time. Incremental load copies only rows that changed since the last run, using a watermark (like a last-loaded ID or timestamp). Incremental is essential for large tables where full load is too slow or expensive.

Q: What is the watermark pattern in ADF/Synapse? A: It’s a pattern where a config table stores the last loaded value (watermark) for each source table. The pipeline reads the watermark, queries only rows newer than that value, copies them, and then updates the watermark to the new MAX value.

Q: How do you handle both inserts and updates in incremental loading? A: Use a timestamp-based watermark (like modified_date or rcd_upd_ts). When a row is updated, its timestamp changes to the current time, making it newer than the watermark. ID-based watermarks only catch inserts, not updates.

Q: Why do you use CAST(DeltaColumn AS VARCHAR) in the query? A: Because the config table stores LastLoadedValue as VARCHAR (to support both integers and dates in the same column). We CAST the source column to VARCHAR for consistent comparison.

Q: What happens if the pipeline fails midway through a run? A: The watermark only updates after a successful copy (green arrow dependency). If the copy fails, the watermark stays at the old value, and the next run retries the same data range. No data is lost.

Q: How would you handle deletes in incremental loading? A: The watermark pattern doesn’t handle hard deletes. Options include: soft deletes (is_deleted flag), CDC with change tracking features in SQL Server, or periodic full reconciliation runs that compare source and destination.

Wrapping Up

You’ve now built a complete incremental data loading pipeline with:

  • Config table with watermark tracking (LastLoadedValue + DeltaColumnName)
  • Dynamic delta queries that only copy changed rows
  • Watermark updates after successful copies via stored procedure
  • Date-partitioned output folders for organized data lake storage
  • Support for both ID-based and date-based watermarks using CAST to VARCHAR

This is how real production data pipelines work. The watermark pattern is simple, reliable, and scales to thousands of tables with millions of rows.

Related posts:

  • Building a Metadata-Driven Pipeline in Azure Data Factory
  • Synapse Pipeline with Audit Logging
  • Building a REST API with FastAPI on AWS Lambda
  • Top 15 Azure Data Factory Interview Questions (coming soon)

If this guide helped you, share it with a fellow data engineer. Have questions? Drop a comment below.


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
