Unified Full Load and Incremental Load Pipeline in Azure Synapse: One Pipeline, One Config Table, Two Load Strategies
In earlier posts, we built two separate pipelines — a full load pipeline that copies entire tables every run, and an incremental pipeline that copies only new or changed rows using a watermark. Both work well for their respective use cases.
But in production, you rarely have ALL tables needing full load or ALL tables needing incremental load. You have a MIX — some tables are small reference data that you dump entirely (full load), and others are large transaction tables where you only need today’s new rows (incremental). Running two separate pipelines means two schedules, two monitoring dashboards, two sets of error handling.
This post builds a single unified pipeline that handles BOTH strategies. A config table tells the pipeline which tables to process and HOW to process each one — full or incremental. Adding a new table means inserting ONE row. Switching a table from full to incremental means updating ONE column. The pipeline never changes.
Think of it like a restaurant that serves both dine-in and takeout from the same kitchen. The order slip (config table) has a column that says “DINE-IN” or “TAKEOUT.” The kitchen manager (ForEach) reads each slip and routes it to the right station — plating for dine-in (full load) or boxing for takeout (incremental). Same kitchen, same chef, same ingredients — different handling based on one flag.
Table of Contents
- Why One Pipeline Instead of Two
- The Architecture
- The Unified Config Table (CONFIGTABLE_V2)
- The Audit Log Table (PIPELINE_LOG)
- The Stored Procedures
- Building the Pipeline Step by Step
- Datasets
- Lookup: Read the Config Table
- ForEach: Loop Through All Tables
- If Condition: Route by LoadType
- TRUE Branch: Full Load Path
- FALSE Branch: Incremental Load Path
- The Complete Pipeline Diagram
- First Run Walkthrough
- Second Run (Incremental Proof)
- Adding a New Table (Zero Pipeline Changes)
- Switching a Table from Full to Incremental
- Date-Partitioned Output Folders
- Common Errors and Fixes
- Interview Questions
- Wrapping Up
Why One Pipeline Instead of Two
| Two Pipelines | One Unified Pipeline |
|---|---|
| Two schedules to manage | One schedule |
| Two monitoring dashboards | One dashboard |
| Two sets of error handling | One audit log |
| Adding a table = edit the right pipeline | Adding a table = one INSERT into config |
| Switching load type = move table between pipelines | Switching = UPDATE one column |
| 10+ activities duplicated across pipelines | All logic in one place |
The key insight: The only difference between full and incremental is the SOURCE QUERY. Full load reads everything (SELECT *). Incremental reads only rows above the watermark (WHERE ProductID > 295). Everything else — the ForEach, the sink, the audit logging — is identical.
The Architecture
Azure SQL Database ADLS Gen2
┌──────────────────────┐ ┌──────────────────────┐
│ │ │ synapse-workspace/ │
│ CONFIGTABLE_V2 │─── Lookup_Config ──> │ sqldb/ │
│ Customer FULL │ │ Customer/ │
│ Address FULL │ │ 2026/05/10/... │
│ Product INCR │ │ Address/ │
│ ProdCat INCR │ │ Product/ │
│ CustAddr INCR │ │ ProductCategory/ │
│ │ ForEach │ CustomerAddress/ │
│ SalesLT.Customer │──┐ │ │
│ SalesLT.Address │ │ If Condition │ │
│ SalesLT.Product │ │ FULL? ──> Copy_FullLoad │ │
│ SalesLT.ProductCat │ │ INCR? ──> Lookup_Max │ │
│ SalesLT.CustAddr │ │ Copy_DeltaData │ │
│ │ │ Update_Watermark│ │
│ PIPELINE_LOG │<─┘ │ │
│ (audit trail) │ Log_Success / Log_Failure │ │
└──────────────────────┘ └──────────────────────┘
The Unified Config Table (CONFIGTABLE_V2)
CREATE TABLE CONFIGTABLE_V2 (
ID INT IDENTITY(1,1),
TableName VARCHAR(100),
SchemaName VARCHAR(50),
FolderName VARCHAR(100),
LoadType VARCHAR(20), -- 'FULL' or 'INCREMENTAL'
LastLoadedValue VARCHAR(100), -- NULL for full load tables
DeltaColumnName VARCHAR(100) -- NULL for full load tables
);
-- 2 Full Load tables (small reference data)
INSERT INTO CONFIGTABLE_V2 (TableName, SchemaName, FolderName, LoadType, LastLoadedValue, DeltaColumnName) VALUES
('Customer', 'SalesLT', 'sqldb/Customer', 'FULL', NULL, NULL),
('Address', 'SalesLT', 'sqldb/Address', 'FULL', NULL, NULL);
-- 3 Incremental Load tables (large/growing data)
INSERT INTO CONFIGTABLE_V2 (TableName, SchemaName, FolderName, LoadType, LastLoadedValue, DeltaColumnName) VALUES
('Product', 'SalesLT', 'sqldb/Product', 'INCREMENTAL', '0', 'ProductID'),
('ProductCategory', 'SalesLT', 'sqldb/ProductCategory', 'INCREMENTAL', '0', 'ProductCategoryID'),
('CustomerAddress', 'SalesLT', 'sqldb/CustomerAddress', 'INCREMENTAL', '0', 'CustomerID');
Why These Columns
| Column | Purpose | Full Load | Incremental |
|---|---|---|---|
| TableName | Which table to copy | Customer | Product |
| SchemaName | Source schema | SalesLT | SalesLT |
| FolderName | ADLS destination folder | sqldb/Customer | sqldb/Product |
| LoadType | The routing flag | FULL | INCREMENTAL |
| LastLoadedValue | Watermark — last value copied | NULL (not used) | ‘0’ → ‘295’ → ‘300’ |
| DeltaColumnName | Which column is the watermark | NULL (not used) | ‘ProductID’ |
Real-life analogy: The config table is like a shipping manifest. Each row is a package. The LoadType column is like the shipping method — “STANDARD” (full load, ship everything) vs “EXPRESS” (incremental, ship only what is new since last shipment). The LastLoadedValue is the tracking number of the last EXPRESS shipment.
The Audit Log Table (PIPELINE_LOG)
CREATE TABLE PIPELINE_LOG (
ID INT IDENTITY(1,1),
TableName VARCHAR(100),
SchemaName VARCHAR(50),
LoadType VARCHAR(20),
RowsRead INT,
RowsCopied INT,
CopyDuration INT,
Status VARCHAR(20),
ErrorMessage VARCHAR(500),
LoadDate DATETIME
);
Every table copy — success or failure, full or incremental — gets logged here. Morning health check:
SELECT TableName, LoadType, RowsCopied, Status, ErrorMessage, LoadDate
FROM PIPELINE_LOG
WHERE CAST(LoadDate AS DATE) = CAST(GETDATE() AS DATE)
ORDER BY LoadDate;
The Stored Procedures
Logging Procedure
CREATE PROC usp_insert_pipeline_log
@TableName VARCHAR(100),
@SchemaName VARCHAR(50),
@LoadType VARCHAR(20),
@RowsRead INT,
@RowsCopied INT,
@CopyDuration INT,
@Status VARCHAR(20),
@ErrorMessage VARCHAR(500)
AS
BEGIN
INSERT INTO PIPELINE_LOG
(TableName, SchemaName, LoadType, RowsRead, RowsCopied,
CopyDuration, Status, ErrorMessage, LoadDate)
VALUES
(@TableName, @SchemaName, @LoadType, @RowsRead, @RowsCopied,
@CopyDuration, @Status, @ErrorMessage, GETDATE());
END;
Watermark Update Procedure (Incremental Only)
CREATE PROC usp_update_watermark_v2
@TableName VARCHAR(100),
@LastLoadedValue VARCHAR(100)
AS
BEGIN
UPDATE CONFIGTABLE_V2
SET LastLoadedValue = @LastLoadedValue
WHERE TableName = @TableName;
END;
Why a separate watermark procedure? Full load tables do not have watermarks. This procedure is only called in the incremental branch. Keeping it separate ensures full load tables are never accidentally updated.
Building the Pipeline Step by Step
Datasets
DS_ConfigTable_V2 — Points to dbo.CONFIGTABLE_V2. Non-parameterized.
DS_SourceTable_Dynamic — Parameterized source dataset.
– Parameters: SchemaName (String), TableName (String)
– Connection: Schema = @dataset().SchemaName, Table = @dataset().TableName
DS_ADLS_Sink_Parquet_V2 — Parameterized sink dataset.
– Parameters: FolderName (String)
– Connection: Container = synapse-workspace, Directory = @dataset().FolderName
Lookup: Read the Config Table
- Drag Lookup activity
- Name: Lookup_Config
- Settings:
  - Dataset: DS_ConfigTable_V2
  - First row only: UNCHECKED
  - Query: SELECT * FROM CONFIGTABLE_V2
Returns all 5 rows — the pipeline processes each one.
ForEach: Loop Through All Tables
- Drag ForEach activity
- Name: ForEach_Table
- Green arrow from Lookup_Config
- Settings:
  - Sequential: checked (safer for audit logging order)
  - Items: @activity('Lookup_Config').output.value
If Condition: Route by LoadType
Inside ForEach (click pencil icon):
- Drag If Condition activity
- Name: Check_LoadType
- Expression: @equals(item().LoadType, 'FULL')
This evaluates to TRUE for Customer and Address (full load), and FALSE for Product, ProductCategory, and CustomerAddress (incremental).
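The routing can be simulated in a few lines of Python (a sketch of the expression's behavior, not ADF code; the config rows are trimmed to the two fields that matter here):

```python
# Simulate @equals(item().LoadType, 'FULL') across the five config rows.
config_rows = [
    {"TableName": "Customer",        "LoadType": "FULL"},
    {"TableName": "Address",         "LoadType": "FULL"},
    {"TableName": "Product",         "LoadType": "INCREMENTAL"},
    {"TableName": "ProductCategory", "LoadType": "INCREMENTAL"},
    {"TableName": "CustomerAddress", "LoadType": "INCREMENTAL"},
]

# @equals is an exact, case-sensitive comparison, just like == on strings.
full_tables = [r["TableName"] for r in config_rows if r["LoadType"] == "FULL"]
incr_tables = [r["TableName"] for r in config_rows if r["LoadType"] != "FULL"]

print(full_tables)  # ['Customer', 'Address']
print(incr_tables)  # ['Product', 'ProductCategory', 'CustomerAddress']
```

Because the comparison is case-sensitive, a config row with LoadType 'Full' would silently fall into the incremental branch — store the flag in one consistent casing.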
Real-life analogy: The If Condition is like the fork in a warehouse conveyor belt. Packages tagged “STANDARD” go left to the bulk shipping dock (full load). Packages tagged “EXPRESS” go right to the priority dispatch (incremental). Same conveyor, same scanner, different destinations.
TRUE Branch: Full Load Path
Click the pencil icon on the True box.
Copy_FullLoad:
– Source: DS_SourceTable_Dynamic
– SchemaName: @item().SchemaName
– TableName: @item().TableName
– Sink: DS_ADLS_Sink_Parquet_V2
– FolderName: @concat(item().FolderName, '/', formatDateTime(utcnow(), 'yyyy'), '/', formatDateTime(utcnow(), 'MM'), '/', formatDateTime(utcnow(), 'dd'), '/', formatDateTime(utcnow(), 'HH'), '/', formatDateTime(utcnow(), 'mm'))
Log_Success_Full (green arrow from Copy_FullLoad):
– Stored Procedure: usp_insert_pipeline_log
– Parameters: TableName=@item().TableName, SchemaName=@item().SchemaName, LoadType=FULL, RowsRead=@activity('Copy_FullLoad').output.rowsRead, RowsCopied=@activity('Copy_FullLoad').output.rowsCopied, CopyDuration=@activity('Copy_FullLoad').output.copyDuration, Status=SUCCESS, ErrorMessage=NA
Log_Failure_Full (red arrow from Copy_FullLoad):
– Same stored procedure with RowsRead=0, RowsCopied=0, CopyDuration=0, Status=FAILED, ErrorMessage=@concat('Full load failed for ', item().SchemaName, '.', item().TableName)
FALSE Branch: Incremental Load Path
Click the pencil icon on the False box.
Lookup_MaxValue:
– Drag Lookup activity
– Name: Lookup_MaxValue
– Use query: Query
– Query: @concat('SELECT CAST(MAX(', item().DeltaColumnName, ') AS VARCHAR(100)) AS MaxValue FROM ', item().SchemaName, '.', item().TableName)
– First row only: checked
This gets the MAX value of the watermark column (e.g., MAX(ProductID) = 295).
Copy_DeltaData (green arrow from Lookup_MaxValue):
– Source: DS_SourceTable_Dynamic
– SchemaName: @item().SchemaName
– TableName: @item().TableName
– Use query: Query
– Query: @concat('SELECT * FROM ', item().SchemaName, '.', item().TableName, ' WHERE ', item().DeltaColumnName, ' > ', item().LastLoadedValue)
– Sink: DS_ADLS_Sink_Parquet_V2
– FolderName: same date-partitioned expression as full load
Update_Watermark (green arrow from Copy_DeltaData):
– Stored Procedure: usp_update_watermark_v2
– Parameters: TableName=@item().TableName, LastLoadedValue=@activity('Lookup_MaxValue').output.firstRow.MaxValue
Log_Success_Incremental (green arrow from Update_Watermark):
– Same logging SP with LoadType=INCREMENTAL, RowsRead/Copied/Duration from Copy_DeltaData
Log_Failure_Incremental (red arrow from Copy_DeltaData):
– Same pattern as Log_Failure_Full with LoadType=INCREMENTAL
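Both dynamic queries in this branch are plain string concatenation. The following Python sketch shows what the two @concat expressions evaluate to for the Product row on a first run (illustrative only; the real strings are built inside the Lookup and Copy activities):

```python
def lookup_max_query(row):
    # Mirrors the Lookup_MaxValue @concat expression.
    return (f"SELECT CAST(MAX({row['DeltaColumnName']}) AS VARCHAR(100)) "
            f"AS MaxValue FROM {row['SchemaName']}.{row['TableName']}")

def delta_query(row):
    # Mirrors the Copy_DeltaData @concat expression.
    return (f"SELECT * FROM {row['SchemaName']}.{row['TableName']} "
            f"WHERE {row['DeltaColumnName']} > {row['LastLoadedValue']}")

product = {"SchemaName": "SalesLT", "TableName": "Product",
           "DeltaColumnName": "ProductID", "LastLoadedValue": "0"}

print(lookup_max_query(product))
# SELECT CAST(MAX(ProductID) AS VARCHAR(100)) AS MaxValue FROM SalesLT.Product
print(delta_query(product))
# SELECT * FROM SalesLT.Product WHERE ProductID > 0
```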
The Complete Pipeline Diagram
Lookup_Config (read CONFIGTABLE_V2 — 5 rows)
|
v
ForEach_Table (iterate all 5 tables)
|
+-- Check_LoadType: @equals(item().LoadType, 'FULL')
|
+-- TRUE (Full Load: Customer, Address):
| |
| Copy_FullLoad (SELECT * FROM schema.table)
| | |
| +-- (Success) +-- (Failed)
| | Log_Success_Full | Log_Failure_Full
|
+-- FALSE (Incremental: Product, ProductCategory, CustomerAddress):
|
Lookup_MaxValue (MAX(DeltaColumn))
|
Copy_DeltaData (WHERE DeltaCol > LastLoadedValue)
| |
+-- (Success) +-- (Failed)
| Update_Watermark | Log_Failure_Incremental
| Log_Success_Incr
First Run Walkthrough
Before Run
SELECT TableName, LoadType, LastLoadedValue FROM CONFIGTABLE_V2;
| TableName | LoadType | LastLoadedValue |
|---|---|---|
| Customer | FULL | NULL |
| Address | FULL | NULL |
| Product | INCREMENTAL | 0 |
| ProductCategory | INCREMENTAL | 0 |
| CustomerAddress | INCREMENTAL | 0 |
Run the Pipeline (Debug)
Iteration 1: Customer (FULL)
– Check_LoadType: ‘FULL’ == ‘FULL’ → TRUE
– Copy_FullLoad: SELECT * FROM SalesLT.Customer → 847 rows
– Log: SUCCESS, FULL, 847 rows
Iteration 2: Address (FULL) – TRUE → Copy_FullLoad → 450 rows → Log SUCCESS
Iteration 3: Product (INCREMENTAL)
– Check_LoadType: ‘INCREMENTAL’ == ‘FULL’ → FALSE
– Lookup_MaxValue: SELECT MAX(ProductID) FROM SalesLT.Product → 680
– Copy_DeltaData: SELECT * FROM SalesLT.Product WHERE ProductID > 0 → 295 rows (first run copies all)
– Update_Watermark: LastLoadedValue = ‘680’
– Log: SUCCESS, INCREMENTAL, 295 rows
Iteration 4: ProductCategory (INCREMENTAL) – FALSE → Lookup_MaxValue → Copy_DeltaData → Update_Watermark → Log
Iteration 5: CustomerAddress (INCREMENTAL) – FALSE → Same flow → Log
After Run
-- Config table — watermarks updated
SELECT TableName, LoadType, LastLoadedValue FROM CONFIGTABLE_V2;
| TableName | LoadType | LastLoadedValue |
|---|---|---|
| Customer | FULL | NULL |
| Address | FULL | NULL |
| Product | INCREMENTAL | 680 |
| ProductCategory | INCREMENTAL | 41 |
| CustomerAddress | INCREMENTAL | 29485 |
-- Audit log — all 5 tables logged
SELECT TableName, LoadType, RowsCopied, Status FROM PIPELINE_LOG ORDER BY LoadDate;
| TableName | LoadType | RowsCopied | Status |
|---|---|---|---|
| Customer | FULL | 847 | SUCCESS |
| Address | FULL | 450 | SUCCESS |
| Product | INCREMENTAL | 295 | SUCCESS |
| ProductCategory | INCREMENTAL | 41 | SUCCESS |
| CustomerAddress | INCREMENTAL | 417 | SUCCESS |
Second Run (Incremental Proof)
Run the pipeline again WITHOUT changing any source data:
Full load tables (Customer, Address): Copy ALL rows again (847, 450). That is full load — always copies everything.
Incremental tables (Product, ProductCategory, CustomerAddress): Copy ZERO rows. The watermark has not changed. WHERE ProductID > 680 returns no rows. This proves incremental is working — only new data above the watermark gets copied.
Now add new data:
-- StandardCost is NOT NULL in SalesLT.Product, so it must be supplied
INSERT INTO SalesLT.Product (Name, ProductNumber, Color, StandardCost, ListPrice, ProductCategoryID, SellStartDate)
VALUES ('Test Laptop', 'TP-001', 'Black', 750.00, 999.99, 5, GETDATE());
Run again:
– Product: 1 row copied (only the new laptop). Watermark updated to 681.
– All other tables: same as before.
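The three runs above can be reproduced with a small in-memory simulation of the watermark pattern (a Python sketch, not pipeline code; the row counts are chosen to match the walkthrough):

```python
def run_incremental(source_rows, config):
    # One incremental iteration: copy rows above the watermark,
    # then advance the watermark to the new MAX (Update_Watermark).
    col = config["DeltaColumnName"]
    wm = int(config["LastLoadedValue"])
    copied = [r for r in source_rows if r[col] > wm]
    if copied:
        config["LastLoadedValue"] = str(max(r[col] for r in copied))
    return copied

# 295 products with IDs up to 680, matching the first-run walkthrough.
products = [{"ProductID": pid} for pid in range(386, 681)]
cfg = {"DeltaColumnName": "ProductID", "LastLoadedValue": "0"}

run1 = run_incremental(products, cfg)   # 295 rows, watermark -> '680'
run2 = run_incremental(products, cfg)   # 0 rows, watermark unchanged
products.append({"ProductID": 681})     # the 'Test Laptop' insert
run3 = run_incremental(products, cfg)   # 1 row, watermark -> '681'
print(len(run1), len(run2), len(run3))  # 295 0 1
```

Each row is copied exactly once across the three runs, which is the guarantee the watermark pattern exists to provide.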
Adding a New Table (Zero Pipeline Changes)
Your team asks: “Can you also load the SalesOrderHeader table?”
INSERT INTO CONFIGTABLE_V2 (TableName, SchemaName, FolderName, LoadType, LastLoadedValue, DeltaColumnName) VALUES
('SalesOrderHeader', 'SalesLT', 'sqldb/SalesOrderHeader', 'INCREMENTAL', '0', 'SalesOrderID');
Done. Run the pipeline. It now processes 6 tables. You did not touch the pipeline, datasets, or activities. One INSERT, zero code changes.
Switching a Table from Full to Incremental
The Address table grew to 10 million rows. Full load is too slow.
UPDATE CONFIGTABLE_V2
SET LoadType = 'INCREMENTAL',
LastLoadedValue = '0',
DeltaColumnName = 'AddressID'
WHERE TableName = 'Address';
Done. Next run, Address uses incremental logic. One UPDATE, zero pipeline changes.
Real-life analogy: Switching a table’s load type is like changing a delivery address on an order. The warehouse process does not change. The shipping label changes. The package goes to the new address.
Date-Partitioned Output Folders
Both branches use this expression for the sink FolderName:
@concat(
item().FolderName, '/',
formatDateTime(utcnow(), 'yyyy'), '/',
formatDateTime(utcnow(), 'MM'), '/',
formatDateTime(utcnow(), 'dd'), '/',
formatDateTime(utcnow(), 'HH'), '/',
formatDateTime(utcnow(), 'mm')
)
This produces: sqldb/Customer/2026/05/10/14/30/
Each run creates a new folder. Previous runs are preserved.
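The same expression can be sketched in Python to show exactly what string the sink receives (utcnow() in ADF returns UTC, so a real run uses the UTC clock; the fixed timestamp below is only for illustration):

```python
from datetime import datetime

def sink_folder(folder_name, now):
    # Mirrors the @concat/formatDateTime sink expression:
    # FolderName/yyyy/MM/dd/HH/mm
    return "/".join([folder_name, now.strftime("%Y"), now.strftime("%m"),
                     now.strftime("%d"), now.strftime("%H"), now.strftime("%M")])

print(sink_folder("sqldb/Customer", datetime(2026, 5, 10, 14, 30)))
# sqldb/Customer/2026/05/10/14/30
```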
Common Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| “ORDER BY invalid in subquery” | Using ORDER BY in Lookup_MaxValue query | Do not use ORDER BY. MAX() does not need it |
| “Column DeltaColumnName not found” | DeltaColumnName is NULL for full load table, but pipeline tries to use it | This only happens if the If Condition is misconfigured — NULL tables should go to TRUE branch |
| “Cannot CAST NULL to VARCHAR” | MAX() returns NULL when table is empty | Add ISNULL(MAX(...), '0') to the Lookup query |
| Copy succeeds but 0 rows | Watermark already at max value | Expected for incremental when no new data exists |
| “dataset() in Copy sink” | Using @dataset().FolderName in the Copy activity instead of @item() | Use @item().FolderName in the Copy activity. @dataset() belongs only inside the dataset Connection tab |
| “Activity not found” error in expressions | Typo in activity name (case-sensitive) | Match exact name: Copy_FullLoad not copy_fullload |
| Watermark not updating | Update_Watermark SP using wrong column | Verify SP uses WHERE TableName = @TableName |
Interview Questions
Q: Why use one pipeline for both full and incremental instead of two separate pipelines?
A: One pipeline means one schedule, one monitoring dashboard, one audit log, and one set of error handling. Adding tables requires only a config table INSERT. Switching load types requires only a config UPDATE. Two pipelines double the management overhead without any technical benefit.
Q: How does the If Condition route tables to the right load strategy?
A: The expression @equals(item().LoadType, 'FULL') evaluates each config row. TRUE routes to the full load branch (Copy entire table). FALSE routes to the incremental branch (Lookup max watermark, Copy only new rows above watermark, Update watermark). The config table drives the routing — not the pipeline logic.
Q: What happens when an incremental table has no new data?
A: The Copy_DeltaData activity runs with a WHERE clause that returns zero rows (watermark unchanged). Zero rows are copied, zero bytes transferred, and the watermark stays the same. The audit log records SUCCESS with RowsCopied=0. This is expected behavior — not an error.
Q: How do you switch a table from full load to incremental?
A: One UPDATE statement on the config table: set LoadType to INCREMENTAL, set LastLoadedValue to the starting watermark value (e.g., ‘0’), and set DeltaColumnName to the watermark column name. No pipeline changes needed. Next run, the If Condition routes it to the incremental branch.
Q: What is the watermark pattern and how does it work?
A: A watermark is the last value copied from a monotonically increasing column (ID or timestamp). The pipeline stores this in the config table. On the next run, it queries WHERE column > watermark to get only new rows. After copying, it updates the watermark to the new MAX value. This ensures each row is copied exactly once.
Wrapping Up
This unified pipeline is the production pattern you will use in every data platform. One pipeline, one config table, two load strategies, complete audit logging. Adding a table is an INSERT. Switching strategies is an UPDATE. The pipeline is finished — you never touch it again.
The beauty of this design is separation of concerns: the PIPELINE handles orchestration and error handling. The CONFIG TABLE handles business rules (which tables, which strategy). Change the business rules without changing the pipeline. Change the pipeline without changing the business rules.
That is metadata-driven architecture. That is production-grade data engineering.
Related posts: – SCD Type 1 Full Load Pipeline – Incremental Data Loading with Watermark – Metadata-Driven Pipeline in ADF – ADF Expressions Guide – Parameterized Datasets – Audit Logging Concepts
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.