Building an SCD Type 1 Pipeline in Azure: Full Load with Metadata, Audit Logging, and Every Activity Explained
In our SCD Types post, we covered the theory — Type 1 overwrites old values with new ones, no history preserved. But theory without practice is like reading a cookbook without ever stepping into the kitchen.
This post is the kitchen. We are going to build a complete SCD Type 1 pipeline from scratch in Azure Synapse (works identically in ADF), and I will explain every single activity, every expression, every dataset parameter, and WHY each piece exists.
By the end, you will have a production-grade pipeline that reads a configuration table, loops through multiple source tables, copies each one to ADLS Gen2 as Parquet, logs success and failure into an audit table, and creates date-partitioned output folders. This is the exact pattern used in real enterprise data platforms.
Table of Contents
- What Makes This SCD Type 1
- The Complete Architecture
- The Analogy: A Restaurant Kitchen
- Prerequisites: What You Need Before Building
- The Metadata Table: The Recipe Book
- The Audit Table: The Kitchen Log
- The Stored Procedure: The Log Writer
- Activity 1: Lookup (The Waiter Taking the Order)
- Activity 2: ForEach (The Kitchen Line)
- Activity 3: Copy Data (The Chef Cooking)
- Activity 4: Log Success (The Quality Check — Passed)
- Activity 5: Log Failure (The Quality Check — Failed)
- The Three Datasets Explained
- Dataset 1: DS_SqlDB_Metadata (The Menu)
- Dataset 2: DS_SqlDB_SourceTable (The Ingredient Shelf)
- Dataset 3: DS_ADLS_Sink_Parquet (The Serving Plate)
- The Linked Services: The Supply Chains
- Expression Deep Dive
- The Full Pipeline JSON
- How Data Flows Through the Pipeline
- Adding Date-Partitioned Output Folders
- Running and Verifying the Pipeline
- What Happens on the Second Run (The SCD Type 1 Behavior)
- Evolving to SCD Type 2
- Common Errors and Fixes
- Interview Questions
- Wrapping Up
What Makes This SCD Type 1
SCD Type 1 means overwrite. Every time the pipeline runs, it replaces the destination data with the latest source data. No history. No versioning. The destination always reflects the current state of the source.
Run 1 (Monday): Source has 100 customers → Destination gets 100 customers
Run 2 (Tuesday): Source has 102 customers (2 new + Alice moved to Mumbai)
→ Destination gets 102 customers (Alice shows Mumbai, no Toronto history)
Run 3 (Wednesday): Source has 101 customers (1 deleted)
→ Destination gets 101 customers (deleted customer is gone)
Real-life analogy: SCD Type 1 is like updating your phone’s GPS address. You type your new address, and the old one disappears completely. Your GPS always shows where you live NOW, but it has no memory of where you used to live.
When this is appropriate:
- Bronze layer ingestion (capture the latest source snapshot)
- Reference data that is corrected, not versioned (country codes, currency rates)
- Staging tables that are truncated and reloaded daily
- Any scenario where “what is the data NOW?” is the only question that matters
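In plain T-SQL terms, Type 1 is just an in-place overwrite. The sketch below is illustrative only (dim.Customer and staging.Customer are hypothetical tables; the pipeline in this post achieves the same effect by rewriting Parquet files in ADLS rather than running UPDATE statements):

-- SCD Type 1: overwrite attributes in place, keep no history (illustrative sketch)
UPDATE d
SET    d.City = s.City
FROM   dim.Customer AS d
JOIN   staging.Customer AS s
  ON   d.CustomerID = s.CustomerID;   -- Alice's Toronto is overwritten by Mumbai; no trace remains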
The Complete Architecture
Azure SQL Database ADLS Gen2
┌─────────────────────┐ ┌─────────────────────────┐
│ │ │ synapse-workspace/ │
│ metadata table │─── Lookup ──>│ sqldb/ │
│ (config: 5 tables) │ │ Customer/ │
│ │ │ 2026/04/16/14/30/ │
│ SalesLT.Customer │─── Copy ───>│ part-0000.parquet│
│ SalesLT.Address │─── Copy ───>│ Address/ │
│ SalesLT.Product │─── Copy ───>│ Product/ │
│ ... │ │ ... │
│ │ │ │
│ audit table │<── Log ─────│ │
│ (success/failure) │ │ │
│ │ │ │
│ sp_insert_audit_log │ │ │
│ (stored procedure) │ │ │
└─────────────────────┘ └─────────────────────────┘
Pipeline: PL_Copy_SqlToADLS_Parquet_WithAudit
┌──────────────────────────────────────────────────────────────┐
│ │
│ Lookup_Metadata ──> ForEach_Table │
│ │ │
│ ├── Copy_TableData │
│ │ │ │
│ │ ├── (Success) ──> Log_Success │
│ │ │ │
│ │ └── (Failure) ──> Log_Failure │
│ │
└──────────────────────────────────────────────────────────────┘
The Analogy: A Restaurant Kitchen
I will use a restaurant kitchen analogy throughout this post because the pipeline workflow maps perfectly to how a kitchen operates:
| Pipeline Component | Kitchen Equivalent | What It Does |
|---|---|---|
| Metadata table | The recipe book | Lists all dishes (tables) to prepare |
| Audit table | The kitchen log | Records what was cooked, how long, any problems |
| Stored procedure | The log writer (sous chef with clipboard) | Writes entries into the kitchen log |
| Lookup activity | The waiter taking the order | Reads the recipe book and brings the list to the kitchen |
| ForEach activity | The kitchen line | Processes each dish in the order |
| Copy activity | The chef cooking | Does the actual work of preparing each dish |
| Log_Success | Quality check — passed | “Dish looks great, log it as served” |
| Log_Failure | Quality check — failed | “Dish burned, log the problem for review” |
| Linked service | The supplier contracts | Agreements with ingredient suppliers and dish delivery service |
| Dataset | The ingredient location and serving plate | Where to find ingredients and how to plate the dish |
Prerequisites: What You Need Before Building
Before we start building, ensure you have:
- Azure SQL Database with AdventureWorksLT sample data (or any database)
- ADLS Gen2 storage account with a container (e.g., synapse-workspace or database)
- Synapse Workspace (or ADF) with linked services for both SQL and ADLS
- Permissions: Managed Identity has Storage Blob Data Contributor on ADLS
The Metadata Table: The Recipe Book
This table drives the entire pipeline. Adding a new table to copy means inserting ONE row here — no pipeline changes needed.
CREATE TABLE metadata (
Id INT IDENTITY(100,1),
TableName VARCHAR(100),
SchemaName VARCHAR(50),
ContainerName VARCHAR(100),
FolderName VARCHAR(200)
);
INSERT INTO metadata (TableName, SchemaName, ContainerName, FolderName) VALUES
('Customer', 'SalesLT', 'synapse-workspace', 'sqldb/Customer'),
('Address', 'SalesLT', 'synapse-workspace', 'sqldb/Address'),
('CustomerAddress', 'SalesLT', 'synapse-workspace', 'sqldb/CustomerAddress'),
('Product', 'SalesLT', 'synapse-workspace', 'sqldb/Product'),
('ProductCategory', 'SalesLT', 'synapse-workspace', 'sqldb/ProductCategory');
Why metadata-driven? Imagine you have 50 tables to copy. Without metadata, you build 50 separate Copy activities (or worse, 50 pipelines). With metadata, you build ONE pipeline and control everything from a SQL table. Adding table #51 is an INSERT statement, not a pipeline redesign.
Real-life analogy: The metadata table is like a restaurant’s daily specials board. The kitchen (pipeline) does not change. The board (metadata) changes daily. Today we serve 5 dishes, tomorrow we add a 6th by writing it on the board. The kitchen workflow stays the same.
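To make that concrete, onboarding a sixth table is a single statement against the metadata table (the SalesLT.SalesOrderHeader row below is only an example):

-- Add table #6: no pipeline edit, no redeployment
INSERT INTO metadata (TableName, SchemaName, ContainerName, FolderName)
VALUES ('SalesOrderHeader', 'SalesLT', 'synapse-workspace', 'sqldb/SalesOrderHeader');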
The Audit Table: The Kitchen Log
Every table copy operation — success or failure — gets logged here.
CREATE TABLE audit (
Id INT IDENTITY(1,1),
rows_read INT,
rows_copied INT,
copy_duration INT,
table_name VARCHAR(100),
schema_name VARCHAR(100),
error_message VARCHAR(500),
load_date DATETIME
);
Why audit logging? In production, pipelines run at 2 AM with no one watching. When the business asks “did the Customer table load successfully last night?” you need an answer. The audit table IS the answer.
After each run, you can query:
-- What happened last night?
SELECT table_name, rows_read, rows_copied, copy_duration, error_message, load_date
FROM audit
WHERE CAST(load_date AS DATE) = CAST(GETDATE() AS DATE)
ORDER BY load_date DESC;
Real-life analogy: The audit table is the kitchen log. After every dish, the sous chef notes: “Chicken was prepared at 7:15 PM, took 12 minutes, served to Table 4, no issues.” If a customer complains the next day, the log tells you exactly what happened.
The Stored Procedure: The Log Writer
CREATE PROCEDURE sp_insert_audit_log
@rows_read INT,
@rows_copied INT,
@copy_duration INT,
@table_name VARCHAR(100),
@schema_name VARCHAR(100),
@error_message VARCHAR(500)
AS
BEGIN
INSERT INTO audit (rows_read, rows_copied, copy_duration, table_name, schema_name, error_message, load_date)
VALUES (@rows_read, @rows_copied, @copy_duration, @table_name, @schema_name, @error_message, GETDATE());
END;
Why a stored procedure instead of direct INSERT? Three reasons:
- Encapsulation — the pipeline calls sp_insert_audit_log without knowing the table structure. If you add a column to the audit table later, you update the SP, not every pipeline.
- Reusability — multiple pipelines can call the same SP.
- Security — the pipeline needs EXECUTE permission on the SP, not INSERT permission on the table. Principle of least privilege.
Real-life analogy: The stored procedure is the sous chef with a clipboard. The chef (Copy activity) yells “5 rows cooked, 5 served, took 3 seconds, no problems!” The sous chef writes it down in the proper format. The chef does not touch the log book directly.
Activity 1: Lookup (The Waiter Taking the Order)
What It Does
Reads the metadata table and returns an array of rows. Each row describes one table to copy.
How to Configure
- Drag Lookup activity onto the canvas
- Name: Lookup_Metadata
- Settings tab:
  - Source dataset: DS_SqlDB_Metadata
  - Use query: Query
  - Query: SELECT TableName, SchemaName, ContainerName, FolderName FROM metadata
  - First row only: UNCHECKED (critical! If checked, only 1 table is processed)
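For reference, the resulting activity definition looks roughly like this in the pipeline JSON (a sketch of what the authoring UI generates; surrounding properties omitted):

{
  "name": "Lookup_Metadata",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT TableName, SchemaName, ContainerName, FolderName FROM metadata"
    },
    "dataset": { "referenceName": "DS_SqlDB_Metadata", "type": "DatasetReference" },
    "firstRowOnly": false
  }
}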
What It Outputs
{
"count": 5,
"value": [
{"TableName": "Customer", "SchemaName": "SalesLT", "ContainerName": "synapse-workspace", "FolderName": "sqldb/Customer"},
{"TableName": "Address", "SchemaName": "SalesLT", "ContainerName": "synapse-workspace", "FolderName": "sqldb/Address"},
{"TableName": "CustomerAddress", "SchemaName": "SalesLT", "ContainerName": "synapse-workspace", "FolderName": "sqldb/CustomerAddress"},
{"TableName": "Product", "SchemaName": "SalesLT", "ContainerName": "synapse-workspace", "FolderName": "sqldb/Product"},
{"TableName": "ProductCategory", "SchemaName": "SalesLT", "ContainerName": "synapse-workspace", "FolderName": "sqldb/ProductCategory"}
]
}
Why This Activity Exists
Without Lookup, you would hardcode every table name in the pipeline. Adding a new table means editing the pipeline. With Lookup, the pipeline reads its instructions dynamically from a database table. The pipeline NEVER changes — only the metadata changes.
Real-life analogy: The waiter walks to the specials board (metadata table), reads today’s five dishes, and brings the list back to the kitchen. The kitchen does not decide what to cook — the board does.
Common Mistake: First Row Only
If First row only is checked, the Lookup returns a single object (not an array), and ForEach fails because it expects an array. This is the #1 mistake beginners make.
First row only CHECKED: {"TableName": "Customer", ...} ← Single object, ForEach fails
First row only UNCHECKED: {"count": 5, "value": [{...}, ...]} ← Array, ForEach works
Activity 2: ForEach (The Kitchen Line)
What It Does
Iterates over the array from Lookup and executes the inner activities (Copy + Log) for EACH item in the array.
How to Configure
- Drag ForEach activity onto the canvas
- Name: ForEach_Table
- Draw a green arrow from Lookup_Metadata to ForEach_Table
- Settings tab:
  - Sequential: UNCHECKED (runs tables in parallel — faster)
  - Batch count: 5 (max 5 tables processed simultaneously)
  - Items: @activity('Lookup_Metadata').output.value
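A rough sketch of the ForEach definition in pipeline JSON (inner activities omitted for brevity):

{
  "name": "ForEach_Table",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "Lookup_Metadata", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "isSequential": false,
    "batchCount": 5,
    "items": {
      "value": "@activity('Lookup_Metadata').output.value",
      "type": "Expression"
    },
    "activities": [ ]
  }
}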
What Happens During Execution
Iteration 1: item() = {"TableName": "Customer", "SchemaName": "SalesLT", ...}
Iteration 2: item() = {"TableName": "Address", "SchemaName": "SalesLT", ...}
Iteration 3: item() = {"TableName": "CustomerAddress", ...}
Iteration 4: item() = {"TableName": "Product", ...}
Iteration 5: item() = {"TableName": "ProductCategory", ...}
Inside ForEach, @item() represents the current row being processed. So @item().TableName is “Customer” in iteration 1, “Address” in iteration 2, etc.
Why This Activity Exists
ForEach is the loop that processes multiple items without duplicating activities. Without it, you would need 5 separate Copy activities, 5 Log_Success activities, and 5 Log_Failure activities. With ForEach, you write the logic ONCE and it executes for each item.
Real-life analogy: The kitchen line. Five order tickets hang on the rail. The kitchen processes each ticket one at a time (sequential) or multiple tickets simultaneously (parallel with batch count). The cooking process is the same for each ticket — only the dish name changes.
Sequential vs Parallel
| Setting | Behavior | When to Use |
|---|---|---|
| Sequential = checked | Tables copied one at a time in order | When tables have dependencies or the source cannot handle concurrent reads |
| Sequential = unchecked, Batch = 5 | Up to 5 tables copied simultaneously | Default for independent tables — much faster |
The Items Expression Explained
@activity('Lookup_Metadata').output.value
Breaking it down:
| Part | Meaning |
|---|---|
| @activity('Lookup_Metadata') | Reference the Lookup activity by its exact name |
| .output | Access the activity’s output |
| .value | The array of rows returned by the Lookup |
If you mistype the activity name (e.g., 'Lookup_metadata' with lowercase m), the pipeline fails with “Activity not found.”
Activity 3: Copy Data (The Chef Cooking)
What It Does
Copies data from Azure SQL Database to ADLS Gen2 in Parquet format. This is the core work of the pipeline.
How to Configure
- Click the pencil icon on ForEach to open the inner canvas
- Drag Copy data activity
- Name: Copy_TableData
Source tab:
– Source dataset: DS_SqlDB_SourceTable
– SchemaName: @item().SchemaName
– TableName: @item().TableName
Sink tab:
– Sink dataset: DS_ADLS_Sink_Parquet
– ContainerName: @item().ContainerName
– FolderName: @item().FolderName
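Here is a trimmed sketch of how the Copy activity passes the current ForEach item into the dataset parameters (illustrative; the generated JSON carries more properties):

{
  "name": "Copy_TableData",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "DS_SqlDB_SourceTable",
      "type": "DatasetReference",
      "parameters": {
        "SchemaName": "@item().SchemaName",
        "TableName": "@item().TableName"
      }
    }
  ],
  "outputs": [
    {
      "referenceName": "DS_ADLS_Sink_Parquet",
      "type": "DatasetReference",
      "parameters": {
        "ContainerName": "@item().ContainerName",
        "FolderName": "@item().FolderName"
      }
    }
  ],
  "typeProperties": {
    "source": { "type": "AzureSqlSource" },
    "sink": { "type": "ParquetSink" }
  }
}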
What Happens During Execution (For Customer Table)
1. Copy reads @item().SchemaName = "SalesLT", @item().TableName = "Customer"
2. These values flow to DS_SqlDB_SourceTable via @dataset().SchemaName and @dataset().TableName
3. The dataset resolves to: SELECT * FROM SalesLT.Customer
4. Copy reads all 847 rows from Azure SQL Database
5. Copy writes to ADLS Gen2 at: synapse-workspace/sqldb/Customer/part-00000.snappy.parquet
6. Copy activity records: rowsRead=847, rowsCopied=847, copyDuration=3 (seconds)
The Parameter Flow (The Most Important Concept)
Metadata table row: {"TableName": "Customer", "SchemaName": "SalesLT", ...}
|
v
ForEach @item(): @item().TableName = "Customer", @item().SchemaName = "SalesLT"
|
v
Copy activity: Passes to dataset: SchemaName="SalesLT", TableName="Customer"
|
v
Dataset: @dataset().SchemaName = "SalesLT", @dataset().TableName = "Customer"
|
v
Connection: Schema: SalesLT, Table: Customer
|
v
SQL Query: SELECT * FROM [SalesLT].[Customer]
This chain — metadata → @item() → dataset parameters → @dataset() → actual query — is the core of every metadata-driven pipeline. Understanding this flow means understanding ADF/Synapse pipeline architecture.
Real-life analogy: The order ticket says “Grilled Salmon.” The chef reads the ticket (@item()), walks to the ingredient shelf (@dataset()), grabs the salmon from the section labeled “Fish” (SchemaName), shelf “Salmon” (TableName), cooks it, and plates it on the designated serving plate (the sink with FolderName).
Copy Activity Output (Used by Logging)
After successful execution, the Copy activity exposes:
@activity('Copy_TableData').output.rowsRead → 847
@activity('Copy_TableData').output.rowsCopied → 847
@activity('Copy_TableData').output.copyDuration → 3
@activity('Copy_TableData').output.throughput → 256.5 (KB/s)
After failure:
@activity('Copy_TableData').error.message → "Table 'SalesLT.Cstomer' not found"
These values feed directly into the audit logging activities.
Activity 4: Log Success (The Quality Check — Passed)
What It Does
When Copy_TableData succeeds, this activity calls the stored procedure to insert a success record into the audit table.
How to Configure
- Drag Stored Procedure activity onto the inner canvas
- Name: Log_Success
- Draw a green arrow (Success) from Copy_TableData to Log_Success
- Settings tab:
  - Linked service: LS_AzureSqlDB
  - Stored procedure name: sp_insert_audit_log
  - Parameters:
| Name | Type | Value |
|---|---|---|
| rows_read | Int32 | @activity('Copy_TableData').output.rowsRead |
| rows_copied | Int32 | @activity('Copy_TableData').output.rowsCopied |
| copy_duration | Int32 | @activity('Copy_TableData').output.copyDuration |
| table_name | String | @item().TableName |
| schema_name | String | @item().SchemaName |
| error_message | String | NA |
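In the pipeline JSON, those parameter mappings end up looking roughly like the sketch below (trimmed to three parameters; the rest follow the same pattern):

{
  "name": "Log_Success",
  "type": "SqlServerStoredProcedure",
  "dependsOn": [
    { "activity": "Copy_TableData", "dependencyConditions": [ "Succeeded" ] }
  ],
  "linkedServiceName": { "referenceName": "LS_AzureSqlDB", "type": "LinkedServiceReference" },
  "typeProperties": {
    "storedProcedureName": "sp_insert_audit_log",
    "storedProcedureParameters": {
      "rows_read": { "value": "@activity('Copy_TableData').output.rowsRead", "type": "Int32" },
      "table_name": { "value": "@item().TableName", "type": "String" },
      "error_message": { "value": "NA", "type": "String" }
    }
  }
}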
Why the Green Arrow Matters
The green arrow means this activity ONLY runs when the Copy activity succeeds. If Copy fails, Log_Success is skipped entirely, and Log_Failure runs instead.
Copy_TableData
|
├── Green arrow (Succeeded) ──> Log_Success ✓ Runs only on success
|
└── Red arrow (Failed) ──> Log_Failure ✓ Runs only on failure
What Gets Logged
-- Audit table after successful copy of Customer
INSERT INTO audit VALUES (847, 847, 3, 'Customer', 'SalesLT', 'NA', '2026-04-16 14:30:05');
Activity 5: Log Failure (The Quality Check — Failed)
What It Does
When Copy_TableData fails, this activity logs the error details.
How to Configure
- Drag another Stored Procedure activity
- Name: Log_Failure
- Draw a red arrow (Failure) from Copy_TableData to Log_Failure
- Settings tab:
  - Same linked service and stored procedure
  - Parameters:
| Name | Type | Value |
|---|---|---|
| rows_read | Int32 | 0 |
| rows_copied | Int32 | 0 |
| copy_duration | Int32 | 0 |
| table_name | String | @item().TableName |
| schema_name | String | @item().SchemaName |
| error_message | String | @activity('Copy_TableData').error.message |
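A trimmed sketch of the equivalent JSON; the key differences from Log_Success are the Failed dependency condition and the error_message value:

{
  "name": "Log_Failure",
  "type": "SqlServerStoredProcedure",
  "dependsOn": [
    { "activity": "Copy_TableData", "dependencyConditions": [ "Failed" ] }
  ],
  "linkedServiceName": { "referenceName": "LS_AzureSqlDB", "type": "LinkedServiceReference" },
  "typeProperties": {
    "storedProcedureName": "sp_insert_audit_log",
    "storedProcedureParameters": {
      "rows_read": { "value": "0", "type": "Int32" },
      "table_name": { "value": "@item().TableName", "type": "String" },
      "error_message": { "value": "@activity('Copy_TableData').error.message", "type": "String" }
    }
  }
}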
Why Both Success and Failure Handlers
Without failure logging, a failed table copy is invisible. The pipeline might show “Succeeded” overall (because ForEach continues with other tables), but one table silently failed. The audit table catches this:
-- Morning health check
SELECT table_name, error_message, load_date
FROM audit
WHERE error_message != 'NA'
AND CAST(load_date AS DATE) = CAST(GETDATE() AS DATE);
-- Result: ProductCategory | Invalid object name 'SalesLT.ProductCategori' | 2026-04-16 14:30:08
Without audit logging, you would not know ProductCategory failed until someone notices the dashboard showing old data — which could be hours or days later.
Real-life analogy: In a restaurant, if a dish is burned, the sous chef does not just throw it away silently. They LOG it: “Dish: Salmon, Problem: overcooked, Time: 7:30 PM.” The chef can review the log and see patterns: “Salmon burns every Friday night — we need more staff on Fridays.”
The Three Datasets Explained
Dataset 1: DS_SqlDB_Metadata (The Menu)
Purpose: Points to the metadata table. Used by the Lookup activity.
Configuration:
– Type: Azure SQL Table
– Linked service: LS_AzureSqlDB
– Table: dbo.metadata
– Parameters: None (fixed table)
Why it is non-parameterized: The metadata table never changes location. It is always dbo.metadata in the same database. No parameters needed.
Dataset 2: DS_SqlDB_SourceTable (The Ingredient Shelf)
Purpose: Points to any source table dynamically. Used by the Copy activity source.
Configuration:
– Type: Azure SQL Table
– Linked service: LS_AzureSqlDB
– Table: left blank (parameterized)
– Parameters:
– SchemaName (String)
– TableName (String)
– Connection tab:
– Schema: @dataset().SchemaName
– Table: @dataset().TableName
Why it is parameterized: This single dataset represents ALL five source tables. During execution, the Copy activity passes different values:
Iteration 1: SchemaName="SalesLT", TableName="Customer" → reads SalesLT.Customer
Iteration 2: SchemaName="SalesLT", TableName="Address" → reads SalesLT.Address
Iteration 3: SchemaName="SalesLT", TableName="Product" → reads SalesLT.Product
Without parameterization, you would need five separate datasets — one per table. With parameterization, one dataset handles them all.
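A sketch of the parameterized dataset definition (the Expression wrappers are how the authoring UI stores dynamic content; exact property names may vary slightly by version):

{
  "name": "DS_SqlDB_SourceTable",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": { "referenceName": "LS_AzureSqlDB", "type": "LinkedServiceReference" },
    "parameters": {
      "SchemaName": { "type": "string" },
      "TableName": { "type": "string" }
    },
    "typeProperties": {
      "schema": { "value": "@dataset().SchemaName", "type": "Expression" },
      "table": { "value": "@dataset().TableName", "type": "Expression" }
    }
  }
}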
Dataset 3: DS_ADLS_Sink_Parquet (The Serving Plate)
Purpose: Points to the destination in ADLS Gen2. Used by the Copy activity sink.
Configuration:
– Type: Parquet
– Linked service: naveen-synapse-ws-WorkspaceDefaultStorage (or your ADLS linked service)
– Parameters:
– ContainerName (String)
– FolderName (String)
– Connection tab:
– File system: @dataset().ContainerName
– Directory: @dataset().FolderName
– Compression: Snappy
Why Parquet with Snappy? Parquet is columnar (fast analytical queries), and Snappy compression offers a strong balance of speed and file size. A 10 MB CSV typically shrinks to roughly 2 MB as Snappy-compressed Parquet.
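And the sink dataset, again as a rough sketch (AzureBlobFSLocation is the location type used for ADLS Gen2):

{
  "name": "DS_ADLS_Sink_Parquet",
  "properties": {
    "type": "Parquet",
    "linkedServiceName": {
      "referenceName": "naveen-synapse-ws-WorkspaceDefaultStorage",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "ContainerName": { "type": "string" },
      "FolderName": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": { "value": "@dataset().ContainerName", "type": "Expression" },
        "folderPath": { "value": "@dataset().FolderName", "type": "Expression" }
      },
      "compressionCodec": "snappy"
    }
  }
}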
The Linked Services: The Supply Chains
LS_AzureSqlDB (SQL Database Connection)
Type: Azure SQL Database
Server: sql-dataplatform-dev.database.windows.net
Database: AdventureWorksLT
Authentication: SQL Authentication (or Managed Identity)
Connect via: AutoResolveIntegrationRuntime
Why it exists: Tells ADF/Synapse HOW to connect to the database. Credentials are stored here, not in datasets or pipelines.
Default Storage (ADLS Gen2 Connection)
Type: Azure Data Lake Storage Gen2
URL: https://naveensynapsedl.dfs.core.windows.net
Authentication: Managed Identity (automatic, no credentials)
Why Managed Identity: No passwords to manage, rotate, or accidentally expose. The Synapse workspace authenticates using its own Azure identity.
Adding Date-Partitioned Output Folders
In production, you want each run’s output in a separate dated folder to avoid overwriting:
Change the FolderName in the Copy sink to:
@concat(item().FolderName, '/',
formatDateTime(convertTimeZone(utcnow(), 'UTC', 'Eastern Standard Time'), 'yyyy/MM/dd/HH/mm'))
This produces: sqldb/Customer/2026/04/16/14/30/
Each run creates a NEW folder. Previous runs are preserved. This is technically not pure SCD Type 1 (which overwrites), but it is the common production pattern because it allows:
- Rollback to the previous day’s data if needed
- An audit trail of what was loaded when
- Delta Lake or a downstream pipeline to read the latest partition
Running and Verifying the Pipeline
Debug Run
- Click Validate — fix any errors
- Click Debug — runs the pipeline immediately
- Watch the Output panel at the bottom
Expected Output
Lookup_Metadata → Succeeded (5 rows)
ForEach_Table → Succeeded (5 iterations)
Copy (Customer) → Succeeded (847 rows read, 847 copied, 3 sec)
Log_Success → Succeeded
Copy (Address) → Succeeded (450 rows read, 450 copied, 2 sec)
Log_Success → Succeeded
... (3 more tables)
Verify in ADLS Gen2
Navigate to your container. You should see:
synapse-workspace/
sqldb/
Customer/
part-00000.snappy.parquet
Address/
part-00000.snappy.parquet
CustomerAddress/
part-00000.snappy.parquet
Product/
part-00000.snappy.parquet
ProductCategory/
part-00000.snappy.parquet
Verify Audit Table
SELECT table_name, rows_read, rows_copied, copy_duration, error_message, load_date
FROM audit ORDER BY load_date DESC;
Expected: 5 rows, all with error_message = 'NA'.
What Happens on the Second Run (The SCD Type 1 Behavior)
This is the key SCD Type 1 characteristic:
Without date partitions: The second run OVERWRITES the same Parquet files. Old data is replaced. If Alice’s city changed from Toronto to Mumbai in the source, the new Parquet file shows Mumbai. Toronto is gone forever. That is Type 1.
With date partitions: The second run creates NEW files in a new folder (different timestamp). Both versions coexist in storage. A downstream process reads the latest partition. Old partitions can be archived or deleted by a cleanup pipeline.
Evolving to SCD Type 2
When business requirements change from “we only need current data” to “we need historical tracking,” you evolve this pipeline:
- Add a LoadType column to the metadata table (FULL or INCREMENTAL)
- Add DeltaColumnName and LastLoadedValue for incremental tables
- Add an If Condition inside ForEach to branch between full and incremental logic
- For SCD Type 2 target tables, use MERGE (in Delta Lake or stored procedures) instead of overwrite (see the sketch below)
We covered this evolution in detail in our incremental loading post and SCD types post.
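As a taste of that last step, here is a minimal Type 2 sketch in T-SQL. The dim.Customer and staging.Customer tables and the IsCurrent / ValidFrom / ValidTo columns are hypothetical, and the logic is written as expire-then-insert rather than a single MERGE statement, but it is the same upsert a MERGE (or Delta Lake MERGE INTO) would implement:

-- Step 1: expire the current row for customers whose tracked attributes changed
UPDATE d
SET    d.IsCurrent = 0,
       d.ValidTo   = GETDATE()
FROM   dim.Customer AS d
JOIN   staging.Customer AS s
  ON   d.CustomerID = s.CustomerID
WHERE  d.IsCurrent = 1
  AND  d.City <> s.City;

-- Step 2: insert a new current version for changed and brand-new customers
INSERT INTO dim.Customer (CustomerID, City, IsCurrent, ValidFrom, ValidTo)
SELECT s.CustomerID, s.City, 1, GETDATE(), NULL
FROM   staging.Customer AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dim.Customer AS d
                   WHERE d.CustomerID = s.CustomerID
                     AND d.IsCurrent  = 1);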
Common Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| “First row only” gives single object | ForEach expects array, gets object | Uncheck “First row only” in Lookup |
| “Activity ‘Lookup_Metadata’ not found” | Typo in activity name in ForEach Items expression | Match exact name including case |
| “Invalid object name 'SalesLT.Custmer'” | Typo in metadata table’s TableName | Fix the metadata row: UPDATE metadata SET TableName='Customer' WHERE TableName='Custmer' |
| “Forbidden” writing to ADLS | Managed Identity lacks permission | Add Storage Blob Data Contributor role to Synapse MI |
| Copy succeeds but 0 rows | Source table is empty or query returns nothing | Check source data: SELECT COUNT(*) FROM SalesLT.Customer |
| Log_Failure not running | Copy succeeded (green path) | Log_Failure only runs on red arrow (failure). This is correct. |
| Both Log_Success AND Log_Failure run | Impossible by design | Check arrow colors: green to Success, red to Failure |
Interview Questions
Q: What is a metadata-driven pipeline? A: A pipeline that reads its configuration (which tables to copy, where to write) from a database table instead of hardcoding. Adding or removing tables requires updating the metadata table, not the pipeline. It uses Lookup to read config, ForEach to iterate, and Copy to move data dynamically.
Q: Why use a stored procedure for audit logging instead of a direct INSERT? A: Encapsulation (SP hides table structure), reusability (multiple pipelines call the same SP), security (pipeline needs EXECUTE permission only), and maintainability (add columns to audit table by updating SP, not every pipeline).
Q: What is the difference between @item() and @dataset() in ADF expressions? A: @item() references the current ForEach iteration element and is used in pipeline activities. @dataset() references a dataset’s own parameters and is used inside the dataset’s Connection tab. The pipeline passes values via @item(), and the dataset receives them via @dataset().
Q: How do you handle both success and failure logging in a ForEach? A: Add two Stored Procedure activities after the Copy activity. Connect one with a green arrow (Succeeded) for success logging and another with a red arrow (Failed) for failure logging. Only one path executes per Copy run.
Q: What makes this pipeline SCD Type 1? A: Each run overwrites the destination with the latest source data. No history is preserved. The destination always reflects the current state. This is appropriate for staging tables, reference data corrections, and scenarios where only the current state matters.
Wrapping Up
This pipeline — Lookup, ForEach, Copy, Log Success, Log Failure — is the workhorse of Azure data engineering. It is the first pipeline every data engineer builds and the pattern that scales to hundreds of tables.
The beauty of metadata-driven design is that the pipeline is FINISHED. You never touch it again. When the business says “add 10 more tables,” you insert 10 rows into the metadata table. When they say “stop loading ProductCategory,” you delete one row. The pipeline adapts without code changes.
That is the power of SCD Type 1 with metadata-driven orchestration. Simple, reliable, and production-ready.
Related posts: – Metadata-Driven Pipeline in ADF – Synapse Pipeline with Audit Logging – ADF Expressions Guide – Slowly Changing Dimensions – Incremental Data Loading – Parameterized Datasets
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.