Building a Metadata-Driven Pipeline in Azure Synapse Analytics with Audit Logging
In my previous post, we built a metadata-driven pipeline in Azure Data Factory that copies multiple tables from Azure SQL Database to ADLS Gen2 using the Lookup → ForEach → Copy pattern.
That pipeline works great, but it has a blind spot: you don’t know what happened after it runs. Did every table copy successfully? How many rows were copied? How long did each table take? If something failed, which table failed and why?
In this post, we’ll take that same pattern and level it up in Azure Synapse Analytics with three key improvements:
- Parquet output instead of CSV — smaller files, faster queries, embedded schema
- Stored procedure audit logging — every copy operation logs its metrics (rows read, rows copied, duration) to an audit table
- Success and failure tracking — the pipeline logs both successful copies and failures separately, so you always know exactly what happened
This is the pattern used in production data platforms, and it’s exactly the kind of thing interviewers ask about when they say “how do you handle error logging in your pipelines?”
Table of Contents
- Why Synapse Instead of ADF?
- Architecture Overview
- Azure Resources and Prerequisites
- Step 1: Set Up the Database (Metadata + Audit + Stored Procedure)
- Step 2: Configure Synapse Workspace Permissions
- Step 3: Create Linked Services
- Step 4: Create Datasets (Including Parquet Sink)
- Step 5: Build the Pipeline — Lookup and ForEach
- Step 6: Add the Copy Activity
- Step 7: Add Audit Logging — Success Path
- Step 8: Add Audit Logging — Failure Path
- Step 9: The Complete Pipeline Structure
- Step 10: Run, Debug, and Verify
- Testing the Failure Path
- Understanding the Audit Expressions
- Why Parquet Instead of CSV?
- ADF vs Synapse Pipelines — Key Differences
- The @activity() Output Properties Reference
- Production Enhancements
- Interview Questions
- Wrapping Up
Why Synapse Instead of ADF?
Azure Synapse Analytics includes the same pipeline engine as Azure Data Factory. The pipeline authoring experience is nearly identical — same activities, same expressions, same UI. So why use Synapse?
Use Synapse when:
- You’re already in the Synapse workspace for SQL pools, Spark pools, or data exploration
- You want everything (ingestion, transformation, analysis) in one workspace
- You need Synapse-specific features like data flows with Spark
- Your organization has standardized on Synapse as the analytics platform

Use ADF when:
- You only need data movement (ETL/ELT) without analytics
- You want a standalone, lightweight service
- Your pipelines don’t need Spark or SQL pool integration
For this tutorial: The pipeline concepts are identical. If you prefer ADF, everything in this post works there too — just swap “Synapse Studio” for “ADF Studio” and “Integrate tab” for “Author tab.”
Architecture Overview
Here’s what we’re building — the same Lookup → ForEach → Copy pattern, but with stored procedure activities for audit logging:
Pipeline: PL_Copy_SqlToADLS_Parquet_WithAudit
│
├── Lookup Activity: "Lookup_Metadata"
│ └── Reads: dbo.metadata table
│ └── Output: Array of table configurations
│
└── ForEach Activity: "ForEach_Table"
└── Iterates over each metadata row
│
├── Copy Activity: "Copy_TableData"
│ ├── Source: Azure SQL DB (dynamic table)
│ └── Sink: ADLS Gen2 Parquet (dynamic folder)
│
├── ──(On Success)──> Stored Procedure: "Log_Success"
│ └── Inserts: rows_read, rows_copied, copy_duration, table_name
│
└── ──(On Failure)──> Stored Procedure: "Log_Failure"
└── Inserts: error_message, table_name, 0 rows
The key difference from the basic pipeline is the branching after Copy: a green arrow (success) goes to Log_Success, and a red arrow (failure) goes to Log_Failure. Both call the same stored procedure but with different parameters.
Azure Resources and Prerequisites
| Resource | Details |
|---|---|
| Synapse Workspace | Your workspace (e.g., naveen-synapse-ws) |
| Azure SQL Database | AdventureWorksLT with metadata, audit tables, and stored procedure |
| ADLS Gen2 (Primary Storage) | Default workspace storage with a container (e.g., synapse-workspace) |
| Linked Services | LS_AzureSqlDB (SQL) + default workspace storage (ADLS) |
| Output Format | Parquet (columnar, compressed, optimized for analytics) |
Important prerequisite: Your Synapse workspace’s managed identity needs Storage Blob Data Contributor role on the ADLS Gen2 storage account. Without this, the Copy activity will fail with a permissions error.
To set this up:
1. Go to your Storage Account → Access Control (IAM) → + Add role assignment
2. Search for Storage Blob Data Contributor → select it → Next
3. Under Members, click + Select members
4. Search for your Synapse workspace name (e.g., naveen-synapse-ws) — it appears as a managed identity
5. Select it → Review + assign
Step 1: Set Up the Database (Metadata + Audit + Stored Procedure)
Connect to your Azure SQL Database and run these scripts. If you already have the metadata table from the ADF tutorial, you only need the audit table and stored procedure.
Metadata Table (skip if already exists)
CREATE TABLE metadata (
id INT IDENTITY(100, 1),
TableName VARCHAR(100),
SchemaName VARCHAR(100),
ContainerName VARCHAR(100),
FolderName VARCHAR(100)
);
INSERT INTO metadata (TableName, SchemaName, ContainerName, FolderName)
VALUES
('Customer', 'SalesLT', 'synapse-workspace', 'sqldb/Customer'),
('Address', 'SalesLT', 'synapse-workspace', 'sqldb/Address'),
('CustomerAddress', 'SalesLT', 'synapse-workspace', 'sqldb/CustomerAddress'),
('Product', 'SalesLT', 'synapse-workspace', 'sqldb/Product'),
('ProductCategory', 'SalesLT', 'synapse-workspace', 'sqldb/ProductCategory');
Note: The ContainerName here is synapse-workspace (the default Synapse container), not database as in the ADF tutorial. Adjust this to match your actual container name.
Audit Table
CREATE TABLE audit (
Id INT IDENTITY(1, 1),
rows_read INT,
rows_copied INT,
copy_duration INT,
table_name VARCHAR(100),
schema_name VARCHAR(100),
error_message VARCHAR(500),
load_date DATETIME
);
Stored Procedure for Audit Logging
CREATE PROCEDURE sp_insert_audit_log
@rows_read INT,
@rows_copied INT,
@copy_duration INT,
@table_name VARCHAR(100),
@schema_name VARCHAR(100),
@error_message VARCHAR(500)
AS
BEGIN
INSERT INTO audit (rows_read, rows_copied, copy_duration,
table_name, schema_name, error_message, load_date)
VALUES (@rows_read, @rows_copied, @copy_duration,
@table_name, @schema_name, @error_message, GETDATE());
END;
Why a stored procedure instead of a direct INSERT? Stored procedures encapsulate the logging logic in the database. If you later want to add fields (like pipeline name or run ID), you only change the stored procedure — not every pipeline that calls it. It’s also easier to call from a Stored Procedure activity than writing dynamic SQL in a Script activity.
Verify Everything
SELECT * FROM metadata; -- Should show 5 rows
SELECT * FROM audit; -- Should be empty (no runs yet)
EXEC sp_insert_audit_log 100, 100, 5, 'TestTable', 'dbo', 'Test';
SELECT * FROM audit; -- Should show 1 test row
DELETE FROM audit; -- Clean up the test row
Step 2: Configure Synapse Workspace Permissions
If you haven’t already, grant the Synapse managed identity access to your storage:
- Azure Portal → your Storage Account → Access Control (IAM)
- + Add role assignment → Storage Blob Data Contributor
- Members → + Select members → search for your workspace name → select → Review + assign
This step is critical. Without it, your Copy activity will fail with an authentication error when trying to write Parquet files to ADLS Gen2.
Step 3: Create Linked Services
In Synapse Studio, go to the Manage tab (toolbox icon) → Linked services.
A. Linked Service for Azure SQL Database
- Click + New → search Azure SQL Database → Continue
- Name: `LS_AzureSqlDB`
- Select your server and database
- Authentication: SQL authentication (enter credentials)
- Test connection → Create
B. Linked Service for ADLS Gen2
You don’t need to create this — Synapse automatically creates a default linked service for your primary ADLS Gen2 storage when the workspace is provisioned. It’s named something like naveen-synapse-ws-WorkspaceDefaultStorage.
So you only need one manually created linked service (the SQL one).
Step 4: Create Datasets (Including Parquet Sink)
Go to the Data tab → create these four datasets:
A. DS_SqlDB_Metadata (Lookup Source)
- + → Integration dataset → Azure SQL Database
- Name: `DS_SqlDB_Metadata`
- Linked service: `LS_AzureSqlDB`
- Table: `dbo.metadata`
B. DS_SqlDB_SourceTable (Parameterized Copy Source)
- + → Integration dataset → Azure SQL Database
- Name: `DS_SqlDB_SourceTable`
- Linked service: `LS_AzureSqlDB`
- Leave table blank → OK
- Parameters tab → add:
  - `SchemaName` (String, no default)
  - `TableName` (String, no default)
- Connection tab → check “Enter manually”:
  - Schema → dynamic content → `@dataset().SchemaName`
  - Table → dynamic content → `@dataset().TableName`
Synapse note: In Synapse Studio, the checkbox is called “Enter manually” instead of “Edit” like in ADF. The functionality is identical.
C. DS_ADLS_Sink_Parquet (Parameterized Parquet Sink) — NEW
This is different from the ADF tutorial — we’re using Parquet format instead of CSV.
- + → Integration dataset → Azure Data Lake Storage Gen2 → select Parquet → Continue
- Name: `DS_ADLS_Sink_Parquet`
- Linked service: `naveen-synapse-ws-WorkspaceDefaultStorage` (the default one)
- Leave file path blank
- Parameters tab → add:
  - `ContainerName` (String, no default)
  - `FolderName` (String, no default)
- Connection tab → file path:
  - Container → dynamic content → `@dataset().ContainerName`
  - Directory → dynamic content → `@dataset().FolderName`
  - File name → leave blank (auto-generated)
D. DS_SqlDB_Audit (For Stored Procedure) — NEW
- + → Integration dataset → Azure SQL Database
- Name: `DS_SqlDB_Audit`
- Linked service: `LS_AzureSqlDB`
- Table: `dbo.audit`
Why do we need this dataset? Strictly speaking, the Stored Procedure activity connects through the linked service (LS_AzureSqlDB), not a dataset. Creating a dataset for the audit table is still worthwhile, though: it lets you browse and preview the audit rows directly from the Data tab when verifying runs.
Step 5: Build the Pipeline — Lookup and ForEach
Go to the Integrate tab → + → Pipeline → name it: PL_Copy_SqlToADLS_Parquet_WithAudit
Add Lookup Activity
- Drag a Lookup activity onto the canvas
- Name: `Lookup_Metadata`
- Settings tab:
  - Source dataset: `DS_SqlDB_Metadata`
  - Uncheck “First row only”
  - Use query → Query, then enter:
SELECT TableName, SchemaName, ContainerName, FolderName
FROM dbo.metadata
Using a SELECT query instead of “Table” gives you control over which columns are returned and lets you add WHERE clauses later (e.g., WHERE IsActive = 1).
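For instance, if you later add such a flag, the change might look like this (the IsActive column is hypothetical — it is not part of this tutorial’s schema):

```sql
-- Hypothetical: add an active flag so tables can be excluded
-- from the pipeline without deleting their metadata rows
ALTER TABLE metadata ADD IsActive BIT NOT NULL DEFAULT 1;

-- Lookup query that copies only the active tables
SELECT TableName, SchemaName, ContainerName, FolderName
FROM dbo.metadata
WHERE IsActive = 1;
```

Flipping IsActive to 0 then pauses a table without touching the pipeline at all.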
Add ForEach Activity
- Drag a ForEach activity onto the canvas
- Name: `ForEach_Table`
- Draw the green success arrow from `Lookup_Metadata` → `ForEach_Table`
- Settings tab:
  - Sequential: unchecked (parallel execution)
  - Batch count: `5` (all 5 tables run simultaneously)
  - Items → dynamic content: `@activity('Lookup_Metadata').output.value`
Step 6: Add the Copy Activity
- Click the pencil icon on `ForEach_Table` to open the inner canvas
- Drag a Copy data activity inside
- Name: `Copy_TableData`
Source Tab
- Dataset: `DS_SqlDB_SourceTable`
- SchemaName parameter → dynamic content → `@item().SchemaName`
- TableName parameter → dynamic content → `@item().TableName`
Sink Tab
- Dataset: `DS_ADLS_Sink_Parquet`
- ContainerName parameter → dynamic content → `@item().ContainerName`
- FolderName parameter → dynamic content → `@item().FolderName`
At this point, you have the same functionality as the ADF pipeline, but outputting Parquet instead of CSV. The magic comes next with audit logging.
Step 7: Add Audit Logging — Success Path
This stored procedure activity runs only when Copy_TableData succeeds, capturing the actual copy metrics.
- Drag a Stored Procedure activity onto the inner canvas (inside ForEach)
- Name: `Log_Success`
- Draw the green arrow (Success) from `Copy_TableData` → `Log_Success`
- Settings tab:
  - Linked service: `LS_AzureSqlDB`
  - Stored procedure name: `sp_insert_audit_log`
  - Click Import parameter (or add manually)
Configure each parameter with dynamic content:
| Parameter | Type | Value (Dynamic Content) |
|---|---|---|
| rows_read | Int32 | @activity('Copy_TableData').output.rowsRead |
| rows_copied | Int32 | @activity('Copy_TableData').output.rowsCopied |
| copy_duration | Int32 | @activity('Copy_TableData').output.copyDuration |
| table_name | String | @item().TableName |
| schema_name | String | @item().SchemaName |
| error_message | String | NA |
What These Expressions Mean
@activity('Copy_TableData').output.rowsRead— The number of rows read from the source table. This comes from the Copy activity’s built-in output properties.@activity('Copy_TableData').output.rowsCopied— The number of rows successfully written to the sink. Should match rowsRead for a clean copy.@activity('Copy_TableData').output.copyDuration— How many seconds the copy took. Useful for identifying slow tables.@item().TableName— The current table name from the ForEach iteration.NA— A static string indicating no error occurred.
Step 8: Add Audit Logging — Failure Path
This stored procedure activity runs only when Copy_TableData fails, capturing the error details.
- Drag another Stored Procedure activity onto the inner canvas
- Name: `Log_Failure`
- Create a red arrow (Failure) from `Copy_TableData` → `Log_Failure`
How to create a failure (red) arrow:
- Hover over Copy_TableData until you see the small connection icons
- Drag from the icon to Log_Failure
- Click on the arrow that was just created
- Change the dependency condition from Succeeded to Failed
- The arrow turns red
- Settings tab:
  - Linked service: `LS_AzureSqlDB`
  - Stored procedure name: `sp_insert_audit_log`
- Parameters:
| Parameter | Type | Value (Dynamic Content) |
|---|---|---|
| rows_read | Int32 | 0 |
| rows_copied | Int32 | 0 |
| copy_duration | Int32 | 0 |
| table_name | String | @item().TableName |
| schema_name | String | @item().SchemaName |
| error_message | String | @concat('Copy failed for ', item().SchemaName, '.', item().TableName) |
Important expression note: Inside a @concat() function, you don’t use the @ prefix for nested functions. It’s item().SchemaName, not @item().SchemaName. The @ is only used once at the beginning of the entire expression.
Alternative: Capturing the Actual Error Message
If you want to capture the actual error message from the failed Copy activity, use:
@activity('Copy_TableData').output.errors[0].Message
However, this path may not always be available depending on the error type. If you get a runtime error on this expression during a failure test, fall back to the simpler concat version above. The Monitor tab always shows the full error details regardless.
Step 9: The Complete Pipeline Structure
Go back to the main canvas by clicking the pipeline name in the breadcrumb. Your pipeline should look like this:
Lookup_Metadata
│ (Success - green arrow)
▼
ForEach_Table
│
├── Copy_TableData
│ │
│ ├── (Success - green arrow) → Log_Success
│ │
│ └── (Failure - red arrow) → Log_Failure
Inside the ForEach, three activities are visible: Copy_TableData in the center, Log_Success connected by a green arrow, and Log_Failure connected by a red arrow.
Step 10: Run, Debug, and Verify
Validate and Run
- Click Validate at the top — fix any errors
- Click Publish all to save to the Synapse service
- Click Debug to run
Monitor the Run
In the output pane at the bottom:
Lookup_Metadata → Succeeded (5 rows)
ForEach_Table → Succeeded
├── Copy_TableData (Customer) → Succeeded (847 rows)
├── Log_Success (Customer) → Succeeded
├── Copy_TableData (Address) → Succeeded (450 rows)
├── Log_Success (Address) → Succeeded
├── Copy_TableData (CustomerAddress) → Succeeded (417 rows)
├── Log_Success (CustomerAddress) → Succeeded
├── Copy_TableData (Product) → Succeeded (295 rows)
├── Log_Success (Product) → Succeeded
├── Copy_TableData (ProductCategory) → Succeeded (41 rows)
└── Log_Success (ProductCategory) → Succeeded
Notice that Log_Failure doesn’t appear — it only runs when Copy fails, and all copies succeeded.
Verify the Audit Table
SELECT * FROM audit ORDER BY load_date DESC;
You should see 5 rows, one per table:
| Id | rows_read | rows_copied | copy_duration | table_name | schema_name | error_message | load_date |
|---|---|---|---|---|---|---|---|
| 3 | 417 | 417 | 2 | CustomerAddress | SalesLT | NA | 2026-04-05 10:30:16 |
| 1 | 847 | 847 | 3 | Customer | SalesLT | NA | 2026-04-05 10:30:15 |
| 2 | 450 | 450 | 2 | Address | SalesLT | NA | 2026-04-05 10:30:14 |
| 4 | 295 | 295 | 2 | Product | SalesLT | NA | 2026-04-05 10:30:13 |
| 5 | 41 | 41 | 1 | ProductCategory | SalesLT | NA | 2026-04-05 10:30:12 |
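Once a few runs have accumulated, a summary query like this one (a sketch built on the audit columns defined in Step 1) shows per-table health at a glance:

```sql
-- Per-table copy statistics across all runs
SELECT table_name,
       COUNT(*)           AS total_runs,
       SUM(rows_copied)   AS total_rows_copied,
       AVG(copy_duration) AS avg_duration_sec,
       SUM(CASE WHEN error_message <> 'NA' THEN 1 ELSE 0 END) AS failed_runs
FROM audit
GROUP BY table_name
ORDER BY table_name;
```

A table whose avg_duration_sec keeps climbing, or whose failed_runs is nonzero, is the first place to look.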
Verify the Parquet Files in ADLS Gen2
Navigate to your ADLS Gen2 storage:
synapse-workspace/
└── sqldb/
├── Customer/
│ └── part-00000-xxxx.snappy.parquet
├── Address/
│ └── part-00000-xxxx.snappy.parquet
├── CustomerAddress/
│ └── part-00000-xxxx.snappy.parquet
├── Product/
│ └── part-00000-xxxx.snappy.parquet
└── ProductCategory/
└── part-00000-xxxx.snappy.parquet
Notice the .snappy.parquet extension — Snappy is the default compression used by ADF/Synapse for Parquet files. It provides a good balance between compression ratio and read speed.
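One bonus of landing Parquet inside a Synapse workspace: the serverless SQL pool can query those files in place with OPENROWSET. A sketch — the storage account name is a placeholder you’d replace with your own:

```sql
-- Query the Customer Parquet output directly with the Synapse serverless SQL pool
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<your-storage-account>.dfs.core.windows.net/synapse-workspace/sqldb/Customer/*.parquet',
    FORMAT = 'PARQUET'
) AS customer_rows;
```

Because Parquet embeds the schema, you get named, typed columns back without defining anything first — one of the reasons we switched away from CSV.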
Testing the Failure Path
To verify that failure logging works correctly, insert a fake table entry:
INSERT INTO metadata (TableName, SchemaName, ContainerName, FolderName)
VALUES ('FakeTable', 'SalesLT', 'synapse-workspace', 'sqldb/FakeTable');
Run the pipeline again. The 5 real tables will succeed, and FakeTable will fail because it doesn’t exist in the database. Check the audit table:
SELECT * FROM audit ORDER BY load_date DESC;
You should see a new row for FakeTable with rows_read = 0, rows_copied = 0, and an error message.
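To pull just the failure rows (relying on this tutorial’s convention that successful copies log the literal string 'NA' as the error message):

```sql
-- Show only failed copy operations, newest first
SELECT table_name, schema_name, error_message, load_date
FROM audit
WHERE error_message <> 'NA'
ORDER BY load_date DESC;
```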
Clean up after testing:
DELETE FROM metadata WHERE TableName = 'FakeTable';
DELETE FROM audit WHERE table_name = 'FakeTable';
Understanding the Audit Expressions
Here’s a complete reference of every dynamic expression used in the audit logging:
Success Path Expressions
| Expression | Returns | Example Value |
|---|---|---|
| @activity('Copy_TableData').output.rowsRead | Rows read from source | 847 |
| @activity('Copy_TableData').output.rowsCopied | Rows written to sink | 847 |
| @activity('Copy_TableData').output.copyDuration | Duration in seconds | 3 |
| @activity('Copy_TableData').output.throughput | KB/s throughput | 1024.5 |
| @activity('Copy_TableData').output.dataRead | Bytes read from source | 125000 |
| @activity('Copy_TableData').output.dataWritten | Bytes written to sink | 42000 |
| @item().TableName | Current table name | "Customer" |
| @item().SchemaName | Current schema name | "SalesLT" |
Failure Path Expressions
| Expression | Returns | Example Value |
|---|---|---|
| @activity('Copy_TableData').output.errors[0].Message | Error message | "Table 'SalesLT.FakeTable' not found" |
| @concat('Copy failed for ', item().SchemaName, '.', item().TableName) | Custom error string | "Copy failed for SalesLT.FakeTable" |
Common Expression Mistake
Inside @concat(), don’t use @ on nested functions:
-- WRONG (nested @ causes parse error)
@concat('Failed: ', @item().TableName)
-- CORRECT (single @ at the start only)
@concat('Failed: ', item().TableName)
This is a subtle but important rule in ADF/Synapse expressions. The @ prefix is only used once at the beginning of the expression. Everything inside is already in expression context.
Why Parquet Instead of CSV?
We switched from CSV (in the ADF tutorial) to Parquet here. Here’s why:
| Feature | CSV | Parquet |
|---|---|---|
| File size | Large (no compression by default) | 70-90% smaller (columnar + Snappy compression) |
| Schema | No schema — just text | Schema embedded in the file (column names + data types) |
| Query performance | Must read entire file | Column pruning — reads only the columns you query |
| Data types | Everything is a string | Native integers, decimals, dates, timestamps |
| Compatibility | Universal (any tool can read it) | Supported by Spark, Synapse, Databricks, Athena, BigQuery |
| Best for | Simple exports, debugging, small datasets | Analytics, data lakes, production pipelines |
Rule of thumb: Use CSV for debugging and human-readable exports. Use Parquet for everything else — especially in data lakes and analytics pipelines.
ADF vs Synapse Pipelines — Key Differences
Since we built similar pipelines in both services, here’s a practical comparison:
| Aspect | Azure Data Factory | Synapse Pipelines |
|---|---|---|
| Where to find it | Standalone service | Inside Synapse workspace |
| Author tab | “Author” tab | “Integrate” tab |
| Default ADLS linked service | Must create manually | Auto-created with workspace |
| “Edit” checkbox on datasets | Called “Edit” | Called “Enter manually” |
| Spark integration | Requires an external service (e.g., Azure Databricks) | Built-in Spark pools |
| SQL pool integration | External connection via linked services | Native SQL pool support |
| Git integration | Azure DevOps or GitHub | Azure DevOps or GitHub |
| Pricing | Per activity run + data movement | Per activity run + data movement (same pricing model) |
| Expression language | Identical | Identical |
| Copy activity features | Identical | Identical |
Bottom line: If you know ADF, you know Synapse Pipelines. The expression language, activity types, and pipeline patterns are the same. The only differences are in the UI navigation and Synapse-specific integrations.
The @activity() Output Properties Reference
The Copy activity exposes several useful output properties that you can reference in downstream activities:
@activity('Copy_TableData').output.rowsRead -- Rows read from source
@activity('Copy_TableData').output.rowsCopied -- Rows written to sink
@activity('Copy_TableData').output.copyDuration -- Duration in seconds
@activity('Copy_TableData').output.throughput -- Throughput in KB/s
@activity('Copy_TableData').output.dataRead -- Bytes read
@activity('Copy_TableData').output.dataWritten -- Bytes written
@activity('Copy_TableData').output.effectiveIntegrationRuntime -- IR name
@activity('Copy_TableData').output.usedDataIntegrationUnits -- DIUs used
@activity('Copy_TableData').output.errors[0].Message -- Error message (on failure)
These properties are available immediately after the Copy activity completes. You can use them in any downstream activity — Stored Procedure, Set Variable, Web Activity, etc.
Production Enhancements
Here are improvements you should consider for a production deployment:
1. Add Pipeline Run ID to Audit Logs
Track which pipeline run generated each audit record:
ALTER TABLE audit ADD pipeline_run_id VARCHAR(100);
Pass it in the stored procedure: @pipeline().RunId
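A sketch of the extended stored procedure — the new @pipeline_run_id parameter gets a NULL default so any existing callers keep working until you update them:

```sql
-- Sketch: extend the audit proc with the run ID added above
ALTER PROCEDURE sp_insert_audit_log
    @rows_read INT,
    @rows_copied INT,
    @copy_duration INT,
    @table_name VARCHAR(100),
    @schema_name VARCHAR(100),
    @error_message VARCHAR(500),
    @pipeline_run_id VARCHAR(100) = NULL  -- new, optional: old callers still work
AS
BEGIN
    INSERT INTO audit (rows_read, rows_copied, copy_duration,
                       table_name, schema_name, error_message,
                       pipeline_run_id, load_date)
    VALUES (@rows_read, @rows_copied, @copy_duration,
            @table_name, @schema_name, @error_message,
            @pipeline_run_id, GETDATE());
END;
```

The only pipeline-side change is adding the extra parameter to both the Log_Success and Log_Failure activities.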
2. Add Date Partitioning to Output Paths
Instead of writing to sqldb/Customer/, partition by date:
@concat(item().FolderName, '/', formatDateTime(utcnow(), 'yyyy'),
'/', formatDateTime(utcnow(), 'MM'),
'/', formatDateTime(utcnow(), 'dd'))
This creates: sqldb/Customer/2026/04/05/part-00000.snappy.parquet
3. Understand Failure Behavior in ForEach
With Sequential unchecked (as configured here), iterations run in parallel, so one table’s failure doesn’t stop the others — each failed copy is caught by its red Failure arrow and logged. If you switch to Sequential execution, be aware that a failure stops the remaining iterations; parallel execution combined with the failure path is the safer default. Together with audit logging, this ensures maximum data is copied even when individual tables have issues.
4. Add Email Notifications
Use a Web Activity after the ForEach to call a Logic App or Azure Function that sends email notifications with the pipeline status. Include a summary of rows copied and any failures.
5. Schedule with Triggers
Add a Schedule trigger to run the pipeline automatically:
- Click Add trigger → New/Edit → New
- Set the schedule (e.g., daily at 2 AM UTC)
- Publish
Interview Questions
Q: How do you implement audit logging in Azure pipelines?
A: Use Stored Procedure activities connected to the Copy activity with Success and Failure dependency conditions. The success path captures output.rowsRead, output.rowsCopied, and output.copyDuration. The failure path captures the error message. Both call the same stored procedure with different parameters.
Q: What’s the difference between green, red, blue, and gray arrows in ADF/Synapse? A: Green = Succeeded (runs only on success). Red = Failed (runs only on failure). Blue = Completed (always runs regardless of outcome). Gray = Skipped (runs only when the parent was skipped). These are dependency conditions that control the execution flow.
Q: Why use Parquet instead of CSV in a data lake? A: Parquet is 70-90% smaller due to columnar compression, has embedded schema (no type guessing), supports column pruning for faster queries, and is the standard format for Spark, Synapse SQL, and Databricks. CSV is only better for human-readable debugging.
Q: How do you handle the @activity().output.errors[0].Message expression when it’s not available?
A: Use a @concat() fallback that constructs the error message from known values like item().SchemaName and item().TableName. The Monitor tab always shows the full error details regardless.
Q: Can you nest ForEach loops in ADF? A: No. ADF does not support ForEach inside another ForEach. Use an Execute Pipeline activity to call a child pipeline for the inner loop.
Wrapping Up
You’ve now built a production-quality, metadata-driven pipeline in Azure Synapse Analytics with:
- Dynamic table copying — controlled by a metadata table, not hardcoded pipeline configuration
- Parquet output — optimized for analytics with 70-90% compression
- Audit logging — every copy operation tracked with rows read, rows copied, duration, and error messages
- Failure handling — failed tables are logged without stopping the entire pipeline
This is exactly how enterprise data platforms work. The combination of metadata-driven design and comprehensive audit logging gives you both flexibility and observability — the two things every data engineer needs in production.
Related posts:
- Building a Metadata-Driven Pipeline in Azure Data Factory
- Building a REST API with Python FastAPI on AWS Lambda
- Incremental Data Loading with Delta Copy Pattern (coming soon)
- Top 15 Azure Data Factory Interview Questions (coming soon)
If this guide helped you, share it with a fellow data engineer. Have questions? Drop a comment below.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.