Building a Metadata-Driven Pipeline in Azure Data Factory: The Complete Guide
If you’re working in data engineering, you’ll build dozens of pipelines that copy data from one place to another. The naive approach is creating a separate pipeline for each table — but that quickly becomes a maintenance nightmare when you have 50+ tables.
The smarter approach is a metadata-driven pipeline — a single, reusable pipeline that reads its configuration from a database table and dynamically copies any number of tables. Change the metadata, change what gets copied. No pipeline code changes needed.
In this guide, I’ll walk you through building a production-quality metadata-driven pipeline in Azure Data Factory (ADF) using the Lookup → ForEach → Copy pattern. This is the same pattern used in enterprise data platforms, and it’s one of the most commonly asked topics in data engineering interviews.
This tutorial is based on a real project I built copying SalesLT tables from Azure SQL Database to Azure Data Lake Storage Gen2. I’ll share every step, every gotcha, and every real error I hit along the way.
Table of Contents
- What is a Metadata-Driven Pipeline?
- Architecture Overview
- Azure Resources You’ll Need
- Step 1: Create the Metadata Table
- Step 2: Create Linked Services
- Step 3: Create Datasets (The Tricky Part)
- Step 4: Build the Pipeline
- Step 5: Configure the Lookup Activity
- Step 6: Configure the ForEach Activity
- Step 7: Configure the Copy Activity Inside ForEach
- Step 8: Run and Debug the Pipeline
- Step 9: Verify the Output in ADLS Gen2
- Understanding the Dynamic Expressions
- The BadRequest Null Error (And How to Fix It)
- Common Mistakes and How to Avoid Them
- Production Best Practices
- Interview Questions About This Pattern
- What’s Next: Adding Audit Logging
- Wrapping Up
What is a Metadata-Driven Pipeline?
A metadata-driven pipeline is a design pattern where the pipeline’s behavior — which tables to copy, where to copy them, what format to use — is controlled by configuration data stored in a database table, not hardcoded in the pipeline itself.
Instead of building this:
Pipeline_Copy_Customer → copies Customer table
Pipeline_Copy_Address → copies Address table
Pipeline_Copy_Product → copies Product table
Pipeline_Copy_Orders → copies Orders table
... (50 more pipelines)
You build this:
Pipeline_Copy_All_Tables → reads metadata table
→ loops through each row
→ copies each table dynamically
Why this matters:
- One pipeline handles everything. Adding a new table means inserting a row in the metadata table — not building a new pipeline.
- Easy to maintain. Change the destination folder? Update the metadata. Disable a table? Delete or flag the row.
- Consistent behavior. Every table goes through the same copy logic, error handling, and logging.
- Interview favorite. This is asked in almost every Azure Data Engineering interview. Understanding it deeply gives you a significant advantage.
Architecture Overview
Here’s the high-level flow of what we’re building:
Pipeline: PL_Copy_SqlToADLS
│
├── Lookup Activity: "Lookup_Metadata"
│     ├── Reads from: dbo.metadata table
│     └── Output: Array of table configurations
│
└── ForEach Activity: "ForEach_Table"
      └── Iterates over: Each row from Lookup output
            │
            └── Copy Activity: "Copy_TableData" (inside ForEach)
                  ├── Source: Azure SQL Database (dynamic schema + table)
                  └── Sink: ADLS Gen2 (dynamic container + folder path)
Data flow:
- The Lookup activity reads the metadata table and returns an array of objects — one for each table to copy
- The ForEach activity iterates over this array, executing the inner activities once per table
- The Copy activity inside ForEach dynamically resolves the source table and destination path using @item() expressions that reference the current metadata row
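Before we build it in ADF, here is a minimal Python sketch of the same control flow, so the pattern is concrete. The metadata rows and the `copy_table` helper are illustrative stand-ins, not ADF APIs:

```python
# Two example metadata rows, shaped like the Lookup output we'll build later
metadata = [
    {"TableName": "Customer", "SchemaName": "SalesLT",
     "ContainerName": "database", "FolderName": "sqldb/Customer"},
    {"TableName": "Address", "SchemaName": "SalesLT",
     "ContainerName": "database", "FolderName": "sqldb/Address"},
]

def copy_table(item):
    # The Copy activity resolves @item().SchemaName etc. once per iteration
    source = f"{item['SchemaName']}.{item['TableName']}"
    sink = f"{item['ContainerName']}/{item['FolderName']}"
    return f"{source} -> {sink}"

# Lookup returns the array; ForEach runs the Copy once per row
results = [copy_table(item) for item in metadata]
```

Adding a third table means appending a row to `metadata`; the loop body never changes. That is the whole idea of the pattern.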
Azure Resources You’ll Need
Before we start building, make sure you have these resources provisioned:
| Resource | Details |
|---|---|
| Azure Data Factory | Your ADF workspace (e.g., naveen-datafactory-de) |
| Azure SQL Database | AdventureWorksLT sample database (or any SQL DB with multiple tables) |
| Azure Data Lake Storage Gen2 | Storage account with hierarchical namespace enabled |
| ADLS Container | A container created manually (e.g., database) — ADF cannot create containers, only folders |
Important: The ADLS Gen2 container must exist before running the pipeline. ADF will auto-create folders inside the container, but it cannot create the container itself. If the container doesn’t exist, you’ll get a runtime error.
Step 1: Create the Metadata Table
The metadata table is the brain of your pipeline. Each row represents one table to be copied. Connect to your Azure SQL Database and run:
-- Create the metadata table
CREATE TABLE metadata (
id INT IDENTITY(100, 1),
TableName VARCHAR(100),
SchemaName VARCHAR(100),
ContainerName VARCHAR(100),
FolderName VARCHAR(100)
);
-- Insert table configurations
INSERT INTO metadata (TableName, SchemaName, ContainerName, FolderName)
VALUES
('Customer', 'SalesLT', 'database', 'sqldb/Customer'),
('Address', 'SalesLT', 'database', 'sqldb/Address'),
('CustomerAddress', 'SalesLT', 'database', 'sqldb/CustomerAddress'),
('Product', 'SalesLT', 'database', 'sqldb/Product'),
('ProductCategory', 'SalesLT', 'database', 'sqldb/ProductCategory');
-- Verify
SELECT * FROM metadata;
Output:
| id | TableName | SchemaName | ContainerName | FolderName |
|---|---|---|---|---|
| 100 | Customer | SalesLT | database | sqldb/Customer |
| 101 | Address | SalesLT | database | sqldb/Address |
| 102 | CustomerAddress | SalesLT | database | sqldb/CustomerAddress |
| 103 | Product | SalesLT | database | sqldb/Product |
| 104 | ProductCategory | SalesLT | database | sqldb/ProductCategory |
Understanding the Column Design
Each column serves a specific purpose in the pipeline:
- TableName — The table to copy from the source database. Used in the Copy activity’s source query.
- SchemaName — The schema prefix (e.g., SalesLT). Combined with TableName to form the fully qualified table name SalesLT.Customer.
- ContainerName — The ADLS Gen2 container where data will land. Must already exist.
- FolderName — The folder path inside the container. ADF creates this automatically. Using sqldb/Customer creates a nested folder structure.
Pro tip: In production, you’d add more columns like IsActive (to enable/disable tables), LoadType (full/incremental), FileFormat (CSV/Parquet/JSON), and LastLoadDate (for tracking).
Step 2: Create Linked Services
Linked Services are connection strings to your data sources. You need two:
A. Linked Service for Azure SQL Database
- In ADF Studio, go to Manage tab (wrench icon) → Linked services → + New
- Search for Azure SQL Database → select it
- Configure:
- Name: LS_AzureSqlDB
- Server name: select your Azure SQL server from the dropdown
- Database name: select your database
- Authentication: SQL authentication (enter username and password)
- Click Test connection — make sure it shows “Connection successful”
- Click Create
B. Linked Service for ADLS Gen2
- + New → search for Azure Data Lake Storage Gen2 → select it
- Configure:
- Name: LS_ADLS_Gen2
- Authentication: Account Key (easiest) or Managed Identity (more secure, recommended for production)
- Select your storage account from the dropdown
- Click Test connection → Create
Note: If you’re using Synapse instead of ADF, the default workspace storage linked service (naveen-synapse-ws-WorkspaceDefaultStorage) is already created for you. You only need to create the SQL linked service.
Step 3: Create Datasets (The Tricky Part)
This is where most people make mistakes. You need three datasets, and two of them are parameterized — meaning they don’t point to a specific table or folder. Instead, they accept parameters at runtime that tell them which table to read from and which folder to write to.
Dataset 1: DS_SqlDB_Metadata (Lookup Source)
This is the simplest dataset — it just points to the metadata table.
- Go to Author tab → + → Dataset → Azure SQL Database
- Name: DS_SqlDB_Metadata
- Linked service: LS_AzureSqlDB
- Table: select dbo.metadata
- Click OK
No parameters needed here — this dataset always reads from the same table.
Dataset 2: DS_SqlDB_SourceTable (Parameterized Copy Source)
This dataset needs to point to different tables depending on which iteration of the ForEach loop is running. We achieve this with parameters.
- + → Dataset → Azure SQL Database
- Name: DS_SqlDB_SourceTable
- Linked service: LS_AzureSqlDB
- Leave the table name blank — don’t select any table
- Click OK
- Go to the Parameters tab → add two parameters:
  - SchemaName (Type: String) — leave default value EMPTY
  - TableName (Type: String) — leave default value EMPTY
- Go to the Connection tab:
  - Check the “Edit” checkbox (in ADF) or “Enter manually” checkbox (in Synapse)
  - Two text boxes appear for Schema and Table
  - Schema field → click Add dynamic content → type: @dataset().SchemaName
  - Table field → click Add dynamic content → type: @dataset().TableName
CRITICAL: Leave the parameter default values EMPTY. Do not put any expression, variable reference, or placeholder text in the default value field. This is the #1 cause of the dreaded “BadRequest null” error. The values will be passed by the Copy activity at runtime.
Dataset 3: DS_ADLS_Sink (Parameterized Sink)
This dataset needs to write to different folders depending on the current table being copied.
- + → Dataset → Azure Data Lake Storage Gen2 → select DelimitedText (for CSV) or Parquet (for columnar format)
- Name: DS_ADLS_Sink
- Linked service: LS_ADLS_Gen2
- Check “First row as header” (for CSV)
- Leave file path blank
- Click OK
- Go to the Parameters tab → add two parameters:
  - ContainerName (Type: String) — leave default value EMPTY
  - FolderName (Type: String) — leave default value EMPTY
- Go to the Connection tab:
  - File system (Container) → Add dynamic content → @dataset().ContainerName
  - Directory (Folder) → Add dynamic content → @dataset().FolderName
  - File name → leave blank (ADF auto-generates file names)
The Critical Rule for Avoiding BadRequest Errors
This deserves its own callout because it’s the most common mistake:
Never put expressions, variable references, or placeholder text in dataset parameter default values.
When you add parameters to a dataset, ADF shows a “Default value” field. Leave it empty. If you put something like @pipeline().parameters.TableName or even just the word “test” there, and later remove or rename that pipeline parameter, ADF will throw the cryptic “BadRequest null” error with no useful error message.
The default value field is only for when you want a fallback — and in a metadata-driven pipeline, you always want the value to come from the ForEach loop, not from a default.
Step 4: Build the Pipeline
Now let’s create the pipeline and connect the activities:
- Go to Author tab → + → Pipeline
- Name it: PL_Copy_SqlToADLS
Add the activities:
- Drag a Lookup activity onto the canvas → name it: Lookup_Metadata
- Drag a ForEach activity onto the canvas → name it: ForEach_Table
- Connect them: hover over the Lookup activity → drag the green success arrow to the ForEach activity
- Double-click the ForEach activity to open its inner canvas
- Drag a Copy Data activity inside → name it: Copy_TableData
Your canvas should look like:
[Lookup_Metadata] ──green arrow──> [ForEach_Table]
                                         │
                                         └── Inside: [Copy_TableData]
Step 5: Configure the Lookup Activity
Click on the Lookup_Metadata activity:
- Go to the Settings tab
- Source dataset: select DS_SqlDB_Metadata
- UNCHECK “First row only” — this is critical! If checked, the Lookup returns only one row (a single object). Unchecked, it returns an array of all rows, which is what ForEach needs.
- Query: Use Table (default) — this reads all rows from the metadata table
What the Lookup returns (when “First row only” is unchecked):
{
"count": 5,
"value": [
{
"id": 100,
"TableName": "Customer",
"SchemaName": "SalesLT",
"ContainerName": "database",
"FolderName": "sqldb/Customer"
},
{
"id": 101,
"TableName": "Address",
"SchemaName": "SalesLT",
"ContainerName": "database",
"FolderName": "sqldb/Address"
},
...
]
}
What happens if “First row only” is checked (common mistake):
{
"firstRow": {
"id": 100,
"TableName": "Customer",
"SchemaName": "SalesLT",
...
}
}
The output structure is completely different — firstRow vs value array. This breaks the ForEach activity because it expects an array to iterate over, not a single object.
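A small Python sketch (the dictionaries are trimmed versions of the JSON above) makes the failure mode obvious:

```python
# Shape with "First row only" UNCHECKED: an array under "value"
lookup_unchecked = {"count": 2, "value": [
    {"TableName": "Customer"}, {"TableName": "Address"}]}

# Shape with "First row only" CHECKED: a single object under "firstRow"
lookup_checked = {"firstRow": {"TableName": "Customer"}}

# ForEach needs a list to iterate over
items = lookup_unchecked["value"]
assert isinstance(items, list) and len(items) == 2

# With the box checked there is no "value" key at all, so the
# @activity('Lookup_Metadata').output.value expression cannot resolve
assert "value" not in lookup_checked
```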
Step 6: Configure the ForEach Activity
Click on the ForEach_Table activity:
- Go to the Settings tab
- Items field → click Add dynamic content → enter:
@activity('Lookup_Metadata').output.value
This expression does the following:
- @activity('Lookup_Metadata') — references the Lookup activity by name (must match exactly, case-sensitive)
- .output — accesses the activity’s output
- .value — accesses the array of rows (this is why “First row only” must be unchecked)
- Sequential checkbox: Leave unchecked for parallel execution (faster), or check it if you want tables to be copied one at a time (useful for debugging or when resources are limited)
- Batch count: Default is 20 (max parallel copies). For a small number of tables (5-10), leave the default.
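Batch count behaves like a bounded worker pool: at most that many Copy activities run at once. A rough Python analogy (the table list and `copy` function are illustrative, not ADF APIs):

```python
from concurrent.futures import ThreadPoolExecutor

tables = ["Customer", "Address", "CustomerAddress",
          "Product", "ProductCategory"]

def copy(table):
    # stand-in for one Copy activity execution
    return f"copied {table}"

batch_count = 20  # ADF's default cap on parallel iterations
with ThreadPoolExecutor(max_workers=batch_count) as pool:
    # like ForEach with Sequential unchecked: iterations run concurrently,
    # bounded by batch_count
    results = list(pool.map(copy, tables))
```

Checking the Sequential box is equivalent to `max_workers=1`: the same work, one table at a time.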
Step 7: Configure the Copy Activity Inside ForEach
Double-click the ForEach activity to open its inner canvas. Click on the Copy_TableData activity:
Source Tab
- Source dataset: select DS_SqlDB_SourceTable
- Two parameter fields will appear (SchemaName and TableName)
- Click Add dynamic content for each:
  - SchemaName → @item().SchemaName
  - TableName → @item().TableName
What does @item() mean?
Inside a ForEach activity, @item() refers to the current element being processed in the loop. If the ForEach is iterating over the array from the Lookup output, then on the first iteration, @item() is:
{
"id": 100,
"TableName": "Customer",
"SchemaName": "SalesLT",
"ContainerName": "database",
"FolderName": "sqldb/Customer"
}
So @item().SchemaName resolves to "SalesLT" and @item().TableName resolves to "Customer". The Copy activity then reads from SalesLT.Customer.
On the second iteration, @item() becomes the Address row, and the Copy reads from SalesLT.Address. And so on.
Sink Tab
- Sink dataset: select DS_ADLS_Sink
- Two parameter fields will appear (ContainerName and FolderName)
- Click Add dynamic content for each:
  - ContainerName → @item().ContainerName
  - FolderName → @item().FolderName
Remember: @dataset() vs @item()
This confuses many people, so let me clarify:
| Expression | Where It’s Used | What It Does |
|---|---|---|
| @dataset().SchemaName | In the Dataset Connection tab | References the dataset’s own parameter — like a function parameter |
| @item().SchemaName | In the Pipeline Copy activity | Passes a value from the current ForEach item to the dataset parameter |
Think of it like a function call:
// Dataset is like a function definition
function copyFromTable(SchemaName, TableName) {
// @dataset().SchemaName is used inside the function body
return readFrom(SchemaName + "." + TableName);
}
// Pipeline is like calling the function
for (item in lookupResults) {
// @item().SchemaName is the argument passed to the function
copyFromTable(item.SchemaName, item.TableName);
}
Step 8: Run and Debug the Pipeline
Before running, do this pre-flight checklist:
- Verify the ADLS container exists — Go to your Storage Account → Containers → confirm the database container is there
- Validate the pipeline — Click the Validate button (checkmark icon) at the top. Fix any errors.
- Publish all — Click Publish all to save your changes to the ADF service
Now click Debug to run the pipeline.
What to watch in the output pane:
- Lookup_Metadata should show: Status: Succeeded, Output: (click the glasses icon to see the JSON array)
- ForEach_Table should show: Status: Succeeded, with 5 iterations
- Click on ForEach_Table → View iterations → each Copy_TableData should show Succeeded with rows read/written
Successful run example:
Lookup_Metadata → Succeeded (5 rows returned)
ForEach_Table → Succeeded
├── Copy_TableData (Customer) → 847 rows read, 847 rows written
├── Copy_TableData (Address) → 450 rows read, 450 rows written
├── Copy_TableData (CustomerAddress) → 417 rows read, 417 rows written
├── Copy_TableData (Product) → 295 rows read, 295 rows written
└── Copy_TableData (ProductCategory) → 41 rows read, 41 rows written
Step 9: Verify the Output in ADLS Gen2
After the pipeline succeeds, check your ADLS Gen2 storage:
- Go to your Storage Account → Containers → database
- Navigate into the sqldb folder
- You should see five subfolders:
database/
└── sqldb/
    ├── Customer/
    │   └── part-00000-xxxx.csv (847 rows)
    ├── Address/
    │   └── part-00000-xxxx.csv (450 rows)
    ├── CustomerAddress/
    │   └── part-00000-xxxx.csv (417 rows)
    ├── Product/
    │   └── part-00000-xxxx.csv (295 rows)
    └── ProductCategory/
        └── part-00000-xxxx.csv (41 rows)
The file names are auto-generated by ADF. Each CSV file contains the complete table data with headers in the first row.
Understanding the Dynamic Expressions
Let me consolidate all the dynamic expressions used in this pipeline so you have a complete reference:
Lookup Output Expression
@activity('Lookup_Metadata').output.value
- @activity('Lookup_Metadata') — references the Lookup activity (name must match exactly)
- .output — the activity’s full output object
- .value — the array of rows (only exists when “First row only” is unchecked)
ForEach Item Expressions
@item().TableName → "Customer" (current iteration's table name)
@item().SchemaName → "SalesLT" (current iteration's schema)
@item().ContainerName → "database" (ADLS container name)
@item().FolderName → "sqldb/Customer" (ADLS folder path)
- @item() — the current element in the ForEach loop
- Property names are case-sensitive — @item().tablename won’t work if the column is TableName
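The case-sensitivity rule behaves like Python dictionary lookups, which makes it easy to remember (the dictionary below is just an example row):

```python
# One metadata row, as it arrives in @item()
item = {"TableName": "Customer", "SchemaName": "SalesLT"}

assert item["TableName"] == "Customer"  # exact case: resolves
assert item.get("tablename") is None    # wrong case: nothing found
```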
Dataset Parameter Expressions
@dataset().SchemaName → used in dataset Connection tab
@dataset().TableName → used in dataset Connection tab
@dataset().ContainerName → used in dataset Connection tab
@dataset().FolderName → used in dataset Connection tab
- @dataset() can only be used inside a dataset’s own configuration
- It references the parameters defined on that dataset
- Values are passed by the pipeline activity that uses the dataset
The BadRequest Null Error (And How to Fix It)
If you’ve worked with ADF, you’ve probably seen this error:
{
"code": "BadRequest",
"message": null,
"target": "pipeline//runid/03227a01-...",
"details": null,
"error": null
}
This is the most frustrating error in ADF — the message is literally null, giving you zero information about what went wrong.
The Top 5 Causes (In Order of Likelihood)
1. Stale dataset parameter defaults — You had a default value in a dataset parameter that referenced a pipeline variable you later deleted. Even though you’re not using the default, ADF still tries to resolve it. Fix: Open every dataset, go to Parameters tab, and clear all default values.
2. Trailing whitespace on names — An activity named "Copy_TableData " (with a trailing space) will cause this error, and it passes validation. Fix: Carefully retype activity, dataset, and parameter names.
3. Incorrect expression references — An expression like @activity('LookupMetadata').output.value when the activity is actually named Lookup_Metadata (with underscore). Fix: Copy-paste activity names into expressions to avoid typos.
4. Key Vault linked service UI bug — Editing a Key Vault-based linked service through the UI can inject stray characters. Fix: Edit the linked service JSON directly using the {} code button.
5. Nested ForEach loops — ADF does not support ForEach inside another ForEach. Fix: Use an Execute Pipeline activity to call a child pipeline for the inner loop.
How to Debug When the Message is Null
Use the binary chop method:
- Add a breakpoint after the Lookup activity → Debug → does it succeed?
- If yes, enable ForEach but replace the Copy with a Set Variable activity → Debug → does iteration work?
- If yes, add the Copy back → the problem is in the Copy activity’s dataset parameter mapping
- Check each dataset’s JSON by clicking the {} code button → look for stale references
Common Mistakes and How to Avoid Them
Based on my experience and common issues I’ve seen:
| Mistake | Impact | Prevention |
|---|---|---|
| Leaving “First row only” checked on Lookup | ForEach gets an object instead of an array — fails silently or copies only one table | Always uncheck for metadata-driven pipelines |
| Default values in dataset parameters | BadRequest null error at runtime | Never put values in dataset parameter defaults |
| Case mismatch in @item() properties | Runtime error — @item().tablename won’t find TableName | Match the exact column name case from your SQL table |
| Missing ADLS container | Runtime error — ADF cannot create containers | Always create the container manually before running |
| Not connecting Lookup to ForEach | ForEach runs before Lookup completes — empty or stale data | Drag the green success arrow from Lookup to ForEach |
| Using @dataset() in the pipeline | Error: “dataset function is not valid” | Use @dataset() only inside datasets. Use @item() in the pipeline. |
| Not clicking “Publish all” before Debug | Runs with old configuration | Always Publish before Debugging |
Production Best Practices
If you’re taking this pattern to production, consider these enhancements:
1. Add an IsActive Flag
ALTER TABLE metadata ADD IsActive BIT DEFAULT 1;
Then in your Lookup, use a query instead of the table:
SELECT * FROM metadata WHERE IsActive = 1
This lets you disable tables without deleting rows.
2. Add Audit Logging
Create a stored procedure that logs success/failure details:
CREATE PROCEDURE sp_LogAudit
@table_name VARCHAR(100),
@schema_name VARCHAR(100),
@rows_read INT,
@rows_copied INT,
@copy_duration INT,
@error_message VARCHAR(500)
AS
BEGIN
INSERT INTO audit (table_name, schema_name, rows_read, rows_copied,
copy_duration, error_message, load_date)
VALUES (@table_name, @schema_name, @rows_read, @rows_copied,
@copy_duration, @error_message, GETDATE());
END
I cover this in detail in my next post: Building a Synapse Pipeline with Audit Logging.
3. Use Parquet Instead of CSV
Parquet is better for analytics workloads:
- 70-90% smaller files (columnar compression)
- Faster query performance in Synapse, Databricks, and Spark
- Schema is embedded in the file — no guessing data types
- Supports nested data structures
Just change your sink dataset from DelimitedText to Parquet format.
4. Add Date-Based Folder Partitioning
Instead of writing all data to sqldb/Customer/, add date partitions:
@concat(item().FolderName, '/', formatDateTime(utcnow(), 'yyyy'), '/',
formatDateTime(utcnow(), 'MM'), '/', formatDateTime(utcnow(), 'dd'))
This creates:
sqldb/Customer/2026/04/05/part-00000.csv
Making it easy to query specific dates and manage data retention.
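If the ADF expression syntax feels opaque, here is what it computes, sketched in Python with `datetime` standing in for ADF's `utcnow()`/`formatDateTime()` (the function name is mine, not an ADF API):

```python
from datetime import datetime, timezone

def partitioned_path(folder_name, now=None):
    # mirrors @concat(item().FolderName, '/', yyyy, '/', MM, '/', dd)
    now = now or datetime.now(timezone.utc)
    return f"{folder_name}/{now:%Y}/{now:%m}/{now:%d}"

path = partitioned_path("sqldb/Customer",
                        datetime(2026, 4, 5, tzinfo=timezone.utc))
# path == "sqldb/Customer/2026/04/05"
```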
5. Add Error Handling
In the ForEach activity, you can handle individual table failures without stopping the entire pipeline:
- In the ForEach Settings tab, check “Continue on error”
- This means if the Customer table fails, the pipeline continues with Address, Product, etc.
- Combine with audit logging to track which tables succeeded and which failed
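The behavior you get from "Continue on error" plus audit logging looks like this in Python terms (the `copy_table` failure and the audit tuple layout are illustrative assumptions):

```python
def copy_table(name):
    # stand-in for one Copy activity; pretend Customer hits a source error
    if name == "Customer":
        raise RuntimeError("source timeout")
    return 100  # pretend rows copied

audit = []
for name in ["Customer", "Address", "Product"]:
    try:
        rows = copy_table(name)
        audit.append((name, "Succeeded", rows, None))
    except RuntimeError as err:
        # log the failure and keep going, like "Continue on error"
        audit.append((name, "Failed", 0, str(err)))
```

One table fails, the other two still land, and the audit trail records all three outcomes, which is exactly what you want from a nightly load.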
Interview Questions About This Pattern
Here are questions you might face in an Azure Data Engineering interview:
Q: What is a metadata-driven pipeline in ADF?
A: It’s a pipeline that reads its configuration (which tables to copy, where to write, what format to use) from a metadata table rather than having these details hardcoded. It uses Lookup to read the config, ForEach to iterate, and Copy to move data dynamically.
Q: Why use @item() instead of @dataset() in the Copy activity?
A: @item() references the current element in the ForEach loop and is used in the pipeline. @dataset() references the dataset’s own parameters and is used only inside the dataset’s configuration (Connection tab). The pipeline passes values to the dataset using @item(), and the dataset uses those values internally via @dataset().
Q: What happens if “First row only” is checked on the Lookup?
A: The Lookup returns a single object (output.firstRow) instead of an array (output.value). The ForEach activity expects an array, so it will fail or behave unexpectedly.
Q: How do you add a new table to the pipeline?
A: Insert a new row in the metadata table with the table name, schema, container, and folder path. No pipeline changes needed. The next time the pipeline runs, it picks up the new row automatically.
Q: How do you handle failures for individual tables?
A: Enable “Continue on error” on the ForEach activity. Combine with audit logging using Stored Procedure activities that capture error messages for failed tables.
Q: What’s the difference between ADF and Synapse Pipelines?
A: They use the same engine and almost identical UI. Synapse Pipelines are integrated into the Synapse workspace and can leverage Synapse-specific features (Spark pools, SQL pools, data flows). ADF is a standalone service. For pure ETL workloads, both work equally well.
What’s Next: Adding Audit Logging
The pipeline we built works great, but in production you need visibility into what happened during each run. In my next post, I’ll extend this pipeline in Azure Synapse Analytics to include:
- Parquet output instead of CSV (smaller, faster, better for analytics)
- Stored procedure audit logging that captures rows_read, rows_copied, copy_duration, and error_message for every table
- Success and failure tracking — the pipeline logs both successful copies and failures separately
- Incremental loading with a watermark pattern using a config table
Stay tuned for: Building a Metadata-Driven Pipeline in Azure Synapse Analytics with Audit Logging
Wrapping Up
You now have a production-quality, metadata-driven pipeline in Azure Data Factory that:
- Dynamically copies any number of tables from Azure SQL Database to ADLS Gen2
- Uses parameterized datasets for maximum reusability
- Can be extended with new tables by simply adding rows to a metadata table
- Follows the Lookup → ForEach → Copy pattern used in enterprise data platforms
This pattern is fundamental to modern data engineering on Azure. Whether you’re building a data lake, a warehouse loading process, or preparing for an interview — mastering this pattern will serve you well.
Related posts:
- Building a REST API with Python FastAPI on AWS Lambda
- Building a Synapse Pipeline with Audit Logging (coming soon)
- Incremental Data Loading with Delta Copy Pattern (coming soon)
- Top 15 Azure Data Factory Interview Questions (coming soon)
If this guide helped you, share it with a fellow data engineer. Have questions? Drop a comment below.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.