Understanding Parameterized Datasets in Azure Data Factory: A Deep Dive
Parameterized datasets are what separate a beginner ADF developer from a production-ready one. Without them, you create one dataset per table — 50 tables means 50 datasets. With them, you create one dataset that handles all 50 tables dynamically.
Think of a parameterized dataset like a form letter template. A regular dataset is a letter with the recipient’s name, address, and greeting already filled in — one letter per person. A parameterized dataset is the template with blank fields: “Dear _____, your order for _____ ships to _____.” At runtime, the pipeline fills in the blanks for each iteration. One template, unlimited letters.
If you have built our metadata-driven pipeline, you have already used parameterized datasets. This post explains the mechanics behind what you built — so you can debug, extend, and build new patterns confidently.
Table of Contents
- Why Parameterize? The Multiplication Problem
- The Three Expression Scopes
- Creating a Parameterized Source Dataset
- Creating a Parameterized Sink Dataset
- How Values Flow at Runtime
- The Golden Rules
- Pipeline Parameters vs Dataset Parameters
- Variables vs Parameters
- ADF vs Synapse UI Differences
- Advanced: Dynamic File Names and Date Partitions
- Real-World Patterns
- Common Errors
- Interview Questions
- Wrapping Up
Why Parameterize? The Multiplication Problem
Without parameterization:
100 datasets (50 source + 50 sink) — one per table
Adding a new table = create 2 datasets + update pipeline
With parameterization:
2 datasets total — one parameterized source, one parameterized sink
Adding a new table = insert 1 row in metadata table
The math makes this obvious at scale:
| Approach | 10 Tables | 50 Tables | 200 Tables | Adding 1 New Table |
|---|---|---|---|---|
| Hardcoded datasets | 20 datasets | 100 datasets | 400 datasets | Create 2 datasets + edit pipeline |
| Parameterized datasets | 2 datasets | 2 datasets | 2 datasets | Insert 1 row in metadata table |
Parameterized datasets are not an optimization — they are the only sane approach for production. Every enterprise ADF pipeline uses them.
The Three Expression Scopes
1. Dataset Scope: @dataset()
Used inside a dataset’s Connection tab. References the dataset’s own parameters.
Where: Dataset Connection tab
Expression: @dataset().SchemaName
2. Pipeline/ForEach Scope: @item()
Used inside a pipeline activity within a ForEach loop.
Where: Pipeline Copy Activity Source/Sink tab
Expression: @item().SchemaName
3. Activity Scope: @activity()
Used inside a pipeline activity to reference another activity’s output.
Where: Pipeline — any activity referencing another
Expression: @activity('Lookup_Metadata').output.value
The Critical Rule
@dataset() is ONLY inside datasets. @item() and @activity() are ONLY inside pipelines. Cross them and you get errors.
WHERE EACH EXPRESSION LIVES:
┌─────────────────────────────────────────────────────────────────┐
│ PIPELINE (ForEach loop) │
│ │
│ @item().SchemaName ← current row from ForEach │
│ @item().TableName ← current row from ForEach │
│ @activity('Lookup').output ← output from another activity │
│ @pipeline().parameters.X ← pipeline-level parameter │
│ │
│ Copy Activity Source tab: │
│ Dataset: DS_SqlDB_SourceTable │
│ SchemaName: @item().SchemaName ← passes value TO dataset │
│ TableName: @item().TableName ← passes value TO dataset │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ DATASET: DS_SqlDB_SourceTable │ │
│ │ │ │
│ │ Parameters: SchemaName, TableName │ │
│ │ │ │
│ │ Connection tab: │ │
│ │ Schema: @dataset().SchemaName ← reads its OWN param │ │
│ │ Table: @dataset().TableName ← reads its OWN param │ │
│ │ │ │
│ │ ❌ @item() does NOT work here │ │
│ │ ❌ @activity() does NOT work here │ │
│ │ ❌ @pipeline() does NOT work here │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ❌ @dataset() does NOT work here (pipeline level) │
└─────────────────────────────────────────────────────────────────┘
Think of it as a function call:
# Dataset = function definition
def read_from_sql(SchemaName, TableName): # @dataset() parameters
return query(f"SELECT * FROM {SchemaName}.{TableName}")
# Pipeline = function call
for row in metadata_rows: # ForEach iteration
data = read_from_sql(
SchemaName=row.SchemaName, # @item().SchemaName
TableName=row.TableName # @item().TableName
)
Creating a Parameterized Source Dataset
Step 1: Create the Dataset
- Author tab > + > Dataset > Azure SQL Database
- Name:
DS_SqlDB_SourceTable - Linked service:
LS_AzureSqlDB - Leave table blank
Step 2: Add Parameters
- Parameters tab > add:
SchemaName(String, empty default)TableName(String, empty default)
CRITICAL: Leave default values empty. This prevents BadRequest null errors.
Step 3: Configure Connection
- Connection tab > check “Edit” (ADF) or “Enter manually” (Synapse)
- Schema > dynamic content >
@dataset().SchemaName - Table > dynamic content >
@dataset().TableName
JSON View
{
"name": "DS_SqlDB_SourceTable",
"properties": {
"parameters": {
"SchemaName": {"type": "string"},
"TableName": {"type": "string"}
},
"type": "AzureSqlTable",
"typeProperties": {
"schema": {
"value": "@dataset().SchemaName",
"type": "Expression"
},
"table": {
"value": "@dataset().TableName",
"type": "Expression"
}
}
}
}
Creating a Parameterized Sink Dataset
- + > Dataset > ADLS Gen2 > Parquet
- Name:
DS_ADLS_Sink - Parameters:
ContainerName(String, empty),FolderName(String, empty) - Connection tab:
- Container >
@dataset().ContainerName - Directory >
@dataset().FolderName - File name > leave blank
How Values Flow at Runtime
Step 1: Lookup reads metadata table
Output: [{SchemaName: "SalesLT", TableName: "Customer", ...}]
Step 2: ForEach iterates — @item() = current row
Step 3: Copy Source passes to dataset:
DS_SqlDB_SourceTable.SchemaName <- @item().SchemaName = "SalesLT"
DS_SqlDB_SourceTable.TableName <- @item().TableName = "Customer"
Step 4: Dataset resolves:
@dataset().SchemaName -> "SalesLT"
@dataset().TableName -> "Customer"
Result: SELECT * FROM SalesLT.Customer
Step 5: Copy Sink passes:
DS_ADLS_Sink.ContainerName <- @item().ContainerName = "database"
DS_ADLS_Sink.FolderName <- @item().FolderName = "sqldb/Customer"
Step 6: Write to database/sqldb/Customer/part-00000.parquet
The Golden Rules
- Never put expressions in dataset parameter defaults. Leave empty. Always.
- Never use @dataset() in the pipeline. Use @item() or @pipeline().parameters.
- Never use @item() in datasets. Use @dataset() in Connection tabs.
- Parameter names are case-sensitive.
TableName!=tablename. - No trailing whitespace on parameter names.
- Always publish before debugging.
Pipeline Parameters vs Dataset Parameters
| Type | Defined On | Passed By | Expression |
|---|---|---|---|
| Dataset parameter | Dataset Parameters tab | Copy activity Source/Sink | @dataset().ParamName |
| Pipeline parameter | Pipeline Parameters tab | Trigger or manual run | @pipeline().parameters.ParamName |
You can chain them:
Trigger > Pipeline parameter > Copy activity > Dataset parameter
Variables vs Parameters
This is another common confusion. Parameters and variables look similar but serve different purposes:
| Feature | Parameter | Variable |
|---|---|---|
| Where defined | Pipeline or Dataset | Pipeline only |
| Can change at runtime? | No — set once when pipeline starts, immutable | Yes — use Set Variable activity to update |
| Scope | Entire pipeline or entire dataset | Entire pipeline (or ForEach scope) |
| Set by | Trigger, parent pipeline, or manual run | Set Variable / Append Variable activity |
| Expression | @pipeline().parameters.X |
@variables('X') |
| Use case | Input values that do not change: environment name, date range, source config | Values that accumulate or change: row counts, error messages, status flags |
Example: Pipeline with both
Parameters (set at trigger time, immutable):
SourceEnvironment = "production"
LoadDate = "2026-06-12"
Variables (change during execution):
TotalRowsCopied = 0 ← updated by Set Variable after each Copy
FailedTables = [] ← appended by Append Variable on failure
PipelineStatus = "Running" ← updated to "Success" or "Failed" at the end
Key rule: You CANNOT use Set Variable inside a ForEach loop to accumulate
across iterations (each iteration gets its own copy of the variable).
Use Append Variable instead, or accumulate in a SQL audit table.
ADF vs Synapse UI Differences
| Action | ADF Studio | Synapse Studio |
|---|---|---|
| Enable dynamic table | Check “Edit” | Check “Enter manually” |
| Dataset location | Author > Datasets | Data > Integration datasets |
| Dynamic content | Click field > “Add dynamic content” | Same, or click fx icon |
Advanced: Dynamic File Names and Date Partitions
Dynamic File Names
@concat(item().TableName, '_', formatDateTime(utcnow(), 'yyyyMMdd'), '.parquet')
Result: Customer_20260405.parquet
Date-Partitioned Folders
@concat(item().FolderName, '/',
formatDateTime(utcnow(), 'yyyy'), '/',
formatDateTime(utcnow(), 'MM'), '/',
formatDateTime(utcnow(), 'dd'))
Result: sqldb/Customer/2026/04/05/
Important: Put this in the pipeline Copy Sink tab, NOT in the dataset. Dataset Directory stays as @dataset().FolderName.
Real-World Patterns
Pattern 1: One Pipeline, Multiple Output Formats
The metadata table controls the output format per table — some tables land as Parquet, others as Delta, others as CSV:
Metadata Table:
| TableName | SinkFormat | ContainerName |
|-----------|-----------|---------------|
| Customer | parquet | raw |
| Orders | delta | curated |
| Config | csv | reference |
Pipeline: ForEach → If Condition on @item().SinkFormat
→ "parquet" → Copy to Parquet dataset
→ "delta" → Copy to Delta dataset (or Databricks notebook)
→ "csv" → Copy to CSV dataset
Three parameterized sink datasets, one pipeline. Format controlled by metadata.
Pattern 2: Handling Schema Drift with Parameterized Queries
Instead of @dataset().TableName selecting the full table, use a query parameter for more control:
Dataset parameter: SourceQuery (String)
Copy Source tab:
Use query: Query
Query: @item().SourceQuery
Metadata Table:
| TableName | SourceQuery |
|-----------|-----------------------------------------------------|
| Customer | SELECT CustomerID, Name, Email FROM SalesLT.Customer|
| Product | SELECT * FROM SalesLT.Product WHERE Active = 1 |
| Orders | SELECT * FROM SalesLT.SalesOrderHeader |
Benefit: Control which columns and rows are extracted per table.
No schema drift issues — you explicitly define the output shape.
Pattern 3: Parameterized Incremental Loading
Metadata Table:
| TableName | WatermarkColumn | LastWatermark |
|-----------|-----------------|---------------------|
| Customer | ModifiedDate | 2026-06-11 00:00:00 |
| Orders | OrderDate | 2026-06-10 00:00:00 |
Copy Source query (parameterized):
@concat('SELECT * FROM ', item().SchemaName, '.', item().TableName,
' WHERE ', item().WatermarkColumn, ' > ''', item().LastWatermark, '''')
Result for Customer:
SELECT * FROM SalesLT.Customer WHERE ModifiedDate > '2026-06-11 00:00:00'
After copy: Stored Procedure updates LastWatermark to current max value.
Same pipeline handles full load AND incremental — controlled by metadata.
Common Errors
| Error | Cause | Fix |
|---|---|---|
BadRequest null |
Stale defaults in dataset parameters | Clear all defaults |
dataset function not defined |
Used @dataset() in pipeline | Change to @item() |
item function not defined |
Used @item() in dataset | Change to @dataset() |
| Wrong table copied | Case mismatch | Match exact column name case |
| Copy succeeds, 0 rows | Empty parameter value | Verify Lookup returns data |
Interview Questions
Q: What is a parameterized dataset? A: A dataset where properties like table name or file path come from parameters instead of being hardcoded. One dataset handles multiple tables dynamically.
Q: What’s the difference between @dataset() and @item()? A: @dataset() references dataset parameters inside the dataset. @item() references the current ForEach element inside the pipeline. Pipeline passes values to dataset.
Q: How many datasets for a 50-table metadata pipeline? A: Just 3 — one metadata lookup, one parameterized source, one parameterized sink.
Q: What happens if you use @item() inside a dataset? A: You get the error “item function not defined.” The @item() expression only works inside a pipeline ForEach loop. Datasets must use @dataset() to reference their own parameters. The pipeline passes values from @item() into the dataset parameters at runtime.
Q: Can you use pipeline parameters inside a dataset? A: Not directly. Datasets only understand @dataset(). To pass a pipeline parameter to a dataset, you chain them: the pipeline parameter is mapped to the dataset parameter in the Copy activity’s Source/Sink tab. For example, Copy Source → Dataset property SchemaName = @pipeline().parameters.Schema. Inside the dataset, @dataset().SchemaName reads that value.
Q: Why should you leave dataset parameter defaults empty? A: If you put a hardcoded value (like “SalesLT”) in the default, and the pipeline fails to pass a value, the dataset silently uses the default instead of failing. This masks bugs — you think the pipeline is working dynamically, but it is always reading from SalesLT. Empty defaults cause an immediate error if the value is missing, which is what you want.
Q: What is the difference between a variable and a parameter in ADF? A: Parameters are set once when the pipeline starts and cannot change during execution — they are inputs (source name, date range, environment). Variables can change during execution using Set Variable or Append Variable activities — they track state (row counts, error messages, status flags). Parameters use @pipeline().parameters.X. Variables use @variables(‘X’).
Wrapping Up
Parameterized datasets turn a 100-dataset nightmare into a 2-dataset solution. Master @dataset() vs @item(), follow the golden rules, and metadata-driven pipelines become second nature.
Related posts: – Metadata-Driven Pipeline in ADF – Common ADF/Synapse Errors – ADF vs Synapse Comparison – Top 15 ADF Interview Questions
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.