Understanding Parameterized Datasets in Azure Data Factory: A Deep Dive

Understanding Parameterized Datasets in Azure Data Factory: A Deep Dive

Parameterized datasets are what separate a beginner ADF developer from a production-ready one. Without them, you create one dataset per table — 50 tables means 50 datasets. With them, you create one dataset that handles all 50 tables dynamically.

Think of a parameterized dataset like a form letter template. A regular dataset is a letter with the recipient’s name, address, and greeting already filled in — one letter per person. A parameterized dataset is the template with blank fields: “Dear _____, your order for _____ ships to _____.” At runtime, the pipeline fills in the blanks for each iteration. One template, unlimited letters.

If you have built our metadata-driven pipeline, you have already used parameterized datasets. This post explains the mechanics behind what you built — so you can debug, extend, and build new patterns confidently.

Table of Contents

  • Why Parameterize? The Multiplication Problem
  • The Three Expression Scopes
  • Creating a Parameterized Source Dataset
  • Creating a Parameterized Sink Dataset
  • How Values Flow at Runtime
  • The Golden Rules
  • Pipeline Parameters vs Dataset Parameters
  • Variables vs Parameters
  • ADF vs Synapse UI Differences
  • Advanced: Dynamic File Names and Date Partitions
  • Real-World Patterns
  • Common Errors
  • Interview Questions
  • Wrapping Up

Why Parameterize? The Multiplication Problem

Without parameterization:

100 datasets (50 source + 50 sink) — one per table
Adding a new table = create 2 datasets + update pipeline

With parameterization:

2 datasets total — one parameterized source, one parameterized sink
Adding a new table = insert 1 row in metadata table

The math makes this obvious at scale:

Approach 10 Tables 50 Tables 200 Tables Adding 1 New Table
Hardcoded datasets 20 datasets 100 datasets 400 datasets Create 2 datasets + edit pipeline
Parameterized datasets 2 datasets 2 datasets 2 datasets Insert 1 row in metadata table

Parameterized datasets are not an optimization — they are the only sane approach for production. Every enterprise ADF pipeline uses them.

The Three Expression Scopes

1. Dataset Scope: @dataset()

Used inside a dataset’s Connection tab. References the dataset’s own parameters.

Where: Dataset Connection tab
Expression: @dataset().SchemaName

2. Pipeline/ForEach Scope: @item()

Used inside a pipeline activity within a ForEach loop.

Where: Pipeline Copy Activity Source/Sink tab
Expression: @item().SchemaName

3. Activity Scope: @activity()

Used inside a pipeline activity to reference another activity’s output.

Where: Pipeline — any activity referencing another
Expression: @activity('Lookup_Metadata').output.value

The Critical Rule

@dataset() is ONLY inside datasets. @item() and @activity() are ONLY inside pipelines. Cross them and you get errors.

WHERE EACH EXPRESSION LIVES:

┌─────────────────────────────────────────────────────────────────┐
│  PIPELINE (ForEach loop)                                        │
│                                                                 │
│  @item().SchemaName          ← current row from ForEach         │
│  @item().TableName           ← current row from ForEach         │
│  @activity('Lookup').output  ← output from another activity     │
│  @pipeline().parameters.X    ← pipeline-level parameter         │
│                                                                 │
│  Copy Activity Source tab:                                      │
│    Dataset: DS_SqlDB_SourceTable                                │
│    SchemaName: @item().SchemaName   ← passes value TO dataset   │
│    TableName:  @item().TableName    ← passes value TO dataset   │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  DATASET: DS_SqlDB_SourceTable                            │  │
│  │                                                           │  │
│  │  Parameters: SchemaName, TableName                        │  │
│  │                                                           │  │
│  │  Connection tab:                                          │  │
│  │    Schema: @dataset().SchemaName  ← reads its OWN param   │  │
│  │    Table:  @dataset().TableName   ← reads its OWN param   │  │
│  │                                                           │  │
│  │  ❌ @item() does NOT work here                            │  │
│  │  ❌ @activity() does NOT work here                        │  │
│  │  ❌ @pipeline() does NOT work here                        │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ❌ @dataset() does NOT work here (pipeline level)              │
└─────────────────────────────────────────────────────────────────┘

Think of it as a function call:

# Dataset = function definition
def read_from_sql(SchemaName, TableName):      # @dataset() parameters
    return query(f"SELECT * FROM {SchemaName}.{TableName}")

# Pipeline = function call
for row in metadata_rows:                       # ForEach iteration
    data = read_from_sql(
        SchemaName=row.SchemaName,              # @item().SchemaName
        TableName=row.TableName                 # @item().TableName
    )

Creating a Parameterized Source Dataset

Step 1: Create the Dataset

  1. Author tab > + > Dataset > Azure SQL Database
  2. Name: DS_SqlDB_SourceTable
  3. Linked service: LS_AzureSqlDB
  4. Leave table blank

Step 2: Add Parameters

  1. Parameters tab > add:
  2. SchemaName (String, empty default)
  3. TableName (String, empty default)

CRITICAL: Leave default values empty. This prevents BadRequest null errors.

Step 3: Configure Connection

  1. Connection tab > check “Edit” (ADF) or “Enter manually” (Synapse)
  2. Schema > dynamic content > @dataset().SchemaName
  3. Table > dynamic content > @dataset().TableName

JSON View

{
    "name": "DS_SqlDB_SourceTable",
    "properties": {
        "parameters": {
            "SchemaName": {"type": "string"},
            "TableName": {"type": "string"}
        },
        "type": "AzureSqlTable",
        "typeProperties": {
            "schema": {
                "value": "@dataset().SchemaName",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().TableName",
                "type": "Expression"
            }
        }
    }
}

Creating a Parameterized Sink Dataset

  1. + > Dataset > ADLS Gen2 > Parquet
  2. Name: DS_ADLS_Sink
  3. Parameters: ContainerName (String, empty), FolderName (String, empty)
  4. Connection tab:
  5. Container > @dataset().ContainerName
  6. Directory > @dataset().FolderName
  7. File name > leave blank

How Values Flow at Runtime

Step 1: Lookup reads metadata table
        Output: [{SchemaName: "SalesLT", TableName: "Customer", ...}]

Step 2: ForEach iterates — @item() = current row

Step 3: Copy Source passes to dataset:
        DS_SqlDB_SourceTable.SchemaName <- @item().SchemaName = "SalesLT"
        DS_SqlDB_SourceTable.TableName  <- @item().TableName  = "Customer"

Step 4: Dataset resolves:
        @dataset().SchemaName -> "SalesLT"
        @dataset().TableName  -> "Customer"
        Result: SELECT * FROM SalesLT.Customer

Step 5: Copy Sink passes:
        DS_ADLS_Sink.ContainerName <- @item().ContainerName = "database"
        DS_ADLS_Sink.FolderName    <- @item().FolderName    = "sqldb/Customer"

Step 6: Write to database/sqldb/Customer/part-00000.parquet

The Golden Rules

  1. Never put expressions in dataset parameter defaults. Leave empty. Always.
  2. Never use @dataset() in the pipeline. Use @item() or @pipeline().parameters.
  3. Never use @item() in datasets. Use @dataset() in Connection tabs.
  4. Parameter names are case-sensitive. TableName != tablename.
  5. No trailing whitespace on parameter names.
  6. Always publish before debugging.

Pipeline Parameters vs Dataset Parameters

Type Defined On Passed By Expression
Dataset parameter Dataset Parameters tab Copy activity Source/Sink @dataset().ParamName
Pipeline parameter Pipeline Parameters tab Trigger or manual run @pipeline().parameters.ParamName

You can chain them:

Trigger > Pipeline parameter > Copy activity > Dataset parameter

Variables vs Parameters

This is another common confusion. Parameters and variables look similar but serve different purposes:

Feature Parameter Variable
Where defined Pipeline or Dataset Pipeline only
Can change at runtime? No — set once when pipeline starts, immutable Yes — use Set Variable activity to update
Scope Entire pipeline or entire dataset Entire pipeline (or ForEach scope)
Set by Trigger, parent pipeline, or manual run Set Variable / Append Variable activity
Expression @pipeline().parameters.X @variables('X')
Use case Input values that do not change: environment name, date range, source config Values that accumulate or change: row counts, error messages, status flags
Example: Pipeline with both

Parameters (set at trigger time, immutable):
  SourceEnvironment = "production"
  LoadDate = "2026-06-12"

Variables (change during execution):
  TotalRowsCopied = 0          ← updated by Set Variable after each Copy
  FailedTables = []            ← appended by Append Variable on failure
  PipelineStatus = "Running"   ← updated to "Success" or "Failed" at the end

Key rule: You CANNOT use Set Variable inside a ForEach loop to accumulate
across iterations (each iteration gets its own copy of the variable).
Use Append Variable instead, or accumulate in a SQL audit table.

ADF vs Synapse UI Differences

Action ADF Studio Synapse Studio
Enable dynamic table Check “Edit” Check “Enter manually”
Dataset location Author > Datasets Data > Integration datasets
Dynamic content Click field > “Add dynamic content” Same, or click fx icon

Advanced: Dynamic File Names and Date Partitions

Dynamic File Names

@concat(item().TableName, '_', formatDateTime(utcnow(), 'yyyyMMdd'), '.parquet')

Result: Customer_20260405.parquet

Date-Partitioned Folders

@concat(item().FolderName, '/',
        formatDateTime(utcnow(), 'yyyy'), '/',
        formatDateTime(utcnow(), 'MM'), '/',
        formatDateTime(utcnow(), 'dd'))

Result: sqldb/Customer/2026/04/05/

Important: Put this in the pipeline Copy Sink tab, NOT in the dataset. Dataset Directory stays as @dataset().FolderName.

Real-World Patterns

Pattern 1: One Pipeline, Multiple Output Formats

The metadata table controls the output format per table — some tables land as Parquet, others as Delta, others as CSV:

Metadata Table:
  | TableName | SinkFormat | ContainerName |
  |-----------|-----------|---------------|
  | Customer  | parquet   | raw           |
  | Orders    | delta     | curated       |
  | Config    | csv       | reference     |

Pipeline: ForEach → If Condition on @item().SinkFormat
  → "parquet" → Copy to Parquet dataset
  → "delta"   → Copy to Delta dataset (or Databricks notebook)
  → "csv"     → Copy to CSV dataset

Three parameterized sink datasets, one pipeline. Format controlled by metadata.

Pattern 2: Handling Schema Drift with Parameterized Queries

Instead of @dataset().TableName selecting the full table, use a query parameter for more control:

Dataset parameter: SourceQuery (String)

Copy Source tab:
  Use query: Query
  Query: @item().SourceQuery

Metadata Table:
  | TableName | SourceQuery                                         |
  |-----------|-----------------------------------------------------|
  | Customer  | SELECT CustomerID, Name, Email FROM SalesLT.Customer|
  | Product   | SELECT * FROM SalesLT.Product WHERE Active = 1      |
  | Orders    | SELECT * FROM SalesLT.SalesOrderHeader              |

Benefit: Control which columns and rows are extracted per table.
No schema drift issues — you explicitly define the output shape.

Pattern 3: Parameterized Incremental Loading

Metadata Table:
  | TableName | WatermarkColumn | LastWatermark       |
  |-----------|-----------------|---------------------|
  | Customer  | ModifiedDate    | 2026-06-11 00:00:00 |
  | Orders    | OrderDate       | 2026-06-10 00:00:00 |

Copy Source query (parameterized):
  @concat('SELECT * FROM ', item().SchemaName, '.', item().TableName,
          ' WHERE ', item().WatermarkColumn, ' > ''', item().LastWatermark, '''')

Result for Customer:
  SELECT * FROM SalesLT.Customer WHERE ModifiedDate > '2026-06-11 00:00:00'

After copy: Stored Procedure updates LastWatermark to current max value.
Same pipeline handles full load AND incremental — controlled by metadata.

Common Errors

Error Cause Fix
BadRequest null Stale defaults in dataset parameters Clear all defaults
dataset function not defined Used @dataset() in pipeline Change to @item()
item function not defined Used @item() in dataset Change to @dataset()
Wrong table copied Case mismatch Match exact column name case
Copy succeeds, 0 rows Empty parameter value Verify Lookup returns data

Interview Questions

Q: What is a parameterized dataset? A: A dataset where properties like table name or file path come from parameters instead of being hardcoded. One dataset handles multiple tables dynamically.

Q: What’s the difference between @dataset() and @item()? A: @dataset() references dataset parameters inside the dataset. @item() references the current ForEach element inside the pipeline. Pipeline passes values to dataset.

Q: How many datasets for a 50-table metadata pipeline? A: Just 3 — one metadata lookup, one parameterized source, one parameterized sink.

Q: What happens if you use @item() inside a dataset? A: You get the error “item function not defined.” The @item() expression only works inside a pipeline ForEach loop. Datasets must use @dataset() to reference their own parameters. The pipeline passes values from @item() into the dataset parameters at runtime.

Q: Can you use pipeline parameters inside a dataset? A: Not directly. Datasets only understand @dataset(). To pass a pipeline parameter to a dataset, you chain them: the pipeline parameter is mapped to the dataset parameter in the Copy activity’s Source/Sink tab. For example, Copy Source → Dataset property SchemaName = @pipeline().parameters.Schema. Inside the dataset, @dataset().SchemaName reads that value.

Q: Why should you leave dataset parameter defaults empty? A: If you put a hardcoded value (like “SalesLT”) in the default, and the pipeline fails to pass a value, the dataset silently uses the default instead of failing. This masks bugs — you think the pipeline is working dynamically, but it is always reading from SalesLT. Empty defaults cause an immediate error if the value is missing, which is what you want.

Q: What is the difference between a variable and a parameter in ADF? A: Parameters are set once when the pipeline starts and cannot change during execution — they are inputs (source name, date range, environment). Variables can change during execution using Set Variable or Append Variable activities — they track state (row counts, error messages, status flags). Parameters use @pipeline().parameters.X. Variables use @variables(‘X’).

Wrapping Up

Parameterized datasets turn a 100-dataset nightmare into a 2-dataset solution. Master @dataset() vs @item(), follow the golden rules, and metadata-driven pipelines become second nature.

Related posts:Metadata-Driven Pipeline in ADFCommon ADF/Synapse ErrorsADF vs Synapse ComparisonTop 15 ADF Interview Questions


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link