Understanding Parameterized Datasets in Azure Data Factory: A Deep Dive

Parameterized datasets are what separate a beginner ADF developer from a production-ready one. Without them, you create one dataset per table — 50 tables means 50 datasets. With them, you create one dataset that handles all 50 tables dynamically.

Why Parameterize? The Multiplication Problem

Without parameterization:

100 datasets (50 source + 50 sink) — one per table
Adding a new table = create 2 datasets + update pipeline

With parameterization:

2 datasets total — one parameterized source, one parameterized sink
Adding a new table = insert 1 row into the metadata table (see the sample rows below)
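
Here is what a couple of rows of that metadata might look like, in the shape the Lookup activity returns them (the Product row is illustrative; the runtime walkthrough below uses the Customer values):

[
    {"SchemaName": "SalesLT", "TableName": "Customer",
     "ContainerName": "database", "FolderName": "sqldb/Customer"},
    {"SchemaName": "SalesLT", "TableName": "Product",
     "ContainerName": "database", "FolderName": "sqldb/Product"}
]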

The Three Expression Scopes

1. Dataset Scope: @dataset()

Used inside a dataset’s Connection tab. References the dataset’s own parameters.

Where: Dataset Connection tab
Expression: @dataset().SchemaName

2. Pipeline/ForEach Scope: @item()

Used inside a pipeline activity within a ForEach loop.

Where: Pipeline Copy Activity Source/Sink tab
Expression: @item().SchemaName

3. Activity Scope: @activity()

Used inside a pipeline activity to reference another activity’s output.

Where: Pipeline — any activity referencing another
Expression: @activity('Lookup_Metadata').output.value
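
To see where this wiring typically lives: in a metadata-driven pipeline, the ForEach activity’s items property consumes the Lookup’s output array. A trimmed JSON sketch (activity names are illustrative; the Copy activity shown later goes inside activities):

{
    "name": "ForEach_Tables",
    "type": "ForEach",
    "dependsOn": [
        {"activity": "Lookup_Metadata", "dependencyConditions": ["Succeeded"]}
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Lookup_Metadata').output.value",
            "type": "Expression"
        },
        "activities": []
    }
}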

The Critical Rule

@dataset() works ONLY inside datasets. @item() and @activity() work ONLY inside pipelines. Cross them and you get “function not defined” errors.

Think of it as a function call:

# Stand-in for the actual database call
def query(sql):
    print(sql)

# Dataset = function definition
def read_from_sql(SchemaName, TableName):       # @dataset() parameters
    return query(f"SELECT * FROM {SchemaName}.{TableName}")

# Pipeline = function call
metadata_rows = [{"SchemaName": "SalesLT", "TableName": "Customer"}]  # Lookup output
for row in metadata_rows:                       # ForEach iteration
    data = read_from_sql(
        SchemaName=row["SchemaName"],           # @item().SchemaName
        TableName=row["TableName"]              # @item().TableName
    )

Creating a Parameterized Source Dataset

Step 1: Create the Dataset

  1. Author tab > + > Dataset > Azure SQL Database
  2. Name: DS_SqlDB_SourceTable
  3. Linked service: LS_AzureSqlDB
  4. Leave table blank

Step 2: Add Parameters

  1. Parameters tab > add:
     - SchemaName (String, empty default)
     - TableName (String, empty default)

CRITICAL: Leave default values empty. This prevents BadRequest null errors.

Step 3: Configure Connection

  1. Connection tab > check “Edit” (ADF) or “Enter manually” (Synapse)
  2. Schema > dynamic content > @dataset().SchemaName
  3. Table > dynamic content > @dataset().TableName

JSON View

{
    "name": "DS_SqlDB_SourceTable",
    "properties": {
        "parameters": {
            "SchemaName": {"type": "string"},
            "TableName": {"type": "string"}
        },
        "type": "AzureSqlTable",
        "typeProperties": {
            "schema": {
                "value": "@dataset().SchemaName",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().TableName",
                "type": "Expression"
            }
        }
    }
}

Creating a Parameterized Sink Dataset

  1. + > Dataset > ADLS Gen2 > Parquet
  2. Name: DS_ADLS_Sink
  3. Parameters: ContainerName (String, empty), FolderName (String, empty)
  4. Connection tab:
     - Container > @dataset().ContainerName
     - Directory > @dataset().FolderName
     - File name > leave blank
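
JSON View

For symmetry with the source dataset, here is a trimmed sketch of the sink JSON (linked service omitted as before; AzureBlobFSLocation is the location type ADF generates for ADLS Gen2):

{
    "name": "DS_ADLS_Sink",
    "properties": {
        "parameters": {
            "ContainerName": {"type": "string"},
            "FolderName": {"type": "string"}
        },
        "type": "Parquet",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": {
                    "value": "@dataset().ContainerName",
                    "type": "Expression"
                },
                "folderPath": {
                    "value": "@dataset().FolderName",
                    "type": "Expression"
                }
            }
        }
    }
}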

How Values Flow at Runtime

Step 1: Lookup reads metadata table
        Output: [{SchemaName: "SalesLT", TableName: "Customer", ...}]

Step 2: ForEach iterates — @item() = current row

Step 3: Copy Source passes to dataset:
        DS_SqlDB_SourceTable.SchemaName <- @item().SchemaName = "SalesLT"
        DS_SqlDB_SourceTable.TableName  <- @item().TableName  = "Customer"

Step 4: Dataset resolves:
        @dataset().SchemaName -> "SalesLT"
        @dataset().TableName  -> "Customer"
        Result: SELECT * FROM SalesLT.Customer

Step 5: Copy Sink passes:
        DS_ADLS_Sink.ContainerName <- @item().ContainerName = "database"
        DS_ADLS_Sink.FolderName    <- @item().FolderName    = "sqldb/Customer"

Step 6: Write to database/sqldb/Customer/part-00000.parquet
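
Putting steps 3 and 5 together, the Copy activity’s dataset references look roughly like this (a trimmed sketch with source/sink typeProperties omitted; names match the datasets above):

{
    "name": "Copy_Table",
    "type": "Copy",
    "inputs": [{
        "referenceName": "DS_SqlDB_SourceTable",
        "type": "DatasetReference",
        "parameters": {
            "SchemaName": {"value": "@item().SchemaName", "type": "Expression"},
            "TableName": {"value": "@item().TableName", "type": "Expression"}
        }
    }],
    "outputs": [{
        "referenceName": "DS_ADLS_Sink",
        "type": "DatasetReference",
        "parameters": {
            "ContainerName": {"value": "@item().ContainerName", "type": "Expression"},
            "FolderName": {"value": "@item().FolderName", "type": "Expression"}
        }
    }]
}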

The Golden Rules

  1. Never put expressions in dataset parameter defaults. Leave empty. Always.
  2. Never use @dataset() in the pipeline. Use @item() or @pipeline().parameters.
  3. Never use @item() in datasets. Use @dataset() in Connection tabs.
  4. Parameter names are case-sensitive. TableName != tablename.
  5. No trailing whitespace on parameter names.
  6. Always publish before debugging.

Pipeline Parameters vs Dataset Parameters

Type                 Defined On                Passed By                    Expression
-------------------  ------------------------  ---------------------------  --------------------------------
Dataset parameter    Dataset Parameters tab    Copy activity Source/Sink    @dataset().ParamName
Pipeline parameter   Pipeline Parameters tab   Trigger or manual run        @pipeline().parameters.ParamName

You can chain them:

Trigger > Pipeline parameter > Copy activity > Dataset parameter
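
For example, a pipeline parameter can feed a dataset parameter through the Copy activity’s source reference. A JSON fragment sketching the idea (SourceSchema is a hypothetical pipeline parameter):

"inputs": [{
    "referenceName": "DS_SqlDB_SourceTable",
    "type": "DatasetReference",
    "parameters": {
        "SchemaName": {
            "value": "@pipeline().parameters.SourceSchema",
            "type": "Expression"
        }
    }
}]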

ADF vs Synapse UI Differences

Action                 ADF Studio                            Synapse Studio
---------------------  ------------------------------------  ---------------------------
Enable dynamic table   Check “Edit”                          Check “Enter manually”
Dataset location       Author > Datasets                     Data > Integration datasets
Dynamic content        Click field > “Add dynamic content”   Same, or click the fx icon

Advanced: Dynamic File Names and Date Partitions

Dynamic File Names

@concat(item().TableName, '_', formatDateTime(utcnow(), 'yyyyMMdd'), '.parquet')

Result: Customer_20260405.parquet

Date-Partitioned Folders

@concat(item().FolderName, '/',
        formatDateTime(utcnow(), 'yyyy'), '/',
        formatDateTime(utcnow(), 'MM'), '/',
        formatDateTime(utcnow(), 'dd'))

Result: sqldb/Customer/2026/04/05/

Important: Put this in the pipeline Copy Sink tab, NOT in the dataset. Dataset Directory stays as @dataset().FolderName.
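
Concretely, the partition expression becomes the value passed to the sink’s FolderName parameter in the pipeline, while the dataset itself is untouched. A sketch of the sink reference:

"outputs": [{
    "referenceName": "DS_ADLS_Sink",
    "type": "DatasetReference",
    "parameters": {
        "ContainerName": {"value": "@item().ContainerName", "type": "Expression"},
        "FolderName": {
            "value": "@concat(item().FolderName, '/', formatDateTime(utcnow(), 'yyyy'), '/', formatDateTime(utcnow(), 'MM'), '/', formatDateTime(utcnow(), 'dd'))",
            "type": "Expression"
        }
    }
}]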

Common Errors

Error                          Cause                                  Fix
-----------------------------  -------------------------------------  ------------------------------
BadRequest null                Stale defaults in dataset parameters   Clear all defaults
dataset function not defined   Used @dataset() in a pipeline          Change to @item()
item function not defined      Used @item() in a dataset              Change to @dataset()
Wrong table copied             Case mismatch                          Match exact column name case
Copy succeeds, 0 rows          Empty parameter value                  Verify the Lookup returns data

Interview Questions

Q: What is a parameterized dataset?
A: A dataset where properties like table name or file path come from parameters instead of being hardcoded. One dataset handles multiple tables dynamically.

Q: What’s the difference between @dataset() and @item()?
A: @dataset() references dataset parameters inside the dataset. @item() references the current ForEach element inside the pipeline. The pipeline passes values to the dataset.

Q: How many datasets for a 50-table metadata pipeline?
A: Just 3: one metadata lookup, one parameterized source, one parameterized sink.

Wrapping Up

Parameterized datasets turn a 100-dataset nightmare into a 2-dataset solution. Master @dataset() vs @item(), follow the golden rules, and metadata-driven pipelines become second nature.

Related posts: Metadata-Driven Pipeline in ADF, Common ADF/Synapse Errors, ADF vs Synapse Comparison, Top 15 ADF Interview Questions


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
