Understanding Azure Data Factory JSON: Pipelines, Datasets, Linked Services, and Triggers Decoded

Every time you drag an activity onto the ADF canvas, configure a dataset, or create a linked service — ADF is writing a JSON file behind the scenes. The visual UI is just a pretty face on top of JSON.

Why does this matter? Because when you connect ADF to Git, every resource becomes a JSON file in your repository. When CI/CD deploys to production, it deploys JSON. When something breaks and the UI gives you a cryptic error, opening the JSON often reveals the problem immediately.

Think of it this way: the ADF visual designer is like Google Maps showing you a city from above. The JSON is the actual street-level blueprint — every building, every road, every wire. You can navigate the city just fine with Maps, but when you need to fix a water pipe, you need the blueprint.

This post decodes every major JSON structure in ADF/Synapse so you can read, understand, and troubleshoot them with confidence.

Table of Contents

  • Why Learn JSON Structures?
  • How to View JSON in ADF Studio
  • The JSON Hierarchy
  • Linked Service JSON
  • Dataset JSON
  • Pipeline JSON (Simple Copy)
  • Pipeline JSON (Metadata-Driven with Lookup + ForEach)
  • Pipeline JSON (Audit Logging with Stored Procedures)
  • Trigger JSON (Schedule, Tumbling Window, Event)
  • Integration Runtime JSON
  • How JSON Maps to the UI
  • Common JSON Patterns
  • Editing JSON Directly
  • Troubleshooting with JSON
  • Interview Questions
  • Wrapping Up

Why Learn JSON Structures?

Scenario 1: Git Repository

When ADF is connected to Git, your repository looks like this:

adf-pipelines/
  pipeline/
    PL_Copy_SqlToADLS.json              <-- You need to understand this
    PL_IncrementalLoad.json
  dataset/
    DS_SqlDB_SourceTable.json           <-- And this
  linkedService/
    LS_AzureSqlDB.json                  <-- And this
  trigger/
    TR_Daily_2AM.json                   <-- And this

During code review (Pull Request), you review these JSON files. You cannot review effectively if you do not understand the structure.
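When reviewing a large factory, it can help to get a quick inventory of what the repository contains. Here is a small Python sketch (a hypothetical review helper, not part of any ADF tooling) that groups the JSON files by their resource folder:

```python
from pathlib import Path

# Hypothetical review helper: group the JSON files of a Git-connected
# factory by resource folder (pipeline/, dataset/, linkedService/, trigger/).
def list_resources(repo_root: str) -> dict[str, list[str]]:
    root = Path(repo_root)
    return {
        folder.name: sorted(p.name for p in folder.glob("*.json"))
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }
```

Running this against the repo above would map `pipeline` to its two `.json` files, `dataset` to one, and so on, giving you a checklist for the review.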

Scenario 2: Troubleshooting

The UI says “BadRequest null.” You open the JSON and immediately see a dataset parameter with a stale default value. Problem solved in 30 seconds instead of 30 minutes.

Scenario 3: CI/CD

ARM templates are JSON. Parameter override files are JSON. Understanding the structure helps you write correct parameter files and debug deployment failures.

Real-life analogy: Learning JSON structures is like a mechanic learning to read engine blueprints. They can fix most things by looking at the engine, but for complex problems, they need to understand the blueprint to trace the issue to its source.

How to View JSON in ADF Studio

Every resource in ADF has a code view:

  1. Open any pipeline, dataset, linked service, or trigger
  2. Click the {} (curly braces) icon in the top-right corner
  3. The full JSON definition appears

You can edit the JSON directly here and switch back to the visual designer. Changes in one view are reflected in the other.

Shortcut: In Synapse Studio, click the </> icon instead.

The JSON Hierarchy

Every ADF resource follows the same top-level structure:

{
    "name": "ResourceName",
    "properties": {
        "type": "ResourceType",
        ... (type-specific properties)
    }
}

Think of it like a person’s ID card:

  • name = the person’s name
  • properties.type = their role (pipeline, dataset, linked service)
  • Everything inside properties = their specific details
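Because every resource shares this shape, a few lines of Python can sanity-check any file pulled from the repo. This is a hypothetical helper (not an ADF API), checking only the top-level shape described above:

```python
import json

# Hypothetical sanity check: confirm a resource JSON document follows
# the shared name/properties top-level shape.
def check_resource_shape(raw: str) -> list[str]:
    problems = []
    doc = json.loads(raw)
    if not isinstance(doc.get("name"), str):
        problems.append("missing top-level 'name'")
    if not isinstance(doc.get("properties"), dict):
        problems.append("missing top-level 'properties' object")
    return problems
```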

Linked Service JSON

A Linked Service is a connection to a data store. Think of it as a saved WiFi password — it stores how to connect so you do not type it every time.

Azure SQL Database Linked Service

{
    "name": "LS_AzureSqlDB",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "integrated security=False;encrypt=True;connection timeout=30;data source=sql-dataplatform-dev.database.windows.net;initial catalog=AdventureWorksLT;user id=sqladmin"
            },
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "LS_KeyVault",
                    "type": "LinkedServiceReference"
                },
                "secretName": "sql-admin-password"
            }
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Breaking It Down

Section | What It Means | Real-Life Equivalent
--- | --- | ---
name | The linked service name | WiFi network name
type: AzureSqlDatabase | What type of data store | WiFi type (2.4GHz vs 5GHz)
connectionString | Server and database info | WiFi address and channel
password (Key Vault) | Secret stored securely | WiFi password in a password manager
connectVia | Which Integration Runtime to use | Which network adapter connects

ADLS Gen2 Linked Service

{
    "name": "LS_ADLS_Gen2",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://naveendatalake.dfs.core.windows.net"
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Notice: no password. This linked service uses Managed Identity — ADF authenticates with its own Azure identity. The cleanest and most secure approach.

SQL Server via Self-Hosted IR

{
    "name": "LS_OnPrem_SqlServer",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "integrated security=False;data source=10.0.0.5;initial catalog=OnPremDB;user id=sqladmin",
            "password": {
                "type": "SecureString",
                "value": "**********"
            }
        },
        "connectVia": {
            "referenceName": "SHIR-OnPrem-SQL",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Key difference: connectVia points to SHIR-OnPrem-SQL instead of AutoResolveIntegrationRuntime. This tells ADF to use the Self-Hosted IR to reach the on-premises database.

Dataset JSON

A Dataset points to specific data within a linked service. Think of it as a bookmark in a book — the linked service is the book, the dataset is the specific page.

Non-Parameterized Dataset (Fixed Table)

{
    "name": "DS_SqlDB_Metadata",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "LS_AzureSqlDB",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "schema": "dbo",
            "table": "metadata"
        }
    }
}

Simple: always points to dbo.metadata in Azure SQL Database.

Parameterized Dataset (Dynamic Table)

{
    "name": "DS_SqlDB_SourceTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "LS_AzureSqlDB",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "SchemaName": { "type": "string" },
            "TableName": { "type": "string" }
        },
        "typeProperties": {
            "schema": {
                "value": "@dataset().SchemaName",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().TableName",
                "type": "Expression"
            }
        }
    }
}

Key JSON Patterns in Datasets

Static value:

"schema": "dbo"

Dynamic expression:

"schema": {
    "value": "@dataset().SchemaName",
    "type": "Expression"
}

When you see "type": "Expression", it means the value is calculated at runtime. When it is just a plain string, it is hardcoded.

Real-life analogy: A non-parameterized dataset is like a vending machine button labeled “Coke” — always dispenses the same thing. A parameterized dataset is like a customizable order screen — you choose what you want each time.
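The static-vs-expression distinction is mechanical enough to check in code. A minimal Python sketch (the helper name is hypothetical) that classifies a property value from ADF JSON:

```python
# Hypothetical helper: a value is a runtime expression when it is an
# object carrying the "type": "Expression" marker; anything else is static.
def is_expression(value) -> bool:
    return isinstance(value, dict) and value.get("type") == "Expression"
```

So `is_expression({"value": "@dataset().SchemaName", "type": "Expression"})` is true, while `is_expression("dbo")` is false.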

Parquet Sink Dataset (ADLS Gen2)

{
    "name": "DS_ADLS_Sink_Parquet",
    "properties": {
        "type": "Parquet",
        "linkedServiceName": {
            "referenceName": "LS_ADLS_Gen2",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "ContainerName": { "type": "string" },
            "FolderName": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": {
                    "value": "@dataset().ContainerName",
                    "type": "Expression"
                },
                "folderPath": {
                    "value": "@dataset().FolderName",
                    "type": "Expression"
                }
            },
            "compressionCodec": "snappy"
        }
    }
}

Notice "compressionCodec": "snappy" — this is why output files are .snappy.parquet.

Pipeline JSON (Simple Copy)

A pipeline with a single Copy activity:

{
    "name": "PL_Copy_Customer",
    "properties": {
        "activities": [
            {
                "name": "Copy_Customer",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "DS_SqlDB_Customer",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "DS_ADLS_Customer_Parquet",
                        "type": "DatasetReference"
                    }
                ],
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "sqlReaderQuery": "SELECT * FROM SalesLT.Customer"
                    },
                    "sink": {
                        "type": "ParquetSink",
                        "storeSettings": {
                            "type": "AzureBlobFSWriteSettings"
                        },
                        "formatSettings": {
                            "type": "ParquetWriteSettings"
                        }
                    }
                }
            }
        ]
    }
}

The Anatomy

Think of the pipeline JSON like a recipe card:

Recipe Name: "PL_Copy_Customer"
Ingredients (inputs): DS_SqlDB_Customer
Serving Dish (outputs): DS_ADLS_Customer_Parquet
Instructions (typeProperties):
  - Get from: Azure SQL (SELECT * FROM SalesLT.Customer)
  - Put into: Parquet file in ADLS

Pipeline JSON (Metadata-Driven with Lookup + ForEach)

This is the pattern from our metadata-driven pipeline post:

{
    "name": "PL_Copy_SqlToADLS",
    "properties": {
        "activities": [
            {
                "name": "Lookup_Metadata",
                "type": "Lookup",
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "sqlReaderQuery": "SELECT TableName, SchemaName, ContainerName, FolderName FROM dbo.metadata"
                    },
                    "dataset": {
                        "referenceName": "DS_SqlDB_Metadata",
                        "type": "DatasetReference"
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "ForEach_Table",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Lookup_Metadata",
                        "dependencyConditions": ["Succeeded"]
                    }
                ],
                "typeProperties": {
                    "isSequential": false,
                    "batchCount": 5,
                    "items": {
                        "value": "@activity('Lookup_Metadata').output.value",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "Copy_TableData",
                            "type": "Copy",
                            "typeProperties": {
                                "source": {
                                    "type": "AzureSqlSource"
                                },
                                "sink": {
                                    "type": "ParquetSink"
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "DS_SqlDB_SourceTable",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "SchemaName": {
                                            "value": "@item().SchemaName",
                                            "type": "Expression"
                                        },
                                        "TableName": {
                                            "value": "@item().TableName",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "DS_ADLS_Sink_Parquet",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "ContainerName": {
                                            "value": "@item().ContainerName",
                                            "type": "Expression"
                                        },
                                        "FolderName": {
                                            "value": "@item().FolderName",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

Key Sections Decoded

dependsOn — this is the arrow you draw between activities:

"dependsOn": [
    {
        "activity": "Lookup_Metadata",
        "dependencyConditions": ["Succeeded"]
    }
]

Green arrow = "Succeeded". Red arrow = "Failed". Blue arrow = "Completed".

ForEach items — the expression that feeds the loop:

"items": {
    "value": "@activity('Lookup_Metadata').output.value",
    "type": "Expression"
}

Dataset parameters — how the pipeline passes values to parameterized datasets:

"parameters": {
    "SchemaName": {
        "value": "@item().SchemaName",
        "type": "Expression"
    }
}

This is the connection point between @item() in the pipeline and @dataset() in the dataset.
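To make that handoff concrete, here is a minimal Python sketch of how the parameters resolve. This is a simplified model, not ADF's actual expression engine, and the helper name is hypothetical; it only handles the bare `@item().X` pattern used above:

```python
# Simplified model of ADF's runtime behaviour: resolve the "@item().X"
# expressions in a dataset reference against one ForEach item.
def resolve_dataset_params(reference: dict, item: dict) -> dict:
    resolved = {}
    for name, spec in reference.get("parameters", {}).items():
        value = spec["value"] if isinstance(spec, dict) else spec
        if isinstance(value, str) and value.startswith("@item()."):
            value = item[value.removeprefix("@item().")]
        resolved[name] = value
    return resolved

item = {"SchemaName": "SalesLT", "TableName": "Customer"}
ref = {
    "referenceName": "DS_SqlDB_SourceTable",
    "parameters": {
        "SchemaName": {"value": "@item().SchemaName", "type": "Expression"},
        "TableName": {"value": "@item().TableName", "type": "Expression"},
    },
}
print(resolve_dataset_params(ref, item))
# {'SchemaName': 'SalesLT', 'TableName': 'Customer'}
```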

Real-life analogy: The pipeline JSON is like a choreography script for a dance. It lists every dancer (activity), their moves (typeProperties), who they wait for (dependsOn), and what props they use (inputs/outputs). The UI is the dance performance. The JSON is the script.

Pipeline JSON (Audit Logging with Stored Procedures)

Notice the two dependency branches after Copy — success and failure:

{
    "name": "Log_Success",
    "type": "SqlServerStoredProcedure",
    "dependsOn": [
        {
            "activity": "Copy_TableData",
            "dependencyConditions": ["Succeeded"]
        }
    ],
    "typeProperties": {
        "storedProcedureName": "sp_insert_audit_log",
        "storedProcedureParameters": {
            "rows_read": {
                "value": {
                    "value": "@activity('Copy_TableData').output.rowsRead",
                    "type": "Expression"
                },
                "type": "Int32"
            },
            "table_name": {
                "value": {
                    "value": "@item().TableName",
                    "type": "Expression"
                },
                "type": "String"
            },
            "error_message": {
                "value": "NA",
                "type": "String"
            }
        }
    }
},
{
    "name": "Log_Failure",
    "type": "SqlServerStoredProcedure",
    "dependsOn": [
        {
            "activity": "Copy_TableData",
            "dependencyConditions": ["Failed"]
        }
    ],
    "typeProperties": {
        "storedProcedureName": "sp_insert_audit_log",
        "storedProcedureParameters": {
            "rows_read": { "value": "0", "type": "Int32" },
            "error_message": {
                "value": {
                    "value": "@concat('Copy failed for ', item().SchemaName, '.', item().TableName)",
                    "type": "Expression"
                },
                "type": "String"
            }
        }
    }
}

Key insight: The dependencyConditions array is what creates the green and red arrows. ["Succeeded"] = green. ["Failed"] = red. This is why understanding JSON helps troubleshooting — if your failure handler is not running, check this value.

Trigger JSON

Schedule Trigger

{
    "name": "TR_Daily_2AM",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2026-04-10T02:00:00Z",
                "timeZone": "Eastern Standard Time"
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "PL_Daily_ETL",
                    "type": "PipelineReference"
                },
                "parameters": {
                    "Environment": "production"
                }
            }
        ]
    }
}

Tumbling Window Trigger

{
    "name": "TR_Hourly_Window",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 1,
            "startTime": "2026-04-01T00:00:00Z",
            "maxConcurrency": 1,
            "retryPolicy": {
                "count": 3,
                "intervalInSeconds": 900
            }
        },
        "pipeline": {
            "pipelineReference": {
                "referenceName": "PL_Hourly_Incremental",
                "type": "PipelineReference"
            },
            "parameters": {
                "windowStart": "@trigger().outputs.windowStartTime",
                "windowEnd": "@trigger().outputs.windowEndTime"
            }
        }
    }
}

Notice the parameters section — this is where trigger outputs are mapped to pipeline parameters.

Storage Event Trigger

{
    "name": "TR_NewFile",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "blobPathBeginsWith": "/datalake/blobs/input/",
            "blobPathEndsWith": ".parquet",
            "ignoreEmptyBlobs": true,
            "scope": "/subscriptions/.../storageAccounts/naveendatalake",
            "events": ["Microsoft.Storage.BlobCreated"]
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "PL_Process_File",
                    "type": "PipelineReference"
                },
                "parameters": {
                    "TriggerFileName": "@trigger().outputs.body.fileName",
                    "TriggerFolderPath": "@trigger().outputs.body.folderPath"
                }
            }
        ]
    }
}

Integration Runtime JSON

Self-Hosted IR

{
    "name": "SHIR-OnPrem-SQL",
    "properties": {
        "type": "SelfHosted",
        "typeProperties": {}
    }
}

Surprisingly simple. The actual configuration (authentication keys, node registrations) lives in Azure, not in the JSON. The JSON just declares that the IR exists.

Custom Azure IR

{
    "name": "IR_Azure_EastUS",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "East US",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 8,
                    "timeToLive": 10
                }
            }
        }
    }
}

The timeToLive: 10 keeps the Spark cluster warm for 10 minutes after a Data Flow run.

How JSON Maps to the UI

UI Element | JSON Location
--- | ---
Activity name | activities[].name
Activity type | activities[].type (Copy, Lookup, ForEach, etc.)
Green arrow | dependsOn[].dependencyConditions: ["Succeeded"]
Red arrow | dependsOn[].dependencyConditions: ["Failed"]
Blue arrow | dependsOn[].dependencyConditions: ["Completed"]
Dataset parameter value | inputs[].parameters.ParamName.value
ForEach items | typeProperties.items.value
Sequential toggle | typeProperties.isSequential
Batch count | typeProperties.batchCount
Source query | typeProperties.source.sqlReaderQuery
First row only toggle | typeProperties.firstRowOnly
Linked service | linkedServiceName.referenceName
Integration runtime | connectVia.referenceName

Common JSON Patterns

Static Value

"schema": "dbo"

Expression Value

"schema": {
    "value": "@dataset().SchemaName",
    "type": "Expression"
}

Dependency (Arrow)

"dependsOn": [
    { "activity": "PreviousActivity", "dependencyConditions": ["Succeeded"] }
]

No Dependencies (First Activity)

"dependsOn": []

Multiple Dependencies

"dependsOn": [
    { "activity": "Activity_A", "dependencyConditions": ["Succeeded"] },
    { "activity": "Activity_B", "dependencyConditions": ["Succeeded"] }
]

Activity only runs when BOTH A and B succeed.
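The AND semantics can be sketched in a few lines of Python. This is a simplified model of the scheduler, not ADF's actual implementation; in particular, real ADF treats "Completed" as matching any terminal state, while this toy check matches statuses literally:

```python
# Simplified dependency rule: an activity is eligible only when every
# entry in its dependsOn list is satisfied by the upstream activity's
# final status (AND across entries).
def is_eligible(depends_on: list, statuses: dict) -> bool:
    return all(
        statuses.get(dep["activity"]) in dep["dependencyConditions"]
        for dep in depends_on
    )

deps = [
    {"activity": "Activity_A", "dependencyConditions": ["Succeeded"]},
    {"activity": "Activity_B", "dependencyConditions": ["Succeeded"]},
]
print(is_eligible(deps, {"Activity_A": "Succeeded", "Activity_B": "Failed"}))
# False
```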

Dataset Reference with Parameters

{
    "referenceName": "DS_SqlDB_SourceTable",
    "type": "DatasetReference",
    "parameters": {
        "SchemaName": { "value": "@item().SchemaName", "type": "Expression" },
        "TableName": { "value": "@item().TableName", "type": "Expression" }
    }
}

Editing JSON Directly

When to Edit JSON

  • Bulk rename: Find and replace across all activity names
  • Copy activities between pipelines: Copy the activity JSON block from one pipeline and paste into another
  • Fix stale references: Search for old dataset or linked service names
  • Debug BadRequest null: Look for empty or stale defaultValue in parameters

How to Edit

  1. Click {} to open code view
  2. Make your changes
  3. Click the visual designer icon to switch back
  4. ADF validates your JSON automatically
  5. If valid, your changes appear in the visual designer

Warning: Invalid JSON (missing comma, unclosed bracket) will prevent switching back to the visual designer. Always validate before switching.
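If you draft a large JSON change outside the Studio, you can validate it locally before pasting it back. Python's standard json module reports the line and column of the first syntax error, which beats hunting for a missing comma by eye (the `validate` helper here is just an illustrative wrapper):

```python
import json

# Validate a JSON snippet locally; json.loads raises JSONDecodeError
# with the line and column of the first syntax error.
def validate(snippet: str) -> str:
    try:
        json.loads(snippet)
        return "valid JSON"
    except json.JSONDecodeError as err:
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"

print(validate('{ "name": "PL_Copy_Customer", "properties": { "activities": [] } }'))
# valid JSON
print(validate('{ "name": "PL_Copy_Customer", }'))  # trailing comma is rejected
```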

Troubleshooting with JSON

Problem: BadRequest null

Open every dataset JSON and search for defaultValue:

"parameters": {
    "SchemaName": {
        "type": "string",
        "defaultValue": "@pipeline().parameters.OldParam"
    }
}

If defaultValue references something that no longer exists, remove it:

"parameters": {
    "SchemaName": { "type": "string" }
}

Problem: Activity Not Running

Check dependsOn. If it references an activity name that does not exist or has a typo:

"dependsOn": [
    { "activity": "Looup_Metadata", "dependencyConditions": ["Succeeded"] }
]

Notice Looup instead of Lookup. Fix the name.
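Typos like this can be caught mechanically. A hypothetical lint (not an ADF feature) that cross-checks every dependsOn reference in a pipeline against the activity names that actually exist:

```python
# Hypothetical lint: return every dependsOn reference that does not
# match a declared activity name (catches the 'Looup' typo above).
def broken_dependencies(pipeline: dict) -> list[str]:
    activities = pipeline["properties"]["activities"]
    names = {a["name"] for a in activities}
    return [
        dep["activity"]
        for a in activities
        for dep in a.get("dependsOn", [])
        if dep["activity"] not in names
    ]

pl = {"properties": {"activities": [
    {"name": "Lookup_Metadata", "type": "Lookup"},
    {"name": "ForEach_Table", "type": "ForEach",
     "dependsOn": [{"activity": "Looup_Metadata", "dependencyConditions": ["Succeeded"]}]},
]}}
print(broken_dependencies(pl))
# ['Looup_Metadata']
```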

Problem: Expression Error

Search for "type": "Expression" and verify each expression:

"value": "@item().tablename"    // Wrong case
"value": "@item().TableName"    // Correct case

Problem: Wrong Integration Runtime

Check the linked service JSON for connectVia:

"connectVia": {
    "referenceName": "AutoResolveIntegrationRuntime"
}

If this should be using the Self-Hosted IR, change to:

"connectVia": {
    "referenceName": "SHIR-OnPrem-SQL"
}

Interview Questions

Q: What happens when you connect ADF to Git? A: Every ADF resource (pipeline, dataset, linked service, trigger, IR) is saved as a JSON file in the Git repository. The UI is a visual representation of these JSON files. Changes in the UI create Git commits, and changes in Git update the UI.

Q: How do you identify a parameterized value vs a static value in JSON? A: Static values are plain strings or numbers. Parameterized values have "type": "Expression" alongside the value. For example: {"value": "@item().TableName", "type": "Expression"} is dynamic, while "table": "Customer" is static.

Q: What does the dependsOn section represent? A: It represents the arrows between activities in the visual designer. dependencyConditions of “Succeeded” = green arrow, “Failed” = red arrow, “Completed” = blue arrow. An activity with empty dependsOn runs first (no upstream dependency).

Q: How would you troubleshoot a BadRequest null error using JSON? A: Open every dataset JSON and search for defaultValue in parameters. If any defaultValue references a deleted pipeline parameter or contains a stale expression, clear it. Also check activity names in expressions for typos.

Q: What is the difference between pipeline JSON in ADF vs Synapse? A: The structure is identical. The only differences are in the ARM template file names (ARMTemplateForFactory.json vs TemplateForWorkspace.json) and the publish branch name (adf_publish vs workspace_publish). The pipeline, dataset, and linked service JSON is the same.

Wrapping Up

Understanding ADF JSON transforms you from a UI-only developer into someone who can troubleshoot, code review, and work with CI/CD confidently. Every click in the UI writes JSON. Every JSON change appears in the UI.

The next time you see a cryptic error, open the JSON. The answer is usually right there — a typo in an activity name, a stale default value, or a wrong dependency condition. The JSON does not lie.

Related posts:

  • ADF Expressions Guide
  • Parameterized Datasets Deep Dive
  • CI/CD with GitHub
  • CI/CD with Azure DevOps
  • Common ADF/Synapse Errors


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
