CI/CD for Azure Data Factory and Synapse: ARM Templates, Environment Promotion, and the Complete Hands-On Guide

You built 10 pipelines in Synapse Studio on your Dev workspace. They work perfectly. Now the project manager says: “Deploy this to UAT by Friday and Production by next Tuesday.” You open the UAT Synapse workspace and start recreating pipelines manually: one by one, clicking through the UI, hoping you do not miss a parameter.

That is NOT how production teams work. In real projects, you NEVER manually recreate pipelines. You deploy them using ARM templates — the same template, different parameters, automated via CI/CD. One click (or zero clicks with automation) and everything appears in the target environment, exactly as it was in Dev.

This post is the complete hands-on guide. Not theory — actual files, actual commands, actual YAML. By the end, you will have Git integration set up, ARM templates generated, parameter files for each environment, and a CI/CD pipeline that deploys your work from Dev to UAT to Production automatically.

Think of ARM templates like a house blueprint. The blueprint (template) defines the layout — rooms, doors, windows. The spec sheet (parameters) defines the materials — marble countertops for the luxury build (Prod), laminate for the basic build (Dev). Same blueprint, different materials, different houses. You never redesign the house for each client — you just swap the spec sheet.

Table of Contents

  • Why CI/CD for Data Pipelines
  • The End-to-End Flow (Architecture Diagram)
  • What an ARM Template Actually Is
  • Inside the Template: What Gets Exported
  • Inside the Parameters File: What Changes Per Environment
  • Step 1: Create the GitHub Repository
  • Step 2: Connect Dev Workspace to GitHub
  • Step 3: Feature Branch Workflow
  • Step 4: Publish to Generate ARM Templates
  • Step 5: Understand workspace_publish Branch
  • Step 6: Create Environment Parameter Files
  • Step 7: Create a Service Principal for Deployment
  • Step 8: Pre/Post Deployment Scripts (Stop/Start Triggers)
  • Step 9: CI/CD with GitHub Actions (Complete YAML)
  • Step 10: CI/CD with Azure DevOps (Complete YAML)
  • How Our Pipelines Map to Git JSON Files
  • Multi-Subscription Setup (Real Enterprise)
  • Rollback: What Happens When a Deployment Goes Wrong
  • ADF vs Synapse: CI/CD Differences
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

Why CI/CD for Data Pipelines

Without CI/CD

Developer builds pipeline in Dev UI
Developer screenshots every setting
Developer opens UAT workspace
Developer recreates everything by clicking through UI
Developer misses one parameter → pipeline fails at 2 AM
Manager asks "what changed between yesterday and today?" → nobody knows
Rollback = "does anyone remember what it looked like before?"

With CI/CD

Developer builds pipeline in Dev UI (connected to Git)
Developer creates Pull Request → team reviews code
PR merged → clicks Publish → ARM templates auto-generated
CI/CD triggers → deploys to UAT with UAT parameters → automated tests
Approval gate → deploys to Prod with Prod parameters
Rollback = git revert (30 seconds, full audit trail)
All changes tracked in Git with who, what, when

The End-to-End Flow (Architecture Diagram)

                        GitHub Repository
                               |
           ┌───────────────────┼───────────────────┐
           |                   |                   |
      main branch       workspace_publish    feature branches
     (approved code)     (ARM templates)     (developer work)
                               |
                 GitHub Actions or Azure DevOps
                        (CI/CD pipeline)
                               |
                 ┌─────────────┴─────────────┐
                 |                           |
          Deploy to UAT               Deploy to Prod
        (auto on publish)           (manual approval)
                 |                           |
        uat.parameters.json         prod.parameters.json
                 |                           |
           ┌───────────┐               ┌───────────┐
           | UAT       |               | PROD      |
           | Synapse   |               | Synapse   |
           | Workspace |               | Workspace |
           | (test)    |               | (live)    |
           └───────────┘               └───────────┘

What an ARM Template Actually Is

ARM (Azure Resource Manager) templates are JSON files that describe Azure resources. For ADF/Synapse, the ARM template contains EVERY resource in your workspace: all pipelines, datasets, linked services, triggers, data flows, and their configurations.

When you click Publish in Synapse Studio, it serializes your entire workspace into these JSON files.

The Two Files

workspace_publish/
└── <workspace-name>/                          ← Folder named after the Dev workspace
    ├── TemplateForWorkspace.json              ← All resources defined
    └── TemplateParametersForWorkspace.json    ← Values that change per environment

Real-life analogy: The Template is a cooking recipe — exact steps, ingredients, measurements. The Parameters file is the grocery list — where to buy each ingredient. The recipe stays the same. The grocery list changes depending on which city you are in (Dev, UAT, Prod).

Inside the Template: What Gets Exported

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": { "type": "string" },
        "LS_AzureSqlDatabase_connectionString": { "type": "secureString" },
        "LS_ADLS_url": { "type": "string" },
        "LS_KeyVault_baseUrl": { "type": "string" }
    },
    "resources": [
        {
            "type": "Microsoft.Synapse/workspaces/linkedservices",
            "name": "[concat(parameters('workspaceName'), '/LS_AzureSqlDatabase')]",
            "properties": {
                "type": "AzureSqlDatabase",
                "typeProperties": {
                    "connectionString": "[parameters('LS_AzureSqlDatabase_connectionString')]"
                }
            }
        },
        {
            "type": "Microsoft.Synapse/workspaces/datasets",
            "name": "[concat(parameters('workspaceName'), '/DS_SourceTable_Dynamic')]",
            "properties": {
                "type": "AzureSqlTable",
                "linkedServiceName": {
                    "referenceName": "LS_AzureSqlDatabase",
                    "type": "LinkedServiceReference"
                },
                "typeProperties": {
                    "schema": { "type": "Expression", "value": "@dataset().SchemaName" },
                    "table": { "type": "Expression", "value": "@dataset().TableName" }
                },
                "parameters": {
                    "SchemaName": { "type": "string" },
                    "TableName": { "type": "string" }
                }
            }
        },
        {
            "type": "Microsoft.Synapse/workspaces/pipelines",
            "name": "[concat(parameters('workspaceName'), '/PL_MetadataDrivenLoad')]",
            "properties": {
                "activities": [
                    {
                        "name": "Lookup_Config",
                        "type": "Lookup",
                        "typeProperties": {
                            "source": { "type": "AzureSqlSource", "sqlReaderQuery": "SELECT * FROM CONFIGTABLE_V2" },
                            "firstRowOnly": false
                        }
                    }
                ]
            }
        }
    ]
}

Key point: Every linked service, dataset, pipeline, and trigger becomes a JSON resource. You never write this by hand — Synapse generates it when you click Publish.
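
To get a feel for what a generated template contains, you can count its resources by type. The sketch below is illustrative only: it parses a tiny inline template shaped like the excerpt above rather than a real TemplateForWorkspace.json, and the helper name summarize_template is made up for this example.

```python
import json
from collections import Counter

# A tiny stand-in for TemplateForWorkspace.json, shaped like the excerpt above.
SAMPLE_TEMPLATE = json.loads("""
{
  "parameters": {"workspaceName": {"type": "string"}},
  "resources": [
    {"type": "Microsoft.Synapse/workspaces/linkedservices",
     "name": "[concat(parameters('workspaceName'), '/LS_AzureSqlDatabase')]"},
    {"type": "Microsoft.Synapse/workspaces/datasets",
     "name": "[concat(parameters('workspaceName'), '/DS_SourceTable_Dynamic')]"},
    {"type": "Microsoft.Synapse/workspaces/pipelines",
     "name": "[concat(parameters('workspaceName'), '/PL_MetadataDrivenLoad')]"}
  ]
}
""")


def summarize_template(template: dict) -> Counter:
    """Count resources by the last segment of their type (hypothetical helper)."""
    return Counter(
        resource["type"].rsplit("/", 1)[-1]
        for resource in template.get("resources", [])
    )


summary = summarize_template(SAMPLE_TEMPLATE)
for kind, count in sorted(summary.items()):
    print(f"{kind}: {count}")
```

Pointing the same function at a real template (loaded with json.load) gives a quick sanity check that a publish picked up the pipelines and datasets you expected.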

Inside the Parameters File: What Changes Per Environment

The auto-generated TemplateParametersForWorkspace.json has Dev values:

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {
            "value": "naveen-synapse-ws-dev"
        },
        "LS_AzureSqlDatabase_connectionString": {
            "value": "Server=sql-dev.database.windows.net;Database=AdventureWorksLT-dev;..."
        },
        "LS_ADLS_url": {
            "value": "https://devstorageaccount.dfs.core.windows.net"
        },
        "LS_KeyVault_baseUrl": {
            "value": "https://dev-keyvault.vault.azure.net/"
        }
    }
}

You create separate files for UAT and Prod by copying this and changing the values:

uat.parameters.json

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {
            "value": "naveen-synapse-ws-uat"
        },
        "LS_AzureSqlDatabase_connectionString": {
            "value": "Server=sql-uat.database.windows.net;Database=AdventureWorksLT-uat;..."
        },
        "LS_ADLS_url": {
            "value": "https://uatstorageaccount.dfs.core.windows.net"
        },
        "LS_KeyVault_baseUrl": {
            "value": "https://uat-keyvault.vault.azure.net/"
        }
    }
}

prod.parameters.json

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {
            "value": "naveen-synapse-ws-prod"
        },
        "LS_AzureSqlDatabase_connectionString": {
            "value": "Server=sql-prod.database.windows.net;Database=AdventureWorksLT-prod;..."
        },
        "LS_ADLS_url": {
            "value": "https://prodstorageaccount.dfs.core.windows.net"
        },
        "LS_KeyVault_baseUrl": {
            "value": "https://prod-keyvault.vault.azure.net/"
        }
    }
}

Same template. Different parameters. Three different environments.
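
Because the parameter files are copied and edited by hand, they can drift from the template. ARM rejects a deployment when a parameter file supplies a name the template does not declare, or omits one that has no default value. The following sketch checks for both cases before anything is deployed. The function name check_parameters and the sample dictionaries are assumptions for illustration, not part of any Azure tooling.

```python
def check_parameters(template: dict, parameter_file: dict) -> dict:
    """Compare declared template parameters with supplied values (hypothetical helper)."""
    declared = template.get("parameters", {})
    supplied = parameter_file.get("parameters", {})
    # A declared parameter may be omitted only if the template gives it a default.
    missing = sorted(
        name for name, spec in declared.items()
        if name not in supplied and "defaultValue" not in spec
    )
    # Supplied names the template does not declare make the deployment fail.
    unknown = sorted(name for name in supplied if name not in declared)
    return {"missing": missing, "unknown": unknown}


# Sample data only: a template with three parameters and a UAT file that has drifted.
template = {
    "parameters": {
        "workspaceName": {"type": "string"},
        "LS_AzureSqlDatabase_connectionString": {"type": "secureString"},
        "LS_ADLS_url": {"type": "string"},
    }
}
uat_parameters = {
    "parameters": {
        "workspaceName": {"value": "naveen-synapse-ws-uat"},
        "LS_ADLS_url": {"value": "https://uatstorageaccount.dfs.core.windows.net"},
        "LS_KeyVault_baseUrl": {"value": "https://uat-keyvault.vault.azure.net/"},
    }
}

report = check_parameters(template, uat_parameters)
print(report)
```

Running a check like this as the first CI step turns a mid-deployment failure into a fast, readable error.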

Step 1: Create the GitHub Repository

# Create a repo (or use GitHub UI)
# Name: naveen-synapse-pipelines
# Visibility: Private

On GitHub:

  1. Click New repository
  2. Name: naveen-synapse-pipelines
  3. Private
  4. Initialize with README
  5. Create

Step 2: Connect Dev Workspace to GitHub

In Synapse Studio

  1. Open Synapse Studio → Manage (toolbox icon in sidebar)
  2. Click Git configuration (under Source control)
  3. Click Configure
  4. Repository type: GitHub
  5. GitHub account: select your account (authorize if prompted)
  6. Repository name: naveen-synapse-pipelines
  7. Collaboration branch: main
  8. Publish branch: workspace_publish
  9. Root folder: /
  10. Import existing resources: Yes
  11. Click Apply

In ADF Studio

  1. Open ADF Studio → Manage → Git configuration
  2. Same steps as above, but publish branch is adf_publish

After connecting, your workspace resources (pipelines, datasets, linked services) are automatically committed to the main branch.

Real-life analogy: Connecting to Git is like turning on the security camera system in an office. From this moment forward, every change is recorded — who opened which door (edited which pipeline), when, and what they did. Before Git, there were no cameras.

Step 3: Feature Branch Workflow

Never Work Directly on main

Developer: "I need to add the incremental load pipeline"

1. In Synapse Studio, click the branch dropdown (top-left)
2. Click "New branch"
3. Name: feature/incremental-load
4. Base: main
5. Build and test the pipeline in this branch
6. When ready → commit changes
7. Create Pull Request on GitHub
8. Team reviews → Approves → Merge to main

What Happens in Git

main:                    A ──── B ──── C ──── D (merge) ──── E (publish)
                                       |      |
feature/incremental:                   C' ── C'' (your work)

Each save in Synapse Studio creates a commit on your feature branch. The PR merges your work into main. Only main is published.

Step 4: Publish to Generate ARM Templates

After merging your PR to main:

  1. Switch to main branch in Synapse Studio
  2. Click Publish (top toolbar)
  3. Review the changes shown in the publish dialog
  4. Click OK

Synapse generates ARM templates and pushes them to the workspace_publish branch.

Important: Only Publish from the main branch. Never from a feature branch.

Step 5: Understand workspace_publish Branch

After publishing, the workspace_publish branch contains:

workspace_publish/
└── naveen-synapse-ws-dev/                     (folder named after the Dev workspace)
    ├── TemplateForWorkspace.json              (12,000+ lines for a real workspace)
    ├── TemplateParametersForWorkspace.json    (environment-specific values)
    └── (no other files; this branch is auto-generated)

Rules:

  • Never edit this branch manually
  • Never merge this branch into main
  • It is auto-generated by the Publish button
  • CI/CD pipelines read from this branch

For ADF, the branch is called adf_publish instead of workspace_publish.

Step 6: Create Environment Parameter Files

Copy TemplateParametersForWorkspace.json twice and modify:

project-root/
├── cicd/
│   ├── uat.parameters.json         ← UAT connection values
│   ├── prod.parameters.json        ← Prod connection values
│   └── pre-post-deployment.ps1     ← Stop/start triggers script
└── README.md

Commit these to the main branch (not workspace_publish).

What Parameters Typically Change

| Parameter | Dev | UAT | Prod |
| --- | --- | --- | --- |
| Workspace name | synapse-ws-dev | synapse-ws-uat | synapse-ws-prod |
| SQL Server | sql-dev.database.windows.net | sql-uat.database.windows.net | sql-prod.database.windows.net |
| Database name | AdventureWorksLT-dev | AdventureWorksLT-uat | AdventureWorksLT-prod |
| ADLS URL | devstorageaccount.dfs... | uatstorageaccount.dfs... | prodstorageaccount.dfs... |
| Key Vault URL | dev-keyvault.vault... | uat-keyvault.vault... | prod-keyvault.vault... |
| Resource Group | rg-dataplatform-dev | rg-dataplatform-uat | rg-dataplatform-prod |
| Subscription ID | Same or different | Same or different | Often different |
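
Copying and editing by hand works, but it is easy to leave a Dev value behind. As a rough sketch of one alternative, the script below derives the UAT and Prod files from the Dev file plus a table of overrides, and refuses to continue if any parameter would keep its Dev value. The helper build_parameters and the OVERRIDES table are assumptions made up for this example. The values come from the sample files earlier in this post.

```python
import copy
import json

# Dev parameter file, trimmed to three entries for the example.
DEV_PARAMETERS = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {"value": "naveen-synapse-ws-dev"},
        "LS_ADLS_url": {"value": "https://devstorageaccount.dfs.core.windows.net"},
        "LS_KeyVault_baseUrl": {"value": "https://dev-keyvault.vault.azure.net/"},
    },
}

# Hypothetical override table: one entry per target environment.
OVERRIDES = {
    "uat": {
        "workspaceName": "naveen-synapse-ws-uat",
        "LS_ADLS_url": "https://uatstorageaccount.dfs.core.windows.net",
        "LS_KeyVault_baseUrl": "https://uat-keyvault.vault.azure.net/",
    },
    "prod": {
        "workspaceName": "naveen-synapse-ws-prod",
        "LS_ADLS_url": "https://prodstorageaccount.dfs.core.windows.net",
        "LS_KeyVault_baseUrl": "https://prod-keyvault.vault.azure.net/",
    },
}


def build_parameters(base: dict, overrides: dict) -> dict:
    """Return a copy of the Dev file with every parameter value replaced."""
    known = set(base["parameters"])
    unknown = set(overrides) - known
    untouched = known - set(overrides)
    if unknown:
        raise KeyError(f"Overrides name parameters the Dev file lacks: {sorted(unknown)}")
    if untouched:
        raise ValueError(f"Dev values would leak into the target: {sorted(untouched)}")
    result = copy.deepcopy(base)  # never mutate the Dev file in place
    for name, value in overrides.items():
        result["parameters"][name] = {"value": value}
    return result


uat = build_parameters(DEV_PARAMETERS, OVERRIDES["uat"])
prod = build_parameters(DEV_PARAMETERS, OVERRIDES["prod"])
print(json.dumps(uat, indent=4))
```

The strict "every parameter must be overridden" rule is a design choice for this sketch. If some values really are shared across environments, list them explicitly rather than relaxing the check.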

Step 7: Create a Service Principal for Deployment

CI/CD needs an identity to deploy to Azure. You create a Service Principal (like a robot account):

Create in Azure CLI

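# Create a Service Principal with the Contributor role on the UAT and Prod resource groups.
# --json-auth prints the clientId/clientSecret/subscriptionId/tenantId JSON shown below
# (older CLI versions used the now-deprecated --sdk-auth flag for this).
az ad sp create-for-rbac \
    --name "sp-synapse-cicd" \
    --role Contributor \
    --scopes /subscriptions/<sub-id>/resourceGroups/rg-dataplatform-uat \
             /subscriptions/<sub-id>/resourceGroups/rg-dataplatform-prod \
    --json-auth

# Output (abbreviated):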
# {
#   "clientId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
#   "clientSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
#   "subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
#   "tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# }

Store as GitHub Secret

  1. GitHub repo → Settings → Secrets and variables → Actions
  2. Click New repository secret
  3. Name: AZURE_CREDENTIALS
  4. Value: paste the entire JSON output from above
  5. Click Add secret

Store in Azure DevOps

  1. Azure DevOps → Project Settings → Service connections
  2. New service connection → Azure Resource Manager → Service principal (manual)
  3. Enter the client ID, secret, subscription, and tenant

Step 8: Pre/Post Deployment Scripts (Stop/Start Triggers)

Before deploying ARM templates, you MUST stop active triggers in the target workspace. Otherwise, pipelines might run during deployment with half-old, half-new configuration.

PowerShell Script (pre-post-deployment.ps1)

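# Stops running triggers, or starts all triggers, in a Synapse workspace.
# Assumes the Az.Synapse module is installed and the session is already signed in.
param(
    [Parameter(Mandatory = $true)]
    [string]$ResourceGroupName,   # accepted for caller consistency; the cmdlets below only need the workspace name

    [Parameter(Mandatory = $true)]
    [string]$WorkspaceName,

    [Parameter(Mandatory = $true)]
    [ValidateSet("stop", "start")]
    [string]$Action
)

if ($Action -eq "stop") {
    Write-Host "Stopping triggers in $WorkspaceName..."
    $triggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName
    foreach ($trigger in $triggers) {
        # Only touch triggers that are currently running
        if ($trigger.Properties.RuntimeState -eq "Started") {
            Stop-AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $trigger.Name
            Write-Host "  Stopped: $($trigger.Name)"
        }
    }
    Write-Host "All triggers stopped."
}

if ($Action -eq "start") {
    # Note: this starts every trigger in the workspace, including any that were
    # deliberately left stopped before the deployment.
    Write-Host "Starting triggers in $WorkspaceName..."
    $triggers = Get-AzSynapseTrigger -WorkspaceName $WorkspaceName
    foreach ($trigger in $triggers) {
        Start-AzSynapseTrigger -WorkspaceName $WorkspaceName -Name $trigger.Name
        Write-Host "  Started: $($trigger.Name)"
    }
    Write-Host "All triggers started."
}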

Deployment Order

1. Stop triggers in target workspace (pre-deployment)
2. Deploy ARM template with environment parameters
3. Start triggers in target workspace (post-deployment)
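
The ordering logic is easier to see in isolation. Here is a minimal sketch in Python, with the Azure calls replaced by plain functions passed in as arguments, so nothing here talks to a real workspace. It differs from the PowerShell script in one way: it remembers which triggers it stopped and restarts only those, even if the deployment step raises an error. The function deploy_with_trigger_guard and the trigger TR_Adhoc are invented for the example.

```python
from typing import Callable, List


def deploy_with_trigger_guard(
    list_started: Callable[[], List[str]],
    stop: Callable[[str], None],
    deploy: Callable[[], None],
    start: Callable[[str], None],
) -> List[str]:
    """Stop running triggers, deploy, then restart exactly the ones that were stopped."""
    stopped: List[str] = []
    for name in list_started():
        stop(name)
        stopped.append(name)
    try:
        deploy()
    finally:
        # Runs on success and on failure, so triggers are not left off by accident.
        for name in stopped:
            start(name)
    return stopped


# In-memory stand-in for a workspace; TR_Adhoc is a made-up, already-stopped trigger.
state = {"TR_Daily_2AM": "Started", "TR_Adhoc": "Stopped"}
log: List[str] = []


def list_started() -> List[str]:
    return [name for name, status in state.items() if status == "Started"]


def stop(name: str) -> None:
    state[name] = "Stopped"
    log.append(f"stop {name}")


def start(name: str) -> None:
    state[name] = "Started"
    log.append(f"start {name}")


def deploy() -> None:
    log.append("deploy")


stopped = deploy_with_trigger_guard(list_started, stop, deploy, start)
print(log)
```

Whether triggers should come back on after a failed deployment is a judgment call. Some teams prefer to leave them off until someone has inspected the workspace.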

Step 9: CI/CD with GitHub Actions (Complete YAML)

# .github/workflows/synapse-deploy.yml
name: Synapse CI/CD

on:
  push:
    branches:
      - workspace_publish

permissions:
  id-token: write
  contents: read

jobs:
  deploy-uat:
    runs-on: ubuntu-latest
    environment: UAT
    steps:
      - name: Checkout workspace_publish branch
        uses: actions/checkout@v4
        with:
          ref: workspace_publish

      - name: Checkout main for parameter files
        uses: actions/checkout@v4
        with:
          ref: main
          path: main-branch

      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
          enable-AzPSSession: true   # required so the azure/powershell steps share this login

      - name: Stop UAT Triggers (Pre-deployment)
        uses: azure/powershell@v2
        with:
          inlineScript: |
            ./main-branch/cicd/pre-post-deployment.ps1 `
              -ResourceGroupName "rg-dataplatform-uat" `
              -WorkspaceName "naveen-synapse-ws-uat" `
              -Action "stop"
          azPSVersion: latest

      - name: Deploy ARM Template to UAT
        uses: azure/arm-deploy@v2
        with:
          subscriptionId: ${{ secrets.AZURE_SUBSCRIPTION_UAT }}
          resourceGroupName: rg-dataplatform-uat
          # Publish writes the templates into a folder named after the Dev workspace
          template: ./naveen-synapse-ws-dev/TemplateForWorkspace.json
          parameters: ./main-branch/cicd/uat.parameters.json

      - name: Start UAT Triggers (Post-deployment)
        uses: azure/powershell@v2
        with:
          inlineScript: |
            ./main-branch/cicd/pre-post-deployment.ps1 `
              -ResourceGroupName "rg-dataplatform-uat" `
              -WorkspaceName "naveen-synapse-ws-uat" `
              -Action "start"
          azPSVersion: latest

      - name: UAT Deployment Complete
        run: echo "UAT deployment successful!"

  deploy-prod:
    runs-on: ubuntu-latest
    needs: deploy-uat
    environment: Production    # Requires manual approval in GitHub
    steps:
      - name: Checkout workspace_publish branch
        uses: actions/checkout@v4
        with:
          ref: workspace_publish

      - name: Checkout main for parameter files
        uses: actions/checkout@v4
        with:
          ref: main
          path: main-branch

      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
          enable-AzPSSession: true   # required so the azure/powershell steps share this login

      - name: Stop Prod Triggers
        uses: azure/powershell@v2
        with:
          inlineScript: |
            ./main-branch/cicd/pre-post-deployment.ps1 `
              -ResourceGroupName "rg-dataplatform-prod" `
              -WorkspaceName "naveen-synapse-ws-prod" `
              -Action "stop"
          azPSVersion: latest

      - name: Deploy ARM Template to Production
        uses: azure/arm-deploy@v2
        with:
          subscriptionId: ${{ secrets.AZURE_SUBSCRIPTION_PROD }}
          resourceGroupName: rg-dataplatform-prod
          # Publish writes the templates into a folder named after the Dev workspace
          template: ./naveen-synapse-ws-dev/TemplateForWorkspace.json
          parameters: ./main-branch/cicd/prod.parameters.json

      - name: Start Prod Triggers
        uses: azure/powershell@v2
        with:
          inlineScript: |
            ./main-branch/cicd/pre-post-deployment.ps1 `
              -ResourceGroupName "rg-dataplatform-prod" `
              -WorkspaceName "naveen-synapse-ws-prod" `
              -Action "start"
          azPSVersion: latest

      - name: Production Deployment Complete
        run: echo "Production deployment successful!"

GitHub Environment Protection Rules

For the Production environment to require manual approval:

  1. GitHub → Settings → Environments → New environment → Production
  2. Check Required reviewers
  3. Add the approver(s)
  4. Save

Now the Prod deployment waits for approval before proceeding.

Step 10: CI/CD with Azure DevOps (Complete YAML)

# azure-pipelines.yml
trigger:
  branches:
    include:
      - workspace_publish

pool:
  vmImage: 'ubuntu-latest'

stages:
  - stage: Deploy_UAT
    displayName: 'Deploy to UAT'
    jobs:
      - deployment: DeployUAT
        environment: 'UAT'
        strategy:
          runOnce:
            deploy:
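              steps:
                - checkout: self
                  path: publish
                  persistCredentials: true   # keep credentials so the git fetch below can authenticate

                # Parameter files are committed to main (Step 6), not workspace_publish,
                # so copy the cicd/ folder from main into the checked-out tree.
                - script: |
                    git fetch origin main --depth=1
                    git checkout FETCH_HEAD -- cicd
                  workingDirectory: '$(Pipeline.Workspace)/publish'
                  displayName: 'Fetch cicd/ from main'

                # Stop triggers before this task and restart them after it, as in Step 8.
                - task: AzureResourceManagerTemplateDeployment@3
                  displayName: 'Deploy ARM to UAT'
                  inputs:
                    azureResourceManagerConnection: 'sp-synapse-cicd'
                    subscriptionId: '$(UAT_SUBSCRIPTION_ID)'
                    resourceGroupName: 'rg-dataplatform-uat'
                    location: 'Canada Central'
                    templateLocation: 'Linked artifact'
                    # Templates sit in a folder named after the Dev workspace
                    csmFile: '$(Pipeline.Workspace)/publish/naveen-synapse-ws-dev/TemplateForWorkspace.json'
                    csmParametersFile: '$(Pipeline.Workspace)/publish/cicd/uat.parameters.json'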

  - stage: Deploy_Prod
    displayName: 'Deploy to Production'
    dependsOn: Deploy_UAT
    condition: succeeded()
    jobs:
      - deployment: DeployProd
        environment: 'Production'    # Requires manual approval in Azure DevOps
        strategy:
          runOnce:
            deploy:
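              steps:
                - checkout: self
                  path: publish
                  persistCredentials: true   # keep credentials so the git fetch below can authenticate

                # Parameter files are committed to main (Step 6), not workspace_publish,
                # so copy the cicd/ folder from main into the checked-out tree.
                - script: |
                    git fetch origin main --depth=1
                    git checkout FETCH_HEAD -- cicd
                  workingDirectory: '$(Pipeline.Workspace)/publish'
                  displayName: 'Fetch cicd/ from main'

                # Stop triggers before this task and restart them after it, as in Step 8.
                - task: AzureResourceManagerTemplateDeployment@3
                  displayName: 'Deploy ARM to Prod'
                  inputs:
                    azureResourceManagerConnection: 'sp-synapse-cicd'
                    subscriptionId: '$(PROD_SUBSCRIPTION_ID)'
                    resourceGroupName: 'rg-dataplatform-prod'
                    location: 'Canada Central'
                    templateLocation: 'Linked artifact'
                    # Templates sit in a folder named after the Dev workspace
                    csmFile: '$(Pipeline.Workspace)/publish/naveen-synapse-ws-dev/TemplateForWorkspace.json'
                    csmParametersFile: '$(Pipeline.Workspace)/publish/cicd/prod.parameters.json'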

How Our Pipelines Map to Git JSON Files

| What We Built | JSON File in Git | Content |
| --- | --- | --- |
| Metadata-driven pipeline | pipeline/PL_MetadataDrivenLoad.json | Lookup → ForEach → Copy activities |
| Synapse audit pipeline | pipeline/PL_Parquet_WithAudit.json | Copy + Stored Procedure activities |
| Incremental load pipeline | pipeline/PL_IncrementalLoad.json | Lookup → Copy → Update watermark |
| Unified Full+Incremental | pipeline/PL_UnifiedLoad.json | If Condition + both branches |
| SCD Type 2 Data Flow | dataflow/DF_SCD_Type2.json | All Data Flow transformations |
| All datasets | dataset/DS_SourceTable_Dynamic.json | Parameterized table/schema |
| All linked services | linkedService/LS_AzureSqlDatabase.json | Connection strings (parameterized) |
| Triggers | trigger/TR_Daily_2AM.json | Schedule definitions |

You never write these files. The Synapse UI creates them as you work. Git tracks every change.
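
If you want to see that mapping for your own repository, a few lines of Python can list the artifacts per folder. This is a sketch under assumptions: it builds a throwaway folder with three placeholder files instead of reading a real clone, the folder names are taken from the table above, and list_artifacts is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

# Folder names as they appear in the table above.
ARTIFACT_FOLDERS = ("pipeline", "dataset", "linkedService", "trigger", "dataflow")


def list_artifacts(repo_root: Path) -> dict:
    """Map each artifact folder that exists to the sorted names of its JSON files."""
    found = {}
    for folder in ARTIFACT_FOLDERS:
        directory = repo_root / folder
        if directory.is_dir():
            found[folder] = sorted(path.stem for path in directory.glob("*.json"))
    return found


# Build a throwaway repo layout so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for relative in (
        "pipeline/PL_MetadataDrivenLoad.json",
        "pipeline/PL_IncrementalLoad.json",
        "trigger/TR_Daily_2AM.json",
    ):
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"name": path.stem}))  # placeholder content
    artifacts = list_artifacts(root)

print(artifacts)
```

Against a real clone, replace the temporary directory with the path to your checked-out main branch.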

Multi-Subscription Setup (Real Enterprise)

In large companies, Dev, UAT, and Prod often live in different Azure subscriptions:

Subscription: sub-development (rg-dataplatform-dev)
    ├── synapse-ws-dev
    ├── sql-dev
    ├── adls-dev
    └── kv-dev

Subscription: sub-testing (rg-dataplatform-uat)
    ├── synapse-ws-uat
    ├── sql-uat
    ├── adls-uat
    └── kv-uat

Subscription: sub-production (rg-dataplatform-prod)
    ├── synapse-ws-prod
    ├── sql-prod
    ├── adls-prod
    └── kv-prod

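The Service Principal needs the Contributor role on each target subscription (or, more narrowly, on each target resource group). For Synapse, it typically also needs a Synapse RBAC role on the target workspace, such as Synapse Artifact Publisher, to publish artifacts and manage triggers. The CI/CD pipeline specifies the subscription ID per environment.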
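
One way to keep the per-environment settings in a single place is a small lookup table that the pipeline reads. The sketch below builds, but does not run, an az deployment group create command for a resource-group-scoped ARM deployment. The Target class, the TARGETS table, and build_deploy_command are names invented for this example, and the subscription names are the placeholders from the layout above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Target:
    subscription: str      # subscription name or ID
    resource_group: str
    parameters_file: str


# Placeholder values taken from the examples in this post.
TARGETS = {
    "uat": Target("sub-testing", "rg-dataplatform-uat", "cicd/uat.parameters.json"),
    "prod": Target("sub-production", "rg-dataplatform-prod", "cicd/prod.parameters.json"),
}


def build_deploy_command(env: str, template_file: str = "TemplateForWorkspace.json") -> List[str]:
    """Assemble the Azure CLI arguments for one environment without executing them."""
    target = TARGETS[env]  # raises KeyError for an unknown environment
    return [
        "az", "deployment", "group", "create",
        "--subscription", target.subscription,
        "--resource-group", target.resource_group,
        "--template-file", template_file,
        "--parameters", f"@{target.parameters_file}",
        "--mode", "Incremental",
    ]


command = build_deploy_command("prod")
print(" ".join(command))
```

Returning a list rather than a string makes it safe to hand to subprocess.run later without shell quoting problems.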

Rollback: What Happens When a Deployment Goes Wrong

Option 1: Redeploy Previous Version

# Switch to the publish branch and find the previous commit
git checkout workspace_publish
git log --oneline

# Reset to the previous version and overwrite the remote branch
git reset --hard HEAD~1
git push --force-with-lease

# CI/CD triggers and deploys the previous ARM template

Option 2: Revert in Git

# Create a revert commit on the publish branch (safer than a force push)
git checkout workspace_publish
git revert --no-edit HEAD
git push

Option 3: Revert in Synapse Studio

  1. Switch to main branch in Synapse Studio
  2. Revert the problematic PR on GitHub
  3. Publish again → new ARM templates generated → CI/CD redeploys

Real-life analogy: Rollback is like a restaurant recalling a dish. If the new recipe (deployment) makes customers sick (breaks production), you immediately switch back to the old recipe (previous ARM template). The Git history shows exactly what changed and when.
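
Before rolling back, it helps to know what the bad deployment changed. As a rough illustration, the function below compares two versions of a template and reports which resources were added, removed, or modified. The templates here are tiny hand-written stand-ins and diff_templates is a hypothetical helper. In practice you would load the two versions from Git history.

```python
def diff_templates(previous: dict, current: dict) -> dict:
    """Report resource names added, removed, or changed between two template versions."""
    def index(template: dict) -> dict:
        # Key each resource by (type, name) so same-named resources of different types stay distinct.
        return {(res["type"], res["name"]): res for res in template.get("resources", [])}

    old, new = index(previous), index(current)
    return {
        "added": sorted(name for (_, name) in new.keys() - old.keys()),
        "removed": sorted(name for (_, name) in old.keys() - new.keys()),
        "changed": sorted(key[1] for key in old.keys() & new.keys() if old[key] != new[key]),
    }


PIPELINE = "Microsoft.Synapse/workspaces/pipelines"
TRIGGER = "Microsoft.Synapse/workspaces/triggers"

# Hand-written stand-ins for two consecutive published templates.
previous_template = {"resources": [
    {"type": PIPELINE, "name": "PL_MetadataDrivenLoad", "properties": {"description": "v1"}},
    {"type": TRIGGER, "name": "TR_Daily_2AM", "properties": {}},
]}
current_template = {"resources": [
    {"type": PIPELINE, "name": "PL_MetadataDrivenLoad", "properties": {"description": "v2"}},
    {"type": PIPELINE, "name": "PL_IncrementalLoad", "properties": {}},
    {"type": TRIGGER, "name": "TR_Daily_2AM", "properties": {}},
]}

changes = diff_templates(previous_template, current_template)
print(changes)
```

A summary like this is often easier to review than a raw diff of a 12,000-line JSON file.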

ADF vs Synapse: CI/CD Differences

| Aspect | ADF | Synapse |
| --- | --- | --- |
| Publish branch | adf_publish | workspace_publish |
| ARM template file | ARMTemplateForFactory.json | TemplateForWorkspace.json |
| Parameters file | ARMTemplateParametersForFactory.json | TemplateParametersForWorkspace.json |
| Trigger stop/start | Get-AzDataFactoryV2Trigger | Get-AzSynapseTrigger |
| Resource type prefix | Microsoft.DataFactory/factories | Microsoft.Synapse/workspaces |
| Git config location | Manage → Git configuration | Manage → Git configuration |
| Works with | GitHub, Azure DevOps | GitHub, Azure DevOps |

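The overall flow is the same; the file names and PowerShell cmdlets differ. One further difference affects the deploy step: Microsoft's documentation states that Synapse workspace artifacts are not Resource Manager resources and recommends the dedicated Synapse workspace deployment task (Azure DevOps) or action (GitHub) instead of the generic ARM template deployment task, so confirm which one your target workspace needs.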

Common Mistakes

  1. Editing UAT or Prod workspaces directly — those changes get overwritten on next deployment. Only edit Dev.

  2. Publishing from a feature branch — ARM templates should only be generated from the main branch. Publishing from a feature branch deploys untested code.

  3. Forgetting to stop triggers before deployment — pipelines might run with half-deployed configuration. Always stop triggers first.

  4. Hardcoding connection strings instead of parameterizing — if a linked service URL is hardcoded in the template, it deploys with the Dev URL to Prod. Always check the parameters file.

  5. Using the same Key Vault across environments — Dev and Prod should have separate Key Vaults. Otherwise, changing a Dev secret accidentally affects Prod.

  6. Not setting up approval gates for Production — without manual approval, a broken deployment goes straight to Prod. Always require approval.

  7. Committing secrets to the parameters file — use Key Vault references in linked services, not plain-text connection strings. The parameters file should reference the Key Vault URL, not the actual password.
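
Mistake 7 is easy to catch automatically. The sketch below scans a parameters file for values that look like they carry a plain-text credential. The pattern list is a rough heuristic chosen for this example, not an exhaustive rule set, and find_plaintext_secrets is a hypothetical helper. The sample password is obviously fake.

```python
import re

# Heuristic only: key names that commonly precede a credential in a connection string.
SECRET_PATTERN = re.compile(r"(password|pwd|accountkey|sharedaccesskey)\s*=", re.IGNORECASE)


def find_plaintext_secrets(parameter_file: dict) -> list:
    """Return the names of parameters whose string value looks like it embeds a secret."""
    flagged = []
    for name, entry in parameter_file.get("parameters", {}).items():
        value = entry.get("value")  # Key Vault references use "reference", not "value"
        if isinstance(value, str) and SECRET_PATTERN.search(value):
            flagged.append(name)
    return sorted(flagged)


# Sample file with one offending entry (fake credentials for illustration).
prod_parameters = {
    "parameters": {
        "workspaceName": {"value": "naveen-synapse-ws-prod"},
        "LS_AzureSqlDatabase_connectionString": {
            "value": "Server=sql-prod.database.windows.net;Database=AdventureWorksLT-prod;"
                     "User ID=etl_user;Password=example-only"
        },
        "LS_KeyVault_baseUrl": {"value": "https://prod-keyvault.vault.azure.net/"},
    }
}

flagged = find_plaintext_secrets(prod_parameters)
print(flagged)
```

Failing the CI run when the list is non-empty keeps such a file from ever reaching the main branch unnoticed.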

Interview Questions

Q: How does CI/CD work for ADF/Synapse?
A: Dev workspace is connected to Git. Engineers work in feature branches and merge via pull requests. Publishing generates ARM templates in a special branch (workspace_publish). A CI/CD pipeline (GitHub Actions or Azure DevOps) deploys those ARM templates to UAT and Prod using environment-specific parameter files. Triggers are stopped before deployment and restarted after.

Q: What is an ARM template in the context of ADF/Synapse?
A: A JSON file that describes ALL resources in the workspace: pipelines, datasets, linked services, triggers, data flows. It is auto-generated when you click Publish. The same template deploys to any environment by swapping the parameters file (different connection strings, storage accounts, Key Vaults per environment).

Q: What gets parameterized between environments?
A: Linked service connection strings (SQL server URL, database name), storage account URLs, Key Vault URLs, workspace names, and any environment-specific values. The pipeline LOGIC stays identical. Only connections change.

Q: Why must you stop triggers before deploying?
A: Active triggers might fire during deployment when the workspace is in an inconsistent state, with some resources updated and others not yet. This can cause pipeline failures or data corruption. Stop all triggers before deployment and restart them after.

Q: How do you roll back a bad deployment?
A: Three options: redeploy the previous ARM template version from Git history, create a revert commit on the workspace_publish branch, or revert the PR in Git and republish from Synapse Studio. Git provides a full audit trail of what changed.

Wrapping Up

CI/CD for data pipelines follows the same principles as CI/CD for application code: version control, code review, automated testing, and automated deployment. The only difference is the deployment artifact — ARM templates instead of application binaries.

The workflow is simple once set up: build in Dev → PR to main → Publish → ARM templates generated → CI/CD deploys to UAT → approval → CI/CD deploys to Prod. Every change tracked, every deployment auditable, every rollback possible.

Set it up once. Deploy with confidence forever.


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
