CI/CD for Azure Data Factory and Synapse Pipelines with GitHub: The Complete Guide

In a real enterprise project, nobody builds data pipelines by clicking through the ADF UI in production. Every pipeline, dataset, and linked service is stored in Git, reviewed through pull requests, and deployed to UAT and production through an automated CI/CD pipeline.

This is the professional workflow that separates learning projects from production-grade data platforms. It's also one of the most frequently asked topics in senior data engineering interviews.

In this guide, I’ll walk you through the complete CI/CD setup for Azure Data Factory and Synapse using GitHub — from connecting your workspace to Git, to working with feature branches, to deploying across environments with GitHub Actions.

Table of Contents

  • Why CI/CD for Data Pipelines?
  • The Big Picture: Dev to Production Flow
  • What Gets Stored in Git
  • Step 1: Create a GitHub Repository
  • Step 2: Connect ADF/Synapse to GitHub
  • Step 3: Working with Branches
  • Step 4: The Publish Process and ARM Templates
  • Step 5: Understanding Parameterization Across Environments
  • Step 6: Create a Service Principal for Deployment
  • Step 7: Set Up GitHub Actions for CI/CD
  • Step 8: Pre/Post Deployment Scripts
  • Step 9: Manual Approval Gates
  • The Complete GitHub Actions Workflow
  • Real-World Rules and Best Practices
  • ADF Git Integration vs Synapse Git Integration
  • Troubleshooting Common CI/CD Issues
  • Interview Questions
  • Wrapping Up

Why CI/CD for Data Pipelines?

Without CI/CD, your production pipeline workflow looks like this:

Developer builds pipeline in Dev ADF UI
  --> Developer logs into Prod ADF UI
    --> Developer manually recreates everything
      --> Developer hopes nothing was missed
        --> Production breaks at 3 AM

With CI/CD:

Developer builds pipeline in Dev ADF UI (connected to Git)
  --> Creates Pull Request on GitHub
    --> Teammate reviews and approves
      --> Developer clicks Publish (generates ARM templates)
        --> GitHub Actions automatically deploys to UAT
          --> Manual approval --> deploys to Production
            --> Everything is version-controlled, auditable, repeatable

The key benefits:

  • Version control — every change is tracked, reviewable, and reversible
  • Code review — teammates catch issues before they reach production
  • Automated deployment — no manual steps, no human error
  • Environment consistency — Dev, UAT, and Prod have identical pipelines
  • Audit trail — who changed what, when, and why
  • Rollback — production breaks? Revert to the previous commit
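The rollback point is worth seeing concretely: because every resource is plain JSON in Git, undoing a bad change is an ordinary git revert followed by a re-publish. A minimal sketch in a throwaway repo (file name and contents are hypothetical):

```shell
# Demo of commit-level rollback in a throwaway repo (names are hypothetical).
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "ci@example.com"
git config user.name "ci"

# Good version of a pipeline definition
echo '{"name":"PL_Copy_SqlToADLS","activities":1}' > pipeline.json
git add . && git commit -qm "good pipeline"

# A bad change sneaks in
echo '{"name":"PL_Copy_SqlToADLS","activities":0}' > pipeline.json
git commit -qam "bad change"

# Revert creates a new commit restoring the previous state
git revert --no-edit HEAD
grep -q '"activities":1' pipeline.json && echo "rolled back"
```

In the ADF/Synapse workflow you would revert the offending merge commit on main, then click Publish again so the ARM templates are regenerated from the restored state.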

The Big Picture: Dev to Production Flow

In a real project, you have multiple environments:

Development (Dev)  -->  UAT/Test  -->  Production (Prod)

Each environment has its own ADF or Synapse workspace. Only the Dev workspace is connected to GitHub. UAT and Prod are NEVER edited manually — they receive deployments only through CI/CD.

Dev ADF/Synapse (connected to GitHub)
  |
  |-- Developer works in feature branch
  |-- Creates PR --> code review --> merge to main
  |-- Clicks Publish --> generates ARM templates in workspace_publish branch
  |
  v
GitHub Actions CI/CD
  |
  |-- Picks up ARM templates from workspace_publish
  |-- Deploys to UAT (with UAT parameters)
  |-- Manual approval gate
  |-- Deploys to Production (with Prod parameters)

What Gets Stored in Git

When you connect ADF/Synapse to GitHub, every resource you create in the UI gets auto-saved as a JSON file:

your-repo/
-- pipeline/
   -- PL_Copy_SqlToADLS.json
   -- PL_Copy_SqlToADLS_Parquet_WithAudit.json
   -- PL_IncrementalLoad.json
-- dataset/
   -- DS_SqlDB_Metadata.json
   -- DS_SqlDB_SourceTable.json
   -- DS_ADLS_Sink.json
   -- DS_ADLS_Sink_Parquet.json
-- linkedService/
   -- LS_AzureSqlDB.json
   -- naveen-synapse-ws-WorkspaceDefaultStorage.json
-- trigger/
   -- TR_Daily_2AM.json
-- integrationRuntime/
   -- IR_SelfHosted.json (if applicable)
-- publish_config.json

All the pipelines we’ve built throughout this blog series — full load, Parquet with audit logging, incremental load — would all be JSON files in this repo. You never write these JSON files by hand. The ADF/Synapse UI generates them as you work.
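For a feel of what these generated files contain, here is an abridged dataset definition in the shape the UI produces (values are illustrative, not from a real workspace):

```json
{
    "name": "DS_SqlDB_SourceTable",
    "properties": {
        "linkedServiceName": {
            "referenceName": "LS_AzureSqlDB",
            "type": "LinkedServiceReference"
        },
        "type": "AzureSqlTable",
        "typeProperties": {
            "schema": "SalesLT",
            "table": "Customer"
        }
    }
}
```

Notice the linked service is a reference by name, not an inline connection string. This is what makes the definitions portable across environments.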

Step 1: Create a GitHub Repository

  1. Go to github.com and create a new repository
  2. Name it: adf-pipelines (for ADF) or synapse-pipelines (for Synapse)
  3. Set it to Private (your pipeline configs contain resource names)
  4. Initialize with a README
  5. Plan for two branches:
     • main — the collaboration branch (where approved code lives)
     • adf_publish (ADF) or workspace_publish (Synapse) — the publish branch, auto-created the first time you click Publish, so don't create it by hand

Step 2: Connect ADF/Synapse to GitHub

For Azure Data Factory

  1. Open ADF Studio (adf.azure.com)
  2. Click the Manage tab (wrench icon)
  3. Click Git configuration under Source control
  4. Click Configure
  5. Select GitHub as the repository type
  6. Click Authorize and sign into your GitHub account
  7. Configure:
     • Repository owner: your GitHub username
     • Repository name: adf-pipelines
     • Collaboration branch: main
     • Publish branch: adf_publish (auto-created)
     • Root folder: / (default)
     • Import existing resources: Yes (imports your current pipelines into Git)
  8. Click Apply

For Azure Synapse

  1. Open Synapse Studio
  2. Click the Manage tab
  3. Click Git configuration
  4. Select GitHub
  5. Authorize and configure (same settings as ADF but publish branch is workspace_publish)
  6. Click Apply

After connecting, you'll notice:

  • A branch dropdown appears in the top toolbar
  • Your current branch shows (e.g., main)
  • All existing pipelines, datasets, and linked services are committed to the repo

Step 3: Working with Branches

The Branch Workflow

Never work directly on main. Always create a feature branch:

  1. In ADF/Synapse Studio, click the branch dropdown in the toolbar
  2. Click + New branch
  3. Name it descriptively: feature/incremental-load or fix/customer-pipeline
  4. Make your changes (create/edit pipelines, datasets, etc.)
  5. Changes are auto-saved to your feature branch in GitHub

Creating a Pull Request

When your changes are ready:

  1. Click the branch dropdown --> click Create pull request
  2. This opens GitHub in your browser with a new PR
  3. Add a title and description:
     • Title: “Add incremental load pipeline for EMPLOYEE and ORDERS tables”
     • Description: what changed, why, and how to test
  4. Assign a reviewer (or review it yourself for personal projects)
  5. Reviewer checks the JSON changes in the PR
  6. After approval, merge the PR into main

What Reviewers Look For in a PR

  • Are dataset parameters properly configured?
  • Are linked service references correct?
  • Are expression names matching activity names exactly?
  • Are there any hardcoded values that should be parameterized?
  • Does the pipeline follow naming conventions?
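A few of these checks can even be automated as a CI step. A rough sketch that flags hardcoded SQL endpoints in linked service JSON (the repo fragment is mocked up for the demo; in a real workflow you would grep the checked-out repo):

```shell
# Mock up a repo fragment, then flag hardcoded SQL endpoints that a
# reviewer would ask to see parameterized or moved to Key Vault.
set -e
tmp=$(mktemp -d) && cd "$tmp"
mkdir linkedService
cat > linkedService/LS_AzureSqlDB.json <<'EOF'
{"name":"LS_AzureSqlDB","properties":{"typeProperties":{"connectionString":"data source=dev-sql.database.windows.net"}}}
EOF

# -r: recurse, -l: list matching files
if grep -rl 'database\.windows\.net' linkedService; then
  echo "REVIEW: hardcoded SQL endpoints found above"
fi
```

Wiring this into the PR's status checks turns a manual review item into an automatic gate.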

Step 4: The Publish Process and ARM Templates

After merging a PR to main, you need to Publish in ADF/Synapse Studio:

  1. Switch to the main branch in the toolbar
  2. Click the Publish button
  3. ADF/Synapse validates all resources
  4. If validation passes, it generates ARM templates and pushes them to a special branch:
     • ADF: adf_publish branch
     • Synapse: workspace_publish branch

What ARM Templates Contain

The publish branch has two critical files (Synapse names shown; the ADF equivalents in adf_publish are ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json, placed in a folder named after your factory):

workspace_publish/
-- TemplateForWorkspace.json           (all resources defined)
-- TemplateParametersForWorkspace.json (parameterized values)

TemplateForWorkspace.json — Contains the complete definition of every pipeline, dataset, linked service, trigger, and integration runtime. This is what gets deployed to other environments.

TemplateParametersForWorkspace.json — Contains the parameterized values that change between environments (SQL server URLs, storage account names, Key Vault URLs).

What Gets Parameterized Automatically

ADF/Synapse automatically parameterizes these values in the ARM template:

Resource Type         What Gets Parameterized
Linked Services       Connection strings, server names, database names, Key Vault URLs
Integration Runtimes  IR names and references
Triggers              Schedule times (sometimes)
Pipeline parameters   Default values

Step 5: Understanding Parameterization Across Environments

Your Dev and Prod environments have different infrastructure:

Parameter       Dev                             UAT                             Prod
SQL Server URL  dev-sql.database.windows.net    uat-sql.database.windows.net    prod-sql.database.windows.net
SQL Database    AdventureWorksLT-dev            AdventureWorksLT-uat            AdventureWorksLT-prod
ADLS Account    devdatalake                     uatdatalake                     proddatalake
Key Vault URL   https://dev-kv.vault.azure.net  https://uat-kv.vault.azure.net  https://prod-kv.vault.azure.net

The ARM template keeps the pipeline logic identical across all environments. Only the connection parameters change. You create environment-specific parameter files:

uat.parameters.json:

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "LS_AzureSqlDB_connectionString": {
            "value": "integrated security=False;data source=uat-sql.database.windows.net;initial catalog=AdventureWorksLT-uat"
        },
        "LS_ADLS_Gen2_url": {
            "value": "https://uatdatalake.dfs.core.windows.net"
        }
    }
}

prod.parameters.json:

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "LS_AzureSqlDB_connectionString": {
            "value": "integrated security=False;data source=prod-sql.database.windows.net;initial catalog=AdventureWorksLT-prod"
        },
        "LS_ADLS_Gen2_url": {
            "value": "https://proddatalake.dfs.core.windows.net"
        }
    }
}

Store these parameter files in your repo (but NEVER put passwords in them — use Key Vault references).
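Beyond plain values, ARM parameter files also support Key Vault references, so even the parameter file never holds the secret itself. A sketch assuming a uat-kv vault in a resource group rg-uat (the IDs are placeholders):

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "LS_AzureSqlDB_connectionString": {
            "reference": {
                "keyVault": {
                    "id": "/subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/rg-uat/providers/Microsoft.KeyVault/vaults/uat-kv"
                },
                "secretName": "sql-connection-string"
            }
        }
    }
}
```

At deployment time, Azure Resource Manager resolves the secret from the vault, provided the deploying identity has been granted access to it.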

Step 6: Create a Service Principal for Deployment

GitHub Actions needs an Azure identity to deploy ARM templates. Create a Service Principal:

Using Azure CLI

# Login to Azure
az login

# Create the Service Principal with Contributor access
az ad sp create-for-rbac --name "sp-github-adf-deploy" \
    --role Contributor \
    --scopes /subscriptions/YOUR_SUBSCRIPTION_ID \
    --sdk-auth

Note: --sdk-auth is marked deprecated in recent Azure CLI versions, but it still emits the JSON format that the azure/login action's creds input expects.

This outputs a JSON object:

{
    "clientId": "xxxx-xxxx-xxxx",
    "clientSecret": "xxxx",
    "subscriptionId": "xxxx",
    "tenantId": "xxxx",
    ...
}

Store in GitHub Secrets

  1. Go to your GitHub repo –> Settings –> Secrets and variables –> Actions
  2. Click New repository secret
  3. Name: AZURE_CREDENTIALS
  4. Value: paste the entire JSON output from the az command
  5. Click Add secret

For additional security, also add:

  • AZURE_SUBSCRIPTION_ID — your subscription ID
  • AZURE_RESOURCE_GROUP_UAT — UAT resource group name
  • AZURE_RESOURCE_GROUP_PROD — Prod resource group name

Step 7: Set Up GitHub Actions for CI/CD

Create a workflow file in your repo:

.github/workflows/adf-deploy.yml

Basic Workflow (Deploy to UAT only)

name: ADF/Synapse CI/CD

on:
  push:
    branches:
      - adf_publish    # For ADF
      # - workspace_publish  # For Synapse

jobs:
  deploy-to-uat:
    runs-on: ubuntu-latest
    environment: UAT
    steps:
      - name: Checkout publish branch
        uses: actions/checkout@v4
        with:
          ref: adf_publish

      - name: Login to Azure
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Deploy ARM template to UAT
        uses: azure/arm-deploy@v2
        with:
          resourceGroupName: ${{ secrets.AZURE_RESOURCE_GROUP_UAT }}
          template: ./TemplateForWorkspace.json   # Synapse; for ADF use ./<factory-name>/ARMTemplateForFactory.json
          parameters: ./uat.parameters.json

What This Does

  1. Triggers when ARM templates are pushed to the adf_publish branch (which happens when you click Publish in ADF Studio)
  2. Checks out the publish branch to get the ARM templates
  3. Logs into Azure using the Service Principal credentials stored in GitHub Secrets
  4. Deploys the ARM template to the UAT resource group using the UAT parameter file
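One optional hardening step, sketched here: validate the template against the target resource group before deploying, so a malformed template fails fast. The az deployment group validate command runs the same pre-flight checks as a deployment without creating any resources:

```yaml
      - name: Validate ARM template against UAT
        run: |
          az deployment group validate \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} \
            --template-file ./TemplateForWorkspace.json \
            --parameters @uat.parameters.json
```

Place this step between the Azure login and the deploy step.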

Step 8: Pre/Post Deployment Scripts

Before deploying to a target workspace, you need to stop active triggers to prevent pipelines from running during deployment. After deployment, you restart them.

Pre-deployment Script (stop-triggers.sh)

# Stop all triggers in the target workspace before deployment
az datafactory trigger list \
    --resource-group "$RESOURCE_GROUP" \
    --factory-name "$ADF_NAME" \
    --query "[?properties.runtimeState=='Started'].name" \
    --output tsv | while read -r trigger; do
        echo "Stopping trigger: $trigger"
        az datafactory trigger stop \
            --resource-group "$RESOURCE_GROUP" \
            --factory-name "$ADF_NAME" \
            --name "$trigger"
    done

Post-deployment Script (start-triggers.sh)

# Restart all triggers after deployment
az datafactory trigger list \
    --resource-group "$RESOURCE_GROUP" \
    --factory-name "$ADF_NAME" \
    --query "[?properties.runtimeState=='Stopped'].name" \
    --output tsv | while read -r trigger; do
        echo "Starting trigger: $trigger"
        az datafactory trigger start \
            --resource-group "$RESOURCE_GROUP" \
            --factory-name "$ADF_NAME" \
            --name "$trigger"
    done

Why this matters: If a scheduled trigger fires during deployment, it might try to run a half-deployed pipeline — causing failures and potentially corrupting data.

Step 9: Manual Approval Gates

For production deployments, add a manual approval step so someone reviews the UAT deployment before promoting to Prod:

  1. In GitHub, go to Settings –> Environments
  2. Create an environment called Production
  3. Enable Required reviewers –> add yourself or your team lead
  4. Now the workflow pauses at the production deployment step and waits for approval

The Complete GitHub Actions Workflow

Here’s the full production-ready workflow:

name: ADF/Synapse CI/CD Pipeline

on:
  push:
    branches:
      - adf_publish

env:
  ADF_NAME_UAT: adf-uat-workspace
  ADF_NAME_PROD: adf-prod-workspace

jobs:
  # ---- Deploy to UAT ----
  deploy-uat:
    runs-on: ubuntu-latest
    environment: UAT
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: adf_publish

      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Stop UAT Triggers
        run: |
          triggers=$(az datafactory trigger list \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} \
            --factory-name ${{ env.ADF_NAME_UAT }} \
            --query "[?properties.runtimeState=='Started'].name" -o tsv)
          for trigger in $triggers; do
            echo "Stopping: $trigger"
            az datafactory trigger stop \
              --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} \
              --factory-name ${{ env.ADF_NAME_UAT }} \
              --name "$trigger"
          done

      - name: Deploy to UAT
        uses: azure/arm-deploy@v2
        with:
          resourceGroupName: ${{ secrets.AZURE_RESOURCE_GROUP_UAT }}
          template: ./TemplateForWorkspace.json
          parameters: ./uat.parameters.json

      - name: Start UAT Triggers
        run: |
          triggers=$(az datafactory trigger list \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} \
            --factory-name ${{ env.ADF_NAME_UAT }} \
            --query "[?properties.runtimeState=='Stopped'].name" -o tsv)
          for trigger in $triggers; do
            echo "Starting: $trigger"
            az datafactory trigger start \
              --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} \
              --factory-name ${{ env.ADF_NAME_UAT }} \
              --name "$trigger"
          done

  # ---- Deploy to Production (requires approval) ----
  deploy-prod:
    runs-on: ubuntu-latest
    needs: deploy-uat
    environment: Production    # Requires manual approval
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: adf_publish

      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Stop Prod Triggers
        run: |
          triggers=$(az datafactory trigger list \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_PROD }} \
            --factory-name ${{ env.ADF_NAME_PROD }} \
            --query "[?properties.runtimeState=='Started'].name" -o tsv)
          for trigger in $triggers; do
            az datafactory trigger stop \
              --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_PROD }} \
              --factory-name ${{ env.ADF_NAME_PROD }} \
              --name "$trigger"
          done

      - name: Deploy to Production
        uses: azure/arm-deploy@v2
        with:
          resourceGroupName: ${{ secrets.AZURE_RESOURCE_GROUP_PROD }}
          template: ./TemplateForWorkspace.json
          parameters: ./prod.parameters.json

      - name: Start Prod Triggers
        run: |
          triggers=$(az datafactory trigger list \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_PROD }} \
            --factory-name ${{ env.ADF_NAME_PROD }} \
            --query "[?properties.runtimeState=='Stopped'].name" -o tsv)
          for trigger in $triggers; do
            az datafactory trigger start \
              --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_PROD }} \
              --factory-name ${{ env.ADF_NAME_PROD }} \
              --name "$trigger"
          done

Real-World Rules and Best Practices

Rule 1: Only Dev Is Connected to Git

UAT and Prod workspaces are in “Live mode” and are never edited manually. All changes flow through Dev –> PR –> Publish –> CI/CD.

Rule 2: Never Edit Production Directly

If someone edits Prod directly, their changes will be overwritten on the next CI/CD deployment. This is by design — Git is the source of truth.

Rule 3: Use Azure Key Vault for Secrets

Never hardcode passwords or connection strings. Store them in Key Vault and reference them in linked services:

{
    "type": "AzureKeyVaultSecret",
    "store": {
        "referenceName": "LS_KeyVault",
        "type": "LinkedServiceReference"
    },
    "secretName": "sql-connection-string"
}

Each environment has its own Key Vault (dev-kv, uat-kv, prod-kv).

Rule 4: Use Managed Identity in Production

Dev might use SQL authentication for convenience. Production should use Managed Identity — no credentials to manage, rotate, or leak.

Rule 5: Pre/Post Deployment Scripts Are Mandatory

Always stop triggers before deployment and restart after. A trigger firing during deployment can cause data corruption.

Rule 6: Naming Conventions Matter

Use underscores or hyphens, never spaces:

GOOD: PL_Copy_SqlToADLS, DS_SqlDB_SourceTable, LS_AzureSqlDB
BAD:  PL Copy Sql To ADLS, DS Sql DB Source Table

ARM templates can break on spaces in resource names.
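This convention is easy to enforce in CI. A self-contained sketch (the files are mocked up here for the demo; in a real workflow you would run the find against the checked-out repo and exit nonzero on a hit):

```shell
# Fail fast if any committed resource file has a space in its name.
set -e
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p pipeline dataset
touch "pipeline/PL_Copy_SqlToADLS.json" "dataset/DS_SqlDB_SourceTable.json"
touch "pipeline/PL Copy Bad Name.json"   # deliberately bad, for the demo

bad=$(find pipeline dataset -name '* *')
if [ -n "$bad" ]; then
  echo "Resource names with spaces:"
  echo "$bad"
else
  echo "naming check passed"
fi
```

Run as a required status check on the PR, this blocks a badly named resource before it ever reaches the ARM template.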

Rule 7: Test the Deployment Before Prod

Always deploy to UAT first. Run the pipeline manually in UAT. Verify the output. Only then approve the production deployment.

Rule 8: Use Branch Policies

On GitHub, protect the main branch:

  • Require pull requests (no direct pushes)
  • Require at least 1 review approval
  • Require status checks to pass

This ensures no untested code reaches production.

ADF Git Integration vs Synapse Git Integration

Aspect               ADF                                   Synapse
Publish branch       adf_publish                           workspace_publish
Git config location  Manage > Git configuration            Manage > Git configuration
ARM template file    ARMTemplateForFactory.json            TemplateForWorkspace.json
Parameters file      ARMTemplateParametersForFactory.json  TemplateParametersForWorkspace.json
Resources stored     Pipelines, datasets, linked           Same, plus notebooks, SQL scripts,
                     services, triggers, IRs               and Spark job definitions
Live mode toggle     Available                             Available

The main difference is file naming. The workflow logic is identical.

Troubleshooting Common CI/CD Issues

“Deployment failed: Resource not found”

The target workspace doesn’t exist or the Service Principal doesn’t have access. Verify the resource group name and that the SP has Contributor role.

“Trigger cannot be started”

The trigger might reference a pipeline that failed to deploy. Check the deployment logs for earlier errors.

“ARM template validation failed”

Usually caused by linked service references that don’t exist in the target environment. Make sure all linked services referenced by pipelines are included in the ARM template.

“Merge conflicts in JSON files”

Two developers modified the same pipeline in different branches. Resolve conflicts in GitHub (or locally) before merging. JSON merge conflicts can be tricky — review carefully.

“Publish button is grayed out”

You’re not on the collaboration branch (main). Switch to main first, then click Publish. You can only publish from the collaboration branch.

Interview Questions

Q: How do you deploy ADF pipelines to production?
A: Connect the Dev ADF workspace to GitHub. Developers work in feature branches, create PRs for code review, merge to main, and click Publish to generate ARM templates. A GitHub Actions (or Azure DevOps) CI/CD pipeline picks up the ARM templates and deploys to UAT first, then Production after manual approval.

Q: What is the publish branch in ADF?
A: When you click Publish in ADF Studio, it generates ARM templates and pushes them to a special branch (adf_publish for ADF, workspace_publish for Synapse). These ARM templates contain the complete definition of all resources and are the deployment artifacts used by CI/CD.

Q: How do you handle different configurations across environments?
A: ADF automatically parameterizes linked service connection strings in the ARM template. You create environment-specific parameter files (uat.parameters.json, prod.parameters.json) that override these values during deployment. Secrets are stored in Azure Key Vault.

Q: Can multiple developers work on the same ADF at the same time?
A: Yes. Each developer works in their own feature branch. Changes are merged via pull requests. If two developers modify the same pipeline, they resolve merge conflicts in Git before merging.

Q: What happens if someone edits the production ADF directly?
A: Their changes will be overwritten on the next CI/CD deployment. Production should never be edited directly — only deployed to through the CI/CD pipeline.

Q: Why do you stop triggers before deployment?
A: To prevent pipelines from running during deployment. A trigger that fires while resources are being updated could run a half-deployed pipeline, causing failures or data corruption.

Wrapping Up

CI/CD for data pipelines follows the same principles as CI/CD for application code — version control, code review, automated testing, and automated deployment. The tools are different (ARM templates instead of Docker images, GitHub Actions instead of Jenkins), but the workflow is the same.

Setting up CI/CD takes a few hours upfront, but it pays for itself immediately:

  • No more “it works in dev but breaks in prod”
  • No more “who changed the pipeline last night?”
  • No more manual deployments that miss a dataset or linked service

This is how every serious data platform operates in 2026. Master it, and you’re operating at a senior data engineer level.

Related posts:

  • What is Azure Data Factory?
  • ADF vs Synapse Comparison
  • Metadata-Driven Pipeline in ADF
  • Synapse Pipeline with Audit Logging
  • Top 15 ADF Interview Questions

If this guide helped you understand CI/CD for data pipelines, share it with your team. Have questions? Drop a comment below.


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
