CI/CD for Azure Data Factory and Synapse Pipelines with GitHub: The Complete Guide
In a real enterprise project, nobody builds data pipelines by clicking through the ADF UI in production. Every pipeline, dataset, and linked service is stored in Git, reviewed through pull requests, and deployed to UAT and production through an automated CI/CD pipeline.
This is the professional workflow that separates learning projects from production-grade data platforms. It's also one of the most frequently asked topics in senior data engineering interviews.
In this guide, I’ll walk you through the complete CI/CD setup for Azure Data Factory and Synapse using GitHub — from connecting your workspace to Git, to working with feature branches, to deploying across environments with GitHub Actions.
Table of Contents
- Why CI/CD for Data Pipelines?
- The Big Picture: Dev to Production Flow
- What Gets Stored in Git
- Step 1: Create a GitHub Repository
- Step 2: Connect ADF/Synapse to GitHub
- Step 3: Working with Branches
- Step 4: The Publish Process and ARM Templates
- Step 5: Understanding Parameterization Across Environments
- Step 6: Create a Service Principal for Deployment
- Step 7: Set Up GitHub Actions for CI/CD
- Step 8: Pre/Post Deployment Scripts
- Step 9: Manual Approval Gates
- The Complete GitHub Actions Workflow
- Real-World Rules and Best Practices
- ADF Git Integration vs Synapse Git Integration
- Troubleshooting Common CI/CD Issues
- Interview Questions
- Wrapping Up
Why CI/CD for Data Pipelines?
Without CI/CD, your production pipeline workflow looks like this:
Developer builds pipeline in Dev ADF UI
--> Developer logs into Prod ADF UI
--> Developer manually recreates everything
--> Developer hopes nothing was missed
--> Production breaks at 3 AM
With CI/CD:
Developer builds pipeline in Dev ADF UI (connected to Git)
--> Creates Pull Request on GitHub
--> Teammate reviews and approves
--> Developer clicks Publish (generates ARM templates)
--> GitHub Actions automatically deploys to UAT
--> Manual approval --> deploys to Production
--> Everything is version-controlled, auditable, repeatable
The key benefits:
- Version control — every change is tracked, reviewable, and reversible
- Code review — teammates catch issues before they reach production
- Automated deployment — no manual steps, no human error
- Environment consistency — Dev, UAT, and Prod have identical pipelines
- Audit trail — who changed what, when, and why
- Rollback — production breaks? Revert to the previous commit
The Big Picture: Dev to Production Flow
In a real project, you have multiple environments:
Development (Dev) --> UAT/Test --> Production (Prod)
Each environment has its own ADF or Synapse workspace. Only the Dev workspace is connected to GitHub. UAT and Prod are NEVER edited manually — they receive deployments only through CI/CD.
Dev ADF/Synapse (connected to GitHub)
|
|-- Developer works in feature branch
|-- Creates PR --> code review --> merge to main
|-- Clicks Publish --> generates ARM templates in workspace_publish branch
|
v
GitHub Actions CI/CD
|
|-- Picks up ARM templates from workspace_publish
|-- Deploys to UAT (with UAT parameters)
|-- Manual approval gate
|-- Deploys to Production (with Prod parameters)
What Gets Stored in Git
When you connect ADF/Synapse to GitHub, every resource you create in the UI gets auto-saved as a JSON file:
your-repo/
├── pipeline/
│   ├── PL_Copy_SqlToADLS.json
│   ├── PL_Copy_SqlToADLS_Parquet_WithAudit.json
│   └── PL_IncrementalLoad.json
├── dataset/
│   ├── DS_SqlDB_Metadata.json
│   ├── DS_SqlDB_SourceTable.json
│   ├── DS_ADLS_Sink.json
│   └── DS_ADLS_Sink_Parquet.json
├── linkedService/
│   ├── LS_AzureSqlDB.json
│   └── naveen-synapse-ws-WorkspaceDefaultStorage.json
├── trigger/
│   └── TR_Daily_2AM.json
├── integrationRuntime/
│   └── IR_SelfHosted.json   (if applicable)
└── publish_config.json
All the pipelines we’ve built throughout this blog series — full load, Parquet with audit logging, incremental load — would all be JSON files in this repo. You never write these JSON files by hand. The ADF/Synapse UI generates them as you work.
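For a sense of what these generated files look like, here is a trimmed sketch of a pipeline JSON, roughly in the shape ADF/Synapse writes it (the activity and dataset names come from this series, but the exact fields are illustrative; real files carry many more properties):

```json
{
    "name": "PL_Copy_SqlToADLS",
    "properties": {
        "activities": [
            {
                "name": "Copy_Employee",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "DS_SqlDB_SourceTable", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "DS_ADLS_Sink", "type": "DatasetReference" }
                ]
            }
        ],
        "annotations": []
    }
}
```

Notice that datasets and linked services are wired together purely by `referenceName` — this is why reviewers pay close attention to references in PRs.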
Step 1: Create a GitHub Repository
- Go to github.com and create a new repository
- Name it adf-pipelines (for ADF) or synapse-pipelines (for Synapse)
- Set it to Private (your pipeline configs contain resource names)
- Initialize with a README
- Create the following branches:
  - main — the collaboration branch (where approved code lives)
  - workspace_publish (or adf_publish for ADF) — auto-generated by ADF/Synapse when you click Publish, so you don't need to create it yourself
Step 2: Connect ADF/Synapse to GitHub
For Azure Data Factory
- Open ADF Studio (adf.azure.com)
- Click the Manage tab (wrench icon)
- Click Git configuration under Source control
- Click Configure
- Select GitHub as the repository type
- Click Authorize and sign into your GitHub account
- Configure:
- Repository owner: your GitHub username
- Repository name: adf-pipelines
- Collaboration branch: main
- Publish branch: adf_publish (auto-created)
- Root folder: / (default)
- Import existing resources: Yes (imports your current pipelines into Git)
- Click Apply
For Azure Synapse
- Open Synapse Studio
- Click the Manage tab
- Click Git configuration
- Select GitHub
- Authorize and configure (same settings as ADF, but the publish branch is workspace_publish)
- Click Apply
After connecting, you’ll notice:
- A branch dropdown appears in the top toolbar
- Your current branch shows (e.g., main)
- All existing pipelines, datasets, and linked services are committed to the repo
Step 3: Working with Branches
The Branch Workflow
Never work directly on main. Always create a feature branch:
- In ADF/Synapse Studio, click the branch dropdown in the toolbar
- Click + New branch
- Name it descriptively: feature/incremental-load or fix/customer-pipeline
- Make your changes (create/edit pipelines, datasets, etc.)
- Changes are auto-saved to your feature branch in GitHub
Creating a Pull Request
When your changes are ready:
- Click the branch dropdown -> click Create pull request
- This opens GitHub in your browser with a new PR
- Add a title and description:
- Title: “Add incremental load pipeline for EMPLOYEE and ORDERS tables”
- Description: what changed, why, and how to test
- Assign a reviewer (or review it yourself for personal projects)
- Reviewer checks the JSON changes in the PR
- After approval, merge the PR into main
What Reviewers Look For in a PR
- Are dataset parameters properly configured?
- Are linked service references correct?
- Are expression names matching activity names exactly?
- Are there any hardcoded values that should be parameterized?
- Does the pipeline follow naming conventions?
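Part of this checklist can be automated as a CI step that fails the PR when it finds obvious problems. A minimal sketch in Python (the suspicious patterns, and the idea of scanning every JSON in the repo, are assumptions to tune to your own conventions):

```python
import re
from pathlib import Path

# Patterns that usually indicate a hardcoded value that should be
# parameterized or moved to Key Vault (illustrative, not exhaustive).
SUSPICIOUS = [
    re.compile(r"password\s*=", re.IGNORECASE),
    re.compile(r"accountkey", re.IGNORECASE),
    re.compile(r"\.database\.windows\.net"),
    re.compile(r"\.dfs\.core\.windows\.net"),
]

def find_hardcoded(repo_root: str) -> list:
    """Scan all JSON files under repo_root and report which files
    match a suspicious pattern."""
    findings = []
    for path in Path(repo_root).glob("**/*.json"):
        text = path.read_text(encoding="utf-8")
        for pattern in SUSPICIOUS:
            if pattern.search(text):
                findings.append(f"{path.name}: matches {pattern.pattern!r}")
    return findings
```

Running this in a workflow step and failing when the list is non-empty gives reviewers a head start; it does not replace a human reading the diff.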
Step 4: The Publish Process and ARM Templates
After merging a PR to main, you need to Publish in ADF/Synapse Studio:
- Switch to the main branch in the toolbar
- Click the Publish button
- ADF/Synapse validates all resources
- If validation passes, it generates ARM templates and pushes them to a special branch:
  - ADF: adf_publish branch
  - Synapse: workspace_publish branch
What ARM Templates Contain
The publish branch has two critical files:
workspace_publish/ (or adf_publish/)
├── TemplateForWorkspace.json            (all resources defined)
└── TemplateParametersForWorkspace.json  (parameterized values)
TemplateForWorkspace.json — Contains the complete definition of every pipeline, dataset, linked service, trigger, and integration runtime. This is what gets deployed to other environments.
TemplateParametersForWorkspace.json — Contains the parameterized values that change between environments (SQL server URLs, storage account names, Key Vault URLs).
What Gets Parameterized Automatically
ADF/Synapse automatically parameterizes these values in the ARM template:
| Resource Type | What Gets Parameterized |
|---|---|
| Linked Services | Connection strings, server names, database names, Key Vault URLs |
| Integration Runtimes | IR names and references |
| Triggers | Schedule times (sometimes) |
| Pipeline parameters | Default values |
Step 5: Understanding Parameterization Across Environments
Your Dev and Prod environments have different infrastructure:
| Parameter | Dev | UAT | Prod |
|---|---|---|---|
| SQL Server URL | dev-sql.database.windows.net | uat-sql.database.windows.net | prod-sql.database.windows.net |
| SQL Database | AdventureWorksLT-dev | AdventureWorksLT-uat | AdventureWorksLT-prod |
| ADLS Account | devdatalake | uatdatalake | proddatalake |
| Key Vault URL | https://dev-kv.vault.azure.net | https://uat-kv.vault.azure.net | https://prod-kv.vault.azure.net |
The ARM template keeps the pipeline logic identical across all environments. Only the connection parameters change. You create environment-specific parameter files:
uat.parameters.json:
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "LS_AzureSqlDB_connectionString": {
            "value": "integrated security=False;data source=uat-sql.database.windows.net;initial catalog=AdventureWorksLT-uat"
        },
        "LS_ADLS_Gen2_url": {
            "value": "https://uatdatalake.dfs.core.windows.net"
        }
    }
}
prod.parameters.json:
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "LS_AzureSqlDB_connectionString": {
            "value": "integrated security=False;data source=prod-sql.database.windows.net;initial catalog=AdventureWorksLT-prod"
        },
        "LS_ADLS_Gen2_url": {
            "value": "https://proddatalake.dfs.core.windows.net"
        }
    }
}
Store these parameter files in your repo (but NEVER put passwords in them — use Key Vault references).
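A common drift bug is a parameter that exists in one environment's file but not the other, which only surfaces at deploy time. A small pre-deployment check, assuming the standard ARM parameter file shape shown above (this helper is mine, not part of any Azure tooling):

```python
import json

def parameter_key_diff(file_a: str, file_b: str) -> set:
    """Parameter names present in one ARM parameter file but not the
    other. An empty set means the two environment files are consistent."""
    def keys(path):
        with open(path, encoding="utf-8") as f:
            return set(json.load(f)["parameters"])
    return keys(file_a) ^ keys(file_b)  # symmetric difference
```

Calling `parameter_key_diff("uat.parameters.json", "prod.parameters.json")` in CI and failing on a non-empty result catches the mismatch before `arm-deploy` does.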
Step 6: Create a Service Principal for Deployment
GitHub Actions needs an Azure identity to deploy ARM templates. Create a Service Principal:
Using Azure CLI
# Login to Azure
az login
# Create the Service Principal with Contributor access
az ad sp create-for-rbac --name "sp-github-adf-deploy" --role Contributor --scopes /subscriptions/YOUR_SUBSCRIPTION_ID --sdk-auth
This outputs a JSON object:
{
"clientId": "xxxx-xxxx-xxxx",
"clientSecret": "xxxx",
"subscriptionId": "xxxx",
"tenantId": "xxxx",
...
}
Store in GitHub Secrets
- Go to your GitHub repo -> Settings -> Secrets and variables -> Actions
- Click New repository secret
- Name: AZURE_CREDENTIALS
- Value: paste the entire JSON output from the az command
- Click Add secret
For additional security, also add:
- AZURE_SUBSCRIPTION_ID — your subscription ID
- AZURE_RESOURCE_GROUP_UAT — UAT resource group name
- AZURE_RESOURCE_GROUP_PROD — Prod resource group name
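One caveat: the --sdk-auth flag used above is deprecated, and the client secret it produces is a long-lived credential you then have to rotate. If you want to avoid storing a secret at all, azure/login@v2 also supports OIDC federated credentials. A sketch, assuming you have configured a federated credential on the app registration for this repo (the secret names here are placeholders):

```yaml
permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read

steps:
  - name: Azure Login (OIDC)
    uses: azure/login@v2
    with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }}
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
```

The rest of this guide uses the AZURE_CREDENTIALS secret for simplicity; either login style works with the workflows below.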
Step 7: Set Up GitHub Actions for CI/CD
Create a workflow file in your repo:
.github/workflows/adf-deploy.yml
Basic Workflow (Deploy to UAT only)
name: ADF/Synapse CI/CD
on:
  push:
    branches:
      - adf_publish          # For ADF
      # - workspace_publish  # For Synapse
jobs:
  deploy-to-uat:
    runs-on: ubuntu-latest
    environment: UAT
    steps:
      - name: Checkout publish branch
        uses: actions/checkout@v4
        with:
          ref: adf_publish
      - name: Login to Azure
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deploy ARM template to UAT
        uses: azure/arm-deploy@v2
        with:
          resourceGroupName: ${{ secrets.AZURE_RESOURCE_GROUP_UAT }}
          template: ./TemplateForWorkspace.json   # ADF names this file ARMTemplateForFactory.json
          parameters: ./uat.parameters.json
What This Does
- Triggers when ARM templates are pushed to the adf_publish branch (which happens when you click Publish in ADF Studio)
- Checks out the publish branch to get the ARM templates
- Logs into Azure using the Service Principal credentials stored in GitHub Secrets
- Deploys the ARM template to the UAT resource group using the UAT parameter file
Step 8: Pre/Post Deployment Scripts
Before deploying to a target workspace, you need to stop active triggers to prevent pipelines from running during deployment. After deployment, you restart them.
Pre-deployment Script (stop-triggers.sh)
# Stop all triggers in the target workspace before deployment
az datafactory trigger list --resource-group $RESOURCE_GROUP --factory-name $ADF_NAME --query "[?properties.runtimeState=='Started'].name" --output tsv | while read trigger; do
echo "Stopping trigger: $trigger"
az datafactory trigger stop --resource-group $RESOURCE_GROUP --factory-name $ADF_NAME --name "$trigger"
done
Post-deployment Script (start-triggers.sh)
# Restart all triggers after deployment
az datafactory trigger list --resource-group $RESOURCE_GROUP --factory-name $ADF_NAME --query "[?properties.runtimeState=='Stopped'].name" --output tsv | while read trigger; do
echo "Starting trigger: $trigger"
az datafactory trigger start --resource-group $RESOURCE_GROUP --factory-name $ADF_NAME --name "$trigger"
done
Why this matters: If a scheduled trigger fires during deployment, it might try to run a half-deployed pipeline — causing failures and potentially corrupting data.
Step 9: Manual Approval Gates
For production deployments, add a manual approval step so someone reviews the UAT deployment before promoting to Prod:
- In GitHub, go to Settings -> Environments
- Create an environment called Production
- Enable Required reviewers -> add yourself or your team lead
- Now the workflow pauses at the production deployment step and waits for approval
The Complete GitHub Actions Workflow
Here’s the full production-ready workflow:
name: ADF/Synapse CI/CD Pipeline
on:
  push:
    branches:
      - adf_publish
env:
  ADF_NAME_UAT: adf-uat-workspace
  ADF_NAME_PROD: adf-prod-workspace
jobs:
  # ---- Deploy to UAT ----
  deploy-uat:
    runs-on: ubuntu-latest
    environment: UAT
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: adf_publish
      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Stop UAT Triggers
        run: |
          triggers=$(az datafactory trigger list --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} --factory-name ${{ env.ADF_NAME_UAT }} --query "[?properties.runtimeState=='Started'].name" -o tsv)
          for trigger in $triggers; do
            echo "Stopping: $trigger"
            az datafactory trigger stop --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} --factory-name ${{ env.ADF_NAME_UAT }} --name "$trigger"
          done
      - name: Deploy to UAT
        uses: azure/arm-deploy@v2
        with:
          resourceGroupName: ${{ secrets.AZURE_RESOURCE_GROUP_UAT }}
          template: ./TemplateForWorkspace.json
          parameters: ./uat.parameters.json
      - name: Start UAT Triggers
        run: |
          triggers=$(az datafactory trigger list --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} --factory-name ${{ env.ADF_NAME_UAT }} --query "[?properties.runtimeState=='Stopped'].name" -o tsv)
          for trigger in $triggers; do
            echo "Starting: $trigger"
            az datafactory trigger start --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_UAT }} --factory-name ${{ env.ADF_NAME_UAT }} --name "$trigger"
          done
  # ---- Deploy to Production (requires approval) ----
  deploy-prod:
    runs-on: ubuntu-latest
    needs: deploy-uat
    environment: Production   # Requires manual approval
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: adf_publish
      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Stop Prod Triggers
        run: |
          triggers=$(az datafactory trigger list --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_PROD }} --factory-name ${{ env.ADF_NAME_PROD }} --query "[?properties.runtimeState=='Started'].name" -o tsv)
          for trigger in $triggers; do
            az datafactory trigger stop --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_PROD }} --factory-name ${{ env.ADF_NAME_PROD }} --name "$trigger"
          done
      - name: Deploy to Production
        uses: azure/arm-deploy@v2
        with:
          resourceGroupName: ${{ secrets.AZURE_RESOURCE_GROUP_PROD }}
          template: ./TemplateForWorkspace.json
          parameters: ./prod.parameters.json
      - name: Start Prod Triggers
        run: |
          triggers=$(az datafactory trigger list --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_PROD }} --factory-name ${{ env.ADF_NAME_PROD }} --query "[?properties.runtimeState=='Stopped'].name" -o tsv)
          for trigger in $triggers; do
            az datafactory trigger start --resource-group ${{ secrets.AZURE_RESOURCE_GROUP_PROD }} --factory-name ${{ env.ADF_NAME_PROD }} --name "$trigger"
          done
Real-World Rules and Best Practices
Rule 1: Only Dev Is Connected to Git
UAT and Prod workspaces are in “Live mode” and are never edited manually. All changes flow through Dev –> PR –> Publish –> CI/CD.
Rule 2: Never Edit Production Directly
If someone edits Prod directly, their changes will be overwritten on the next CI/CD deployment. This is by design — Git is the source of truth.
Rule 3: Use Azure Key Vault for Secrets
Never hardcode passwords or connection strings. Store them in Key Vault and reference them in linked services:
{
    "type": "AzureKeyVaultSecret",
    "store": {
        "referenceName": "LS_KeyVault",
        "type": "LinkedServiceReference"
    },
    "secretName": "sql-connection-string"
}
Each environment has its own Key Vault (dev-kv, uat-kv, prod-kv).
Rule 4: Use Managed Identity in Production
Dev might use SQL authentication for convenience. Production should use Managed Identity — no credentials to manage, rotate, or leak.
Rule 5: Pre/Post Deployment Scripts Are Mandatory
Always stop triggers before deployment and restart after. A trigger firing during deployment can cause data corruption.
Rule 6: Naming Conventions Matter
Use underscores or hyphens, never spaces:
GOOD: PL_Copy_SqlToADLS, DS_SqlDB_SourceTable, LS_AzureSqlDB
BAD: PL Copy Sql To ADLS, DS Sql DB Source Table
ARM templates can break on spaces in resource names.
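This convention is easy to enforce automatically, for example as a CI check over the resource names in the repo. A small sketch (the prefix set and the regex are assumptions based on the names used in this series):

```python
import re

# Assumed prefixes: PL (pipeline), DS (dataset), LS (linked service),
# TR (trigger), IR (integration runtime). No spaces allowed anywhere.
NAME_PATTERN = re.compile(r"^(PL|DS|LS|TR|IR)_[A-Za-z0-9_]+$")

def valid_resource_name(name: str) -> bool:
    """True if the name follows the TYPE_Description convention."""
    return NAME_PATTERN.fullmatch(name) is not None
```

Run it over the `name` field of every JSON in the repo and fail the build on violations, and malformed names never reach the ARM template.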
Rule 7: Test the Deployment Before Prod
Always deploy to UAT first. Run the pipeline manually in UAT. Verify the output. Only then approve the production deployment.
Rule 8: Use Branch Policies
On GitHub, protect the main branch:
- Require pull requests (no direct pushes)
- Require at least 1 review approval
- Require status checks to pass
This ensures no untested code reaches production.
ADF Git Integration vs Synapse Git Integration
| Aspect | ADF | Synapse |
|---|---|---|
| Publish branch | adf_publish | workspace_publish |
| Git config location | Manage > Git configuration | Manage > Git configuration |
| ARM template file | ARMTemplateForFactory.json | TemplateForWorkspace.json |
| Parameters file | ARMTemplateParametersForFactory.json | TemplateParametersForWorkspace.json |
| Resources stored | Pipelines, datasets, linked services, triggers, IRs | Same + notebooks, SQL scripts, Spark job definitions |
| Live mode toggle | Available | Available |
The main difference is file naming. The workflow logic is identical.
Troubleshooting Common CI/CD Issues
“Deployment failed: Resource not found”
The target workspace doesn’t exist or the Service Principal doesn’t have access. Verify the resource group name and that the SP has Contributor role.
“Trigger cannot be started”
The trigger might reference a pipeline that failed to deploy. Check the deployment logs for earlier errors.
“ARM template validation failed”
Usually caused by linked service references that don’t exist in the target environment. Make sure all linked services referenced by pipelines are included in the ARM template.
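You can catch this class of error before deploying by cross-checking references in the collaboration branch, where names are stored plainly (unlike the ARM template, where resource names are wrapped in concat() expressions). A rough sketch, assuming the folder layout shown earlier in this guide:

```python
import json
from pathlib import Path

def missing_linked_services(repo_root: str) -> set:
    """Linked services referenced by pipelines/datasets in the
    collaboration branch that have no definition file under linkedService/."""
    root = Path(repo_root)
    defined = {p.stem for p in (root / "linkedService").glob("*.json")}

    referenced = set()
    def walk(node):
        # Recursively collect every LinkedServiceReference in the JSON tree.
        if isinstance(node, dict):
            if node.get("type") == "LinkedServiceReference":
                referenced.add(node.get("referenceName"))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    for folder in ("pipeline", "dataset"):
        for path in (root / folder).glob("*.json"):
            walk(json.loads(path.read_text(encoding="utf-8")))

    return referenced - defined
```

A non-empty result means a pipeline or dataset points at a linked service that was never committed, which is exactly the situation that makes ARM validation fail later.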
“Merge conflicts in JSON files”
Two developers modified the same pipeline in different branches. Resolve conflicts in GitHub (or locally) before merging. JSON merge conflicts can be tricky — review carefully.
“Publish button is grayed out”
You’re not on the collaboration branch (main). Switch to main first, then click Publish. You can only publish from the collaboration branch.
Interview Questions
Q: How do you deploy ADF pipelines to production?
A: Connect the Dev ADF workspace to GitHub. Developers work in feature branches, create PRs for code review, merge to main, and click Publish to generate ARM templates. A GitHub Actions (or Azure DevOps) CI/CD pipeline picks up the ARM templates and deploys to UAT first, then Production after manual approval.
Q: What is the publish branch in ADF?
A: When you click Publish in ADF Studio, it generates ARM templates and pushes them to a special branch (adf_publish for ADF, workspace_publish for Synapse). These ARM templates contain the complete definition of all resources and are the deployment artifacts used by CI/CD.
Q: How do you handle different configurations across environments?
A: ADF automatically parameterizes linked service connection strings in the ARM template. You create environment-specific parameter files (uat.parameters.json, prod.parameters.json) that override these values during deployment. Secrets are stored in Azure Key Vault.
Q: Can multiple developers work on the same ADF at the same time?
A: Yes. Each developer works in their own feature branch. Changes are merged via pull requests. If two developers modify the same pipeline, they resolve merge conflicts in Git before merging.
Q: What happens if someone edits the production ADF directly?
A: Their changes will be overwritten on the next CI/CD deployment. Production should never be edited directly — only deployed to through the CI/CD pipeline.
Q: Why do you stop triggers before deployment?
A: To prevent pipelines from running during deployment. A trigger that fires while resources are being updated could run a half-deployed pipeline, causing failures or data corruption.
Wrapping Up
CI/CD for data pipelines follows the same principles as CI/CD for application code — version control, code review, automated testing, and automated deployment. The tools are different (ARM templates instead of Docker images, GitHub Actions instead of Jenkins), but the workflow is the same.
Setting up CI/CD takes a few hours upfront, but it pays for itself immediately:
- No more "it works in dev but breaks in prod"
- No more "who changed the pipeline last night?"
- No more manual deployments that miss a dataset or linked service
This is how every serious data platform operates in 2026. Master it, and you’re operating at a senior data engineer level.
Related posts:
- What is Azure Data Factory?
- ADF vs Synapse Comparison
- Metadata-Driven Pipeline in ADF
- Synapse Pipeline with Audit Logging
- Top 15 ADF Interview Questions
If this guide helped you understand CI/CD for data pipelines, share it with your team. Have questions? Drop a comment below.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.