Azure Integration Runtime Explained: Types, Use Cases, and Real-World Examples
Every time you run a Copy activity, execute a Data Flow, or call a stored procedure in Azure Data Factory or Synapse, something needs to actually DO the work — read from the source, move the bytes, write to the destination. That something is the Integration Runtime (IR).
Integration Runtime is the compute infrastructure behind your pipelines. Choosing the wrong IR means your pipeline either fails outright (it cannot reach the source), runs slowly (wrong region), or costs more than it should (oversized cluster).
Despite its importance, IR is one of the least understood components in ADF. Most beginners use the default Azure IR without knowing what it does or that alternatives exist. This post changes that.
I will cover all three types of Integration Runtime in detail — when to use each, how to set them up, real-world scenarios, performance optimization, and the interview questions you should be ready for.
Table of Contents
- What is Integration Runtime?
- The Three Types of Integration Runtime
- Azure Integration Runtime (Auto-Resolve and Custom)
- Self-Hosted Integration Runtime
- Azure-SSIS Integration Runtime
- Comparison Table: All Three IR Types
- How to Check Which IR Your Pipeline Uses
- Creating and Managing Integration Runtimes
- Real-World Architecture Scenarios
- Performance Optimization with IR
- IR and Network Security
- Common IR Errors and Troubleshooting
- Cost Implications
- Interview Questions
- Wrapping Up
What is Integration Runtime?
Integration Runtime is the compute engine that ADF/Synapse uses to execute pipeline activities. Think of it as the worker that does the actual data movement and transformation.
When you create a Copy activity that reads from Azure SQL and writes to ADLS Gen2, the IR is the compute that:
- Connects to Azure SQL Database
- Executes the SELECT query
- Reads the result set
- Serializes the data into Parquet/CSV format
- Writes the files to ADLS Gen2
Without an IR, your pipeline is just a blueprint with no engine to run it.
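To make those five steps concrete, here is a toy sketch of the work the IR performs for that Copy activity. It uses sqlite3 and an in-memory CSV buffer as stand-ins for Azure SQL and ADLS Gen2; the real IR does all of this natively and would typically serialize to Parquet.

```python
import csv
import io
import sqlite3

# Stand-in for the source: an in-memory SQL database (the real IR connects to Azure SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.5), (2, 20.0)])

# Steps 1-3: connect, execute the SELECT, read the result set
rows = conn.execute("SELECT id, amount FROM sales").fetchall()

# Step 4: serialize the data into a flat format (CSV here; often Parquet in practice)
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "amount"])
writer.writerows(rows)

# Step 5: write to the sink (a string buffer stands in for ADLS Gen2)
sink_content = buf.getvalue()
print(sink_content.strip())
```

The point of the sketch: the IR is the process that holds both connections open and streams bytes between them. If it cannot reach one side, the activity fails.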
Why You Need to Understand IR
Most beginners never think about IR because ADF automatically assigns a default one. But understanding IR matters when:
- Your source data is on-premises or in a private network (the default IR cannot reach it)
- Your pipeline is slow and you need to optimize data transfer speed
- You have compliance requirements that restrict where data can be processed
- You need to run SSIS packages in the cloud
- You want to control costs by right-sizing your compute
- You are preparing for an interview (IR questions are common)
The Three Types of Integration Runtime
| Type | What It Does | When to Use It |
|---|---|---|
| Azure IR | Managed cloud compute for cloud-to-cloud operations | Default. Moving data between Azure services or public cloud endpoints |
| Self-Hosted IR | Software you install on your own machine/VM | Accessing on-premises databases, private networks, or file shares behind a firewall |
| Azure-SSIS IR | Managed cluster that runs SSIS packages | Lift-and-shift of existing SQL Server Integration Services workloads |
Let us explore each one in depth.
Azure Integration Runtime (Auto-Resolve and Custom)
What It Is
Azure IR is a fully managed, serverless compute provided by Microsoft. You do not install anything, manage any VMs, or worry about scaling. ADF handles everything.
When you create a new ADF workspace, it comes with a default Azure IR called AutoResolveIntegrationRuntime. This is what your pipelines use unless you explicitly specify a different one.
Two Flavors of Azure IR
1. Auto-Resolve IR (Default)
The default IR. It automatically selects the best Azure region based on your source and sink locations:
- If both source and sink are in the same region, it uses that region (fastest, no cross-region charges)
- If they are in different regions, it picks the region closest to the sink (minimizes write latency)
- If the region cannot be determined, it uses the ADF workspace region
Source: Azure SQL (East US)
Sink: ADLS Gen2 (East US)
Auto-Resolve IR: Uses East US (same region -- optimal)
Source: Azure SQL (East US)
Sink: ADLS Gen2 (West Europe)
Auto-Resolve IR: Uses West Europe (closer to sink)
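The resolution rules above can be sketched as a small function. This is a simplification of the documented behavior, and the `resolve_region` helper is purely illustrative:

```python
def resolve_region(source_region, sink_region, factory_region):
    """Approximate the Auto-Resolve IR's region choice for a Copy activity."""
    if source_region and source_region == sink_region:
        return source_region      # same region: fastest, no cross-region charges
    if sink_region:
        return sink_region        # different regions: pick the region closest to the sink
    return factory_region         # region cannot be determined: fall back to the factory's region

print(resolve_region("East US", "East US", "East US 2"))      # -> East US
print(resolve_region("East US", "West Europe", "East US 2"))  # -> West Europe
print(resolve_region(None, None, "East US 2"))                # -> East US 2
```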
Advantages:
- Zero configuration
- Automatic region optimization
- No maintenance
- Scales automatically

Limitations:
- You cannot control which region it uses
- Cannot access on-premises or private network resources
- For Data Flows, it uses a default cluster configuration
2. Custom Azure IR
A manually created Azure IR where you specify the region and configure Data Flow compute settings.
When to create a custom Azure IR:
- You want to force a specific region for compliance or data residency
- You need to optimize Data Flow cluster settings (core count, time-to-live)
- You want a managed Virtual Network IR for secure data access
- You need different IR configurations for different pipelines
Activities Supported by Azure IR
| Activity | Supported | Notes |
|---|---|---|
| Copy Activity | Yes | Cloud-to-cloud data movement |
| Data Flow | Yes | Spark-based transformations |
| Lookup | Yes | Reading from cloud sources |
| Get Metadata | Yes | File/folder information |
| Stored Procedure | Yes | On Azure SQL, Synapse SQL |
| Web Activity | Yes | Calling REST APIs |
| ForEach, If Condition, etc. | Yes | Control flow (runs on ADF service, not IR) |
Data Flow Compute Configuration
When using Azure IR for Data Flows, you can configure:
Compute type: General Purpose / Compute Optimized / Memory Optimized
Core count: 8 / 16 / 32 / 48 / 80 / 144 / 272
Time to live (TTL): 0 to 60 minutes (keeps cluster warm between runs)
Time to Live (TTL) is important for performance: Data Flows require a Spark cluster, which takes 3-5 minutes to start (cold start). Setting TTL to 10 minutes means if another Data Flow runs within 10 minutes, it reuses the warm cluster — no startup delay.
TTL = 0: Every Data Flow waits 3-5 minutes for cluster startup
TTL = 10: Second run within 10 minutes reuses the warm cluster (instant start)
TTL = 60: Cluster stays warm for an hour (costs more when idle)
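The TTL trade-off reduces to a simple rule: a run pays the cold start only if the cluster has been idle longer than the TTL. A minimal sketch (the 4-minute figure is a midpoint of the typical 3-5 minute startup):

```python
COLD_START_MIN = 4  # typical Spark cluster startup is 3-5 minutes

def startup_delay(minutes_since_last_run, ttl_minutes):
    """Return the startup delay (minutes) a Data Flow pays, given the IR's TTL setting.

    minutes_since_last_run is None for the first run (no warm cluster exists yet).
    """
    if minutes_since_last_run is not None and minutes_since_last_run <= ttl_minutes:
        return 0              # cluster still warm: reused instantly
    return COLD_START_MIN     # cluster released: pay the cold start

print(startup_delay(None, 10))  # first run -> 4
print(startup_delay(7, 10))     # within TTL -> 0
print(startup_delay(25, 10))    # TTL expired -> 4
```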
Self-Hosted Integration Runtime
What It Is
Self-Hosted IR is software you install on a Windows machine inside your corporate network or private environment. It acts as a bridge between your private data sources and ADF in the cloud.
The machine running Self-Hosted IR needs:
- Network access to your on-premises data sources (SQL Server, Oracle, file shares)
- Internet access to communicate with the ADF service (outbound HTTPS on port 443)
Why It Exists
Azure IR can only reach public endpoints — Azure services and internet-accessible URLs. It CANNOT reach:
- On-premises SQL Server behind a corporate firewall
- Oracle databases in a private data center
- File shares on your company network
- VMs in an Azure VNet without public endpoints
- Any resource that requires VPN or ExpressRoute access
Self-Hosted IR solves this by running inside your network and tunneling data to ADF through a secure outbound connection.
Architecture
Corporate Network (Private)            Azure Cloud (Public)
+---------------------------------+    +------------------------+
|                                 |    |                        |
|  On-Premises SQL Server         |    |  Azure Data Factory    |
|            |                    |    |          |             |
|            v                    |    |          v             |
|  Self-Hosted IR (Windows) ------+--->|     ADF Service        |
|            ^                    |    |          |             |
|  Oracle Database                |    |          v             |
|            ^                    |    |   ADLS Gen2 (Sink)     |
|  File Share (CSV files)         |    |                        |
+---------------------------------+    +------------------------+
              Outbound HTTPS (443)
           (Firewall allows outbound)
The Self-Hosted IR initiates an outbound connection to ADF. No inbound firewall rules are needed — this is a key security advantage.
How to Install Self-Hosted IR
Step 1: Create the IR in ADF
- Go to Manage tab > Integration runtimes > + New
- Select Self-Hosted > Continue
- Name it: IR_SelfHosted_OnPrem
- Click Create
- Copy one of the authentication keys (Key 1 or Key 2)
Step 2: Install on Your Machine
- Download the Self-Hosted IR installer from the link shown (or from Microsoft Download Center)
- Run the installer on a Windows machine inside your private network
- Requirements:
- Windows 10, Windows Server 2016 or later
- .NET Framework 4.7.2 or later
- Minimum 4 cores, 8 GB RAM (recommended: 8 cores, 16 GB)
- Outbound internet access on port 443
- During setup, paste the authentication key from Step 1
- The IR registers with your ADF workspace
Step 3: Verify Connection
Back in ADF Studio, go to Manage > Integration runtimes. Your Self-Hosted IR should show status “Running” with a green indicator.
High Availability
For production, install Self-Hosted IR on multiple machines (nodes) for high availability:
- Install on Machine 1 using Key 1
- Install on Machine 2 using the same key (Key 1 or Key 2)
- Both nodes register with the same logical IR
- If Machine 1 goes down, Machine 2 handles all traffic
You can have up to 4 nodes per Self-Hosted IR.
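The failover behavior can be illustrated with a hypothetical dispatcher. ADF's real scheduler also load-balances across healthy nodes; the sketch below shows only the failover idea, and the node names are made up:

```python
def dispatch(activity, nodes):
    """Send an activity to the first healthy node of a logical Self-Hosted IR.

    `nodes` is a list of (name, is_healthy) pairs. If every node is down,
    the logical IR is offline and the activity fails.
    """
    for name, healthy in nodes:
        if healthy:
            return f"{activity} -> {name}"
    raise RuntimeError("Integration Runtime is offline: no healthy nodes")

nodes = [("Machine1", False), ("Machine2", True)]  # Machine1 is down
print(dispatch("CopyFromOnPremSQL", nodes))        # Machine2 takes over
```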
Activities Supported by Self-Hosted IR
| Activity | Supported | Notes |
|---|---|---|
| Copy Activity | Yes | On-prem to cloud and cloud to on-prem |
| Lookup | Yes | Reading from on-prem databases |
| Get Metadata | Yes | File system information |
| Stored Procedure | Yes | On on-prem SQL Server |
| Data Flow | No | Data Flows require Azure IR (Spark cluster) |
Important: Data Flows are NOT supported on Self-Hosted IR. If you need to transform on-premises data, first copy it to ADLS Gen2 using Self-Hosted IR, then run a Data Flow using Azure IR.
Real-World Use Cases
1. Daily ETL from On-Premises SQL Server to Azure Data Lake
Source: SQL Server 2019 (on-premises, corporate network)
IR: Self-Hosted IR on a Windows Server in the same network
Sink: ADLS Gen2 (Azure)
Pipeline: Lookup (read metadata) > ForEach > Copy (SQL Server to ADLS)
2. Migrating Oracle Data Warehouse to Azure Synapse
Source: Oracle 19c (private data center)
IR: Self-Hosted IR installed on a jump box with Oracle client
Sink: Azure Synapse Dedicated SQL Pool
Pipeline: Full load of 200+ tables using parameterized datasets
3. Processing CSV Files from a Network File Share
Source: \\fileserver\reports\daily\ (Windows file share)
IR: Self-Hosted IR on a machine with access to the share
Sink: Azure Blob Storage
Pipeline: Get Metadata (list files) > ForEach > Copy (file share to blob)
4. Hybrid Cloud: Some Data in Azure, Some On-Premises
Pipeline activities:
1. Copy from on-prem SQL (Self-Hosted IR) to ADLS staging
2. Copy from Azure SQL (Azure IR) to ADLS staging
3. Data Flow (Azure IR) to join and transform both datasets
4. Write results to Synapse SQL Pool (Azure IR)
A single pipeline can use multiple IRs for different activities.
Azure-SSIS Integration Runtime
What It Is
Azure-SSIS IR is a managed cluster of VMs that runs SQL Server Integration Services (SSIS) packages in the cloud. It is designed for organizations that have existing SSIS workloads and want to migrate them to Azure without rewriting them.
When to Use It
- You have existing SSIS packages (.dtsx files) that you need to run in the cloud
- You are doing a lift-and-shift migration from on-premises SSIS to Azure
- Your team has SSIS expertise and does not want to rewrite everything in ADF
- You need SSIS-specific features (complex transformations, custom components)
When NOT to Use It
- You are building new pipelines from scratch — use ADF native activities instead
- You do not have existing SSIS packages — there is no reason to learn SSIS in 2026
- You want serverless — Azure-SSIS IR provisions dedicated VMs, which cost money even when idle
How It Works
- Create an Azure-SSIS IR in ADF (provisions a cluster of Windows VMs)
- Deploy your .dtsx packages to the IR (via SSISDB catalog or file system)
- Use the Execute SSIS Package activity in your ADF pipeline to run packages
- The IR executes the package on the provisioned cluster
- Results flow through the ADF pipeline like any other activity
Cluster Configuration
Node size: Standard_D2_v3 (2 cores) to Standard_E64i_v3 (64 cores)
Node count: 1 to 10 nodes
SSISDB: Azure SQL Database or Managed Instance (stores package catalog)
VNet: Optional (for accessing private resources)
Cost Consideration
Unlike Azure IR (serverless, pay-per-use), Azure-SSIS IR provisions dedicated VMs that cost money as long as they are running — even if no packages are executing. Always stop the IR when not in use.
Example: Standard_D2_v3 (2 cores) x 2 nodes
Cost: approximately $0.84/hour = $20/day = $600/month (if running 24/7)
Tip: Start before your scheduled runs, stop after completion
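The arithmetic behind that tip, using the example rate above ($0.84/hour for the 2-node D2_v3 setup), can be checked with a few lines. The 3-hour nightly window is an illustrative schedule:

```python
def monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Cost of keeping an Azure-SSIS IR running for a month."""
    return hourly_rate * hours_per_day * days

always_on = monthly_cost(0.84)                   # running 24/7
scheduled = monthly_cost(0.84, hours_per_day=3)  # started only for a 3-hour nightly window
print(f"${always_on:.2f} vs ${scheduled:.2f} per month")
```

Stopping the IR outside the processing window cuts the bill from roughly $605 to roughly $76 in this example, an ~87% saving.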
Comparison Table: All Three IR Types
| Feature | Azure IR | Self-Hosted IR | Azure-SSIS IR |
|---|---|---|---|
| Managed by | Microsoft (fully managed) | You (install and maintain) | Microsoft (managed cluster) |
| Compute | Serverless (auto-scale) | Your machine (fixed) | Dedicated VMs (configurable) |
| Cost model | Pay per activity run | Free (but you pay for your machine) | Pay per VM hour (even when idle) |
| Region | Auto-resolve or specified | Where your machine is | Specified |
| On-premises access | No | Yes (primary use case) | Yes (with VNet) |
| Azure VNet access | Yes (with managed VNet) | Yes | Yes |
| Public cloud access | Yes | Yes (outbound) | Yes |
| Copy Activity | Yes | Yes | Via SSIS package |
| Data Flow | Yes | No | No |
| SSIS packages | No | No | Yes (primary use case) |
| Lookup | Yes | Yes | No |
| Get Metadata | Yes | Yes | No |
| Setup time | Instant | 15-30 minutes | 20-30 minutes (cluster provisioning) |
| High availability | Built-in | Multi-node (up to 4) | Multi-node cluster |
| Auto-update | Automatic | Manual or auto-update | Automatic |
How to Check Which IR Your Pipeline Uses
Method 1: Check the Linked Service
Every Linked Service specifies which IR to use:
- Go to Manage > Linked services
- Click on your linked service
- Look for “Connect via integration runtime” — it shows the IR name
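In the underlying linked service JSON, that setting appears as a connectVia reference. A sketch of the shape, expressed as a Python dict (the linked service and IR names are illustrative, and the connection string is elided):

```python
import json

# Linked service definition that pins activities to a specific IR via "connectVia".
# "LS_OnPremSQL" and "IR_SelfHosted_OnPrem" are illustrative names.
linked_service = {
    "name": "LS_OnPremSQL",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "..."},  # elided
        "connectVia": {
            "referenceName": "IR_SelfHosted_OnPrem",
            "type": "IntegrationRuntimeReference",
        },
    },
}
print(json.dumps(linked_service, indent=2))
```

If connectVia is absent, the linked service resolves to the default AutoResolveIntegrationRuntime.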
Method 2: Check the Pipeline Run
After running a pipeline:
- Go to Monitor > Pipeline runs
- Click on the run > click on the Copy activity
- In the output, look for effectiveIntegrationRuntime — this tells you which IR was used
{
"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime",
"executionDetails": [{
"source": {"type": "AzureSqlDatabase"},
"sink": {"type": "AzureBlobFS"},
"status": "Succeeded"
}]
}
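If you retrieve run results programmatically (for example, via the monitoring API or a saved run output), the field can be read straight out of the activity output. A minimal sketch using the sample above:

```python
import json

# The activity-run output shown above, as a JSON string.
run_output = """
{
  "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime",
  "executionDetails": [{
    "source": {"type": "AzureSqlDatabase"},
    "sink": {"type": "AzureBlobFS"},
    "status": "Succeeded"
  }]
}
"""

output = json.loads(run_output)
print(output["effectiveIntegrationRuntime"])    # which IR actually ran the activity
print(output["executionDetails"][0]["status"])  # activity outcome
```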
Method 3: List All IRs
Go to Manage > Integration runtimes to see all IRs in your workspace, their types, status, and which linked services reference them.
Creating and Managing Integration Runtimes
Create a Custom Azure IR
- Manage > Integration runtimes > + New
- Select the Azure, Self-Hosted tile > Continue
- Select Azure > Continue
- Configure:
- Name: IR_Azure_EastUS
- Region: East US (or your preferred region)
- Data Flow compute: General Purpose, 8 cores
- Time to live: 10 minutes
- Click Create
Create a Self-Hosted IR
- Manage > Integration runtimes > + New
- Select the Azure, Self-Hosted tile > Continue
- Select Self-Hosted > Continue
- Name: IR_SelfHosted_OnPrem
- Click Create
- Copy the authentication key
- Download and install on your Windows machine
- Register using the key
Create an Azure-SSIS IR
- Manage > Integration runtimes > + New
- Select Azure-SSIS > Continue
- Configure General settings:
- Name: IR_SSIS_Production
- Region: East US
- Node size: Standard_D2_v3
- Node count: 2
- Configure SQL settings:
- SSISDB server: your Azure SQL server
- Admin credentials
- Configure Advanced settings (VNet, proxy, custom setup)
- Click Create (takes 20-30 minutes to provision)
Real-World Architecture Scenarios
Scenario 1: Pure Cloud (Azure IR Only)
Azure SQL Database --[Azure IR]--> ADLS Gen2 --[Azure IR]--> Synapse SQL Pool
All resources are in Azure with public endpoints. Default Auto-Resolve IR handles everything. Simplest setup.
Scenario 2: Hybrid (Azure IR + Self-Hosted IR)
On-Prem SQL Server --[Self-Hosted IR]--> ADLS Gen2 (staging)
Azure SQL Database --[Azure IR]--------> ADLS Gen2 (staging)
ADLS Gen2 (staging) --[Azure IR/Data Flow]--> Synapse SQL Pool
On-premises data uses Self-Hosted IR. Cloud data uses Azure IR. Transformation uses Azure IR with Data Flows.
Scenario 3: SSIS Migration (All Three IRs)
On-Prem Oracle --[Self-Hosted IR]--> ADLS Gen2 (raw)
ADLS Gen2 (raw) --[Azure-SSIS IR]--> Execute SSIS packages for transformation
SSIS output --[Azure IR]--> Synapse SQL Pool
Legacy SSIS packages handle complex transformation logic that has not been migrated to ADF yet.
Scenario 4: Multi-Region Compliance
EU Region:
Azure SQL (West Europe) --[Azure IR West Europe]--> ADLS Gen2 (West Europe)
NA Region:
Azure SQL (East US) --[Azure IR East US]--> ADLS Gen2 (East US)
Custom Azure IRs in specific regions ensure data never leaves the geographic boundary — required for GDPR and data residency compliance.
Performance Optimization with IR
1. Use the Right Region
Place your Azure IR in the same region as your source and sink. Cross-region data transfer is slower and may incur egress charges.
SLOW: Source (East US) --[IR in West Europe]--> Sink (East US)
FAST: Source (East US) --[IR in East US]--> Sink (East US)
2. Increase DIUs for Copy Activity
Data Integration Units (DIUs) control Copy activity parallelism on Azure IR:
DIU = Auto (4): Default, good for small datasets
DIU = 16: Medium datasets (10 GB+)
DIU = 32: Large datasets (100 GB+)
DIU = 256: Maximum, for very large bulk copies
More DIUs = more parallel threads reading and writing data.
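Those tiers can be captured in a rule-of-thumb helper. This is a heuristic based on the sizes listed above, not an official formula, and the `suggest_dius` function is purely illustrative:

```python
def suggest_dius(dataset_gb):
    """Rule-of-thumb DIU setting by dataset size (heuristic, not an official formula)."""
    if dataset_gb < 10:
        return 4      # small: the Auto default is fine
    if dataset_gb < 100:
        return 16     # medium
    if dataset_gb < 1000:
        return 32     # large
    return 256        # very large bulk copies: maximum

print(suggest_dius(2), suggest_dius(50), suggest_dius(500), suggest_dius(5000))
```

Note that more DIUs only help if the source and sink can keep up; a throttled source database caps throughput regardless of DIU count.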
3. Use TTL for Data Flows
Set Time to Live on your Azure IR to avoid cold start delays:
TTL = 0: 3-5 minute startup every time
TTL = 10: Reuses cluster if next run is within 10 minutes
TTL = 60: Keeps cluster warm for an hour
4. Optimize Self-Hosted IR Machine
For Self-Hosted IR, the machine specs directly impact performance:
Minimum: 4 cores, 8 GB RAM
Recommended: 8 cores, 16 GB RAM
For heavy workloads: 16+ cores, 32+ GB RAM, SSD storage
Also ensure the machine has fast network connectivity to both the source database and the internet.
5. Enable Parallel Copy
For Copy activities using Self-Hosted IR, set parallelCopies in the Copy activity settings to read multiple partitions simultaneously.
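Both tuning knobs live under the Copy activity's typeProperties in its JSON definition. A sketch of the shape as a Python dict (the values are illustrative, not recommendations):

```python
import json

# Copy activity tuning knobs in its JSON definition (values are illustrative).
copy_settings = {
    "typeProperties": {
        "parallelCopies": 8,         # concurrent reader/writer threads
        "dataIntegrationUnits": 16,  # DIUs apply on Azure IR; not used by Self-Hosted IR
    }
}
print(json.dumps(copy_settings))
```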
IR and Network Security
Managed Virtual Network (Azure IR)
For scenarios where you need Azure IR to access resources in an Azure VNet (without public endpoints):
- Create a custom Azure IR with Managed Virtual Network enabled
- Create Managed Private Endpoints for each service you need to access
- Approve the private endpoint connections in the target service
This provides private, secure connectivity from Azure IR to your Azure resources without exposing them to the public internet.
Self-Hosted IR Security
- Only makes outbound connections (port 443 HTTPS) — no inbound firewall rules needed
- Communication with ADF is encrypted
- Credentials are stored in Windows Credential Manager on the IR machine
- Supports proxy servers for internet access
- Authentication keys can be rotated without downtime
Network Requirements for Self-Hosted IR
| Direction | Port | Protocol | Purpose |
|---|---|---|---|
| Outbound | 443 | HTTPS | Communication with ADF service |
| Outbound | 443 | HTTPS | Azure Service Bus (for IR management) |
| Inbound | None | — | No inbound rules required |
| Local | — | — | Access to on-premises data sources (SQL port 1433, Oracle 1521, etc.) |
Common IR Errors and Troubleshooting
“The Integration Runtime is offline”
Cause: Self-Hosted IR machine is turned off, the IR service is stopped, or the machine lost internet connectivity.
Fix: Check the machine status. Start the IR service (Microsoft Integration Runtime service in Windows Services). Verify outbound port 443 access.
“Unable to connect to the data source”
Cause: The IR cannot reach the source database. For Azure IR, the source might require private network access. For Self-Hosted IR, the machine might not have network access to the database.
Fix: For Azure IR — ensure the source allows public access or use Managed VNet. For Self-Hosted IR — test connectivity from the IR machine (telnet to the database port).
“Data Flow timeout: cluster startup exceeded 10 minutes”
Cause: The Spark cluster for Data Flows took too long to provision.
Fix: Enable TTL on your Azure IR so the cluster stays warm between runs. Use a smaller cluster size for faster startup.
“Copy activity slow: low throughput”
Cause: Undersized IR or wrong region.
Fix: Increase DIUs in the Copy activity. Place Azure IR in the same region as source and sink. For Self-Hosted IR, upgrade the machine specs.
“Self-Hosted IR: high CPU/memory usage”
Cause: Too many concurrent activities running on the same IR machine.
Fix: Add more nodes (up to 4). Reduce concurrent Copy activities. Upgrade machine specs. Use multiple Self-Hosted IRs for different workloads.
Cost Implications
| IR Type | How You Pay | Typical Cost |
|---|---|---|
| Azure IR (Copy) | Per DIU-hour | ~$0.25/DIU-hour. 4 DIU for 5 minutes = $0.08 |
| Azure IR (Data Flow) | Per vCore-hour | ~$0.27/vCore-hour. 8 cores for 10 minutes = $0.36 |
| Azure IR (Pipeline) | Per activity run | $1.00 per 1,000 runs |
| Self-Hosted IR | Free (IR itself) | You pay for the machine (VM or physical) |
| Azure-SSIS IR | Per VM-hour | ~$0.84/hour for D2v3. Runs 24/7 = $600/month |
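The per-run figures in the table can be reproduced with a couple of lines, using the approximate rates shown above (check current Azure pricing for your region before budgeting):

```python
def copy_cost(dius, minutes, rate_per_diu_hour=0.25):
    """Cost of one Copy activity run on Azure IR (rate from the table above)."""
    return dius * (minutes / 60) * rate_per_diu_hour

def dataflow_cost(cores, minutes, rate_per_vcore_hour=0.27):
    """Cost of one Data Flow run on Azure IR (rate from the table above)."""
    return cores * (minutes / 60) * rate_per_vcore_hour

print(f"${copy_cost(4, 5):.2f}")       # 4 DIUs for 5 minutes
print(f"${dataflow_cost(8, 10):.2f}")  # 8 cores for 10 minutes
```

Note that with TTL enabled, Data Flow billing includes the warm idle time, not just execution time.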
Cost optimization tips:
- Stop Azure-SSIS IR when not in use (biggest cost saver)
- Use Auto-Resolve Azure IR when possible (no cluster to manage)
- Right-size Data Flow clusters (do not default to 272 cores)
- Set appropriate TTL (too high = waste, too low = cold starts)
- Self-Hosted IR is “free” but the machine costs money — right-size it
Interview Questions
Q: What is Integration Runtime in ADF? A: Integration Runtime is the compute infrastructure that executes pipeline activities. It handles data movement, transformation, and activity dispatch. There are three types: Azure IR (managed cloud), Self-Hosted IR (on-premises bridge), and Azure-SSIS IR (runs SSIS packages).
Q: When do you need a Self-Hosted Integration Runtime? A: When your source or destination is behind a firewall, in a private network, or on-premises. The Self-Hosted IR runs on a machine inside your network and creates an outbound connection to ADF, acting as a bridge between private resources and the cloud.
Q: Can a single pipeline use multiple Integration Runtimes? A: Yes. Different activities in the same pipeline can use different IRs. For example, a Copy activity might use Self-Hosted IR to read from on-premises SQL, while a Data Flow uses Azure IR for transformation.
Q: What is the difference between Auto-Resolve and a custom Azure IR? A: Auto-Resolve automatically selects the optimal region based on source and sink locations. A custom Azure IR lets you specify a fixed region, configure Data Flow cluster settings, and optionally enable Managed Virtual Network for private connectivity.
Q: How do you make Self-Hosted IR highly available? A: Install the IR on multiple machines (up to 4 nodes) using the same authentication key. If one node goes down, others continue processing. All nodes register with the same logical IR in ADF.
Q: What is TTL in Azure IR? A: Time to Live keeps the Spark cluster warm after a Data Flow execution. Setting TTL to 10 minutes means a second Data Flow within 10 minutes reuses the same cluster without the 3-5 minute cold start. Trade-off is cost — the cluster runs during the TTL period even if idle.
Q: Does Self-Hosted IR support Data Flows? A: No. Data Flows require a Spark cluster, which only Azure IR provides. To transform on-premises data, first copy it to ADLS Gen2 using Self-Hosted IR, then run a Data Flow using Azure IR.
Q: How does Azure-SSIS IR differ from the other two? A: Azure-SSIS IR provisions dedicated Windows VMs that run SSIS packages. It is designed for lift-and-shift migration of existing SSIS workloads. Unlike Azure IR (serverless) and Self-Hosted IR (your machine), Azure-SSIS IR has dedicated infrastructure costs that accrue even when idle.
Wrapping Up
Integration Runtime is the engine behind every ADF/Synapse pipeline. Choosing the right IR for each scenario ensures your pipelines are fast, secure, cost-effective, and able to reach all your data sources.
The decision framework:
- Data is in Azure or public cloud? Use Azure IR (default)
- Data is on-premises or in a private network? Use Self-Hosted IR
- Running existing SSIS packages? Use Azure-SSIS IR
- Need to process data in a specific region? Create a custom Azure IR
- Need private Azure-to-Azure connectivity? Use Azure IR with Managed Virtual Network
Master IR, and you unlock the ability to build pipelines that reach any data source, anywhere.
Related posts:
- What is Azure Data Factory?
- ADF vs Synapse Comparison
- Metadata-Driven Pipeline in ADF
- Top 15 ADF Interview Questions
- Common ADF/Synapse Errors
If this guide helped you understand Integration Runtime, share it with your team. Questions? Drop a comment below.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.