Azure Integration Runtime Explained: Types, Use Cases, and Real-World Examples

Every time you run a Copy activity, execute a Data Flow, or call a stored procedure in Azure Data Factory or Synapse, something needs to actually DO the work — read from the source, move the bytes, write to the destination. That something is the Integration Runtime (IR).

Integration Runtime is the compute infrastructure behind your pipelines. Choose the wrong IR and your pipeline fails (it cannot reach the source), runs slowly (wrong region), or costs more than it should (oversized cluster).

Despite its importance, IR is one of the least understood components in ADF. Most beginners use the default Azure IR without knowing what it does or that alternatives exist. This post changes that.

I will cover all three types of Integration Runtime in detail — when to use each, how to set them up, real-world scenarios, performance optimization, and the interview questions you should be ready for.

Table of Contents

  • What is Integration Runtime?
  • The Three Types of Integration Runtime
  • Azure Integration Runtime (Auto-Resolve and Custom)
  • Self-Hosted Integration Runtime
  • Azure-SSIS Integration Runtime
  • Comparison Table: All Three IR Types
  • How to Check Which IR Your Pipeline Uses
  • Creating and Managing Integration Runtimes
  • Real-World Architecture Scenarios
  • Performance Optimization with IR
  • IR and Network Security
  • Common IR Errors and Troubleshooting
  • Cost Implications
  • Interview Questions
  • Wrapping Up

What is Integration Runtime?

Integration Runtime is the compute engine that ADF/Synapse uses to execute pipeline activities. Think of it as the worker that does the actual data movement and transformation.

When you create a Copy activity that reads from Azure SQL and writes to ADLS Gen2, the IR is the compute that:

  1. Connects to Azure SQL Database
  2. Executes the SELECT query
  3. Reads the result set
  4. Serializes the data into Parquet/CSV format
  5. Writes the files to ADLS Gen2

Without an IR, your pipeline is just a blueprint with no engine to run it.

Why You Need to Understand IR

Most beginners never think about IR because ADF automatically assigns a default one. But understanding IR matters when:

  • Your source data is on-premises or in a private network (the default IR cannot reach it)
  • Your pipeline is slow and you need to optimize data transfer speed
  • You have compliance requirements that restrict where data can be processed
  • You need to run SSIS packages in the cloud
  • You want to control costs by right-sizing your compute
  • You are preparing for an interview (IR questions are common)

The Three Types of Integration Runtime

| Type | What It Does | When to Use It |
|---|---|---|
| Azure IR | Managed cloud compute for cloud-to-cloud operations | Default. Moving data between Azure services or public cloud endpoints |
| Self-Hosted IR | Software you install on your own machine/VM | Accessing on-premises databases, private networks, or file shares behind a firewall |
| Azure-SSIS IR | Managed cluster that runs SSIS packages | Lift-and-shift of existing SQL Server Integration Services workloads |

Let us explore each one in depth.

Azure Integration Runtime (Auto-Resolve and Custom)

What It Is

Azure IR is fully managed, serverless compute provided by Microsoft. You do not install anything, manage any VMs, or worry about scaling. ADF handles everything.

Every new data factory (or Synapse workspace) comes with a default Azure IR called AutoResolveIntegrationRuntime. This is what your pipelines use unless you explicitly specify a different one.

Two Flavors of Azure IR

1. Auto-Resolve IR (Default)

The default IR. It automatically selects the best Azure region based on your source and sink locations:

  • If both source and sink are in the same region, it uses that region (fastest, no cross-region charges)
  • If they are in different regions, it picks the region closest to the sink (minimizes write latency)
  • If the region cannot be determined, it uses the ADF workspace region

Source: Azure SQL (East US)
Sink: ADLS Gen2 (East US)
Auto-Resolve IR: Uses East US (same region -- optimal)

Source: Azure SQL (East US)
Sink: ADLS Gen2 (West Europe)
Auto-Resolve IR: Uses West Europe (closer to sink)
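
The three rules above can be sketched as a small function. This is a simplified model of the behavior described in this post, not the service's actual resolution logic; the function name and the "Central US" factory region are made up for illustration, and "closest to the sink" is reduced to "the sink's own region".

```python
from typing import Optional


def resolve_region(source: Optional[str], sink: Optional[str], factory: str) -> str:
    """Simplified model of the Auto-Resolve rules listed above (an assumption, not service code)."""
    if source and sink and source == sink:
        return source   # same region on both sides: stay there
    if sink:
        return sink     # different regions: favor the sink side
    return factory      # region unknown: fall back to the factory's own region


print(resolve_region("East US", "East US", "Central US"))      # East US
print(resolve_region("East US", "West Europe", "Central US"))  # West Europe
print(resolve_region(None, None, "Central US"))                # Central US
```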

Advantages:

  • Zero configuration
  • Automatic region optimization
  • No maintenance
  • Scales automatically

Limitations:

  • You cannot control which region it uses
  • Cannot access on-premises or private network resources
  • For Data Flows, it uses a default cluster configuration

2. Custom Azure IR

A manually created Azure IR where you specify the region and configure Data Flow compute settings.

When to create a custom Azure IR:

  • You want to force a specific region for compliance or data residency
  • You need to optimize Data Flow cluster settings (core count, time-to-live)
  • You want a managed Virtual Network IR for secure data access
  • You need different IR configurations for different pipelines

Activities Supported by Azure IR

| Activity | Supported | Notes |
|---|---|---|
| Copy Activity | Yes | Cloud-to-cloud data movement |
| Data Flow | Yes | Spark-based transformations |
| Lookup | Yes | Reading from cloud sources |
| Get Metadata | Yes | File/folder information |
| Stored Procedure | Yes | On Azure SQL, Synapse SQL |
| Web Activity | Yes | Calling REST APIs |
| ForEach, If Condition, etc. | Yes | Control flow (runs on ADF service, not IR) |

Data Flow Compute Configuration

When using Azure IR for Data Flows, you can configure:

Compute type: General Purpose / Compute Optimized / Memory Optimized
Core count: 8 / 16 / 32 / 48 / 80 / 144 / 272
Time to live (TTL): 0 to 60 minutes (keeps cluster warm between runs)

Time to Live (TTL) is important for performance: Data Flows require a Spark cluster, which takes 3-5 minutes to start (cold start). Setting TTL to 10 minutes means if another Data Flow runs within 10 minutes, it reuses the warm cluster — no startup delay.

TTL = 0:  Every Data Flow waits 3-5 minutes for cluster startup
TTL = 10: Second run within 10 minutes reuses the warm cluster (instant start)
TTL = 60: Cluster stays warm for an hour (costs more when idle)
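
To see how TTL changes the total startup wait over a schedule, here is a rough simulation. It is a sketch under assumptions: a fixed 4-minute cold start (the middle of the 3-5 minute range), a made-up schedule, and a cluster that is treated as warm from the moment it is ready until TTL minutes after the run finishes. Real clusters will not behave this neatly.

```python
def total_startup_wait(run_starts, run_minutes, ttl_minutes, cold_start_minutes=4):
    """Sum the minutes spent waiting for cluster startup across a set of runs.

    run_starts: trigger times in minutes since midnight, sorted ascending.
    run_minutes: how long each Data Flow runs once the cluster is ready.
    """
    wait = 0
    warm_until = None  # time until which the cluster is assumed to stay alive
    for start in run_starts:
        if warm_until is not None and start <= warm_until:
            ready = start  # warm cluster: no startup delay
        else:
            ready = start + cold_start_minutes  # cold start
            wait += cold_start_minutes
        warm_until = ready + run_minutes + ttl_minutes
    return wait


runs = [120, 130, 190]  # hypothetical triggers at 02:00, 02:10, 03:10
for ttl in (0, 10, 60):
    print(f"TTL = {ttl}: {total_startup_wait(runs, 5, ttl)} minutes of startup wait")
```

With this schedule the wait drops from 12 minutes (TTL 0) to 8 (TTL 10) to 4 (TTL 60), which is the trade-off against the idle cluster cost.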

Self-Hosted Integration Runtime

What It Is

Self-Hosted IR is software you install on a Windows machine inside your corporate network or private environment. It acts as a bridge between your private data sources and ADF in the cloud.

The machine running Self-Hosted IR needs:

  • Network access to your on-premises data sources (SQL Server, Oracle, file shares)
  • Internet access to communicate with the ADF service (outbound HTTPS on port 443)

Why It Exists

Without a managed virtual network, Azure IR can only reach public endpoints, meaning Azure services and internet-accessible URLs. It CANNOT reach:

  • On-premises SQL Server behind a corporate firewall
  • Oracle databases in a private data center
  • File shares on your company network
  • VMs in an Azure VNet without public endpoints
  • Any resource that requires VPN or ExpressRoute access

Self-Hosted IR solves this by running inside your network and tunneling data to ADF through a secure outbound connection.

Architecture

Corporate Network (Private)             Azure Cloud (Public)
+----------------------------------+    +------------------------+
|                                  |    |                        |
|  On-Premises SQL Server          |    |  Azure Data Factory    |
|       |                          |    |       |                |
|       v                          |    |       v                |
|  Self-Hosted IR (Windows machine)|------>  ADF Service         |
|       |                          |    |       |                |
|  Oracle Database                 |    |       v                |
|                                  |    |  ADLS Gen2 (Sink)      |
|  File Share (CSV files)          |    |                        |
+----------------------------------+    +------------------------+
         Outbound HTTPS (443)
         (Firewall allows outbound)

The Self-Hosted IR initiates an outbound connection to ADF. No inbound firewall rules are needed — this is a key security advantage.

How to Install Self-Hosted IR

Step 1: Create the IR in ADF

  1. Go to Manage tab > Integration runtimes > + New
  2. Select Self-Hosted > Continue
  3. Name it: IR_SelfHosted_OnPrem
  4. Click Create
  5. Copy one of the authentication keys (Key 1 or Key 2)

Step 2: Install on Your Machine

  1. Download the Self-Hosted IR installer from the link shown (or from Microsoft Download Center)
  2. Run the installer on a Windows machine inside your private network
  3. Requirements:
       • Windows 10, Windows Server 2016 or later
       • .NET Framework 4.7.2 or later
       • Minimum 4 cores, 8 GB RAM (recommended: 8 cores, 16 GB)
       • Outbound internet access on port 443
  4. During setup, paste the authentication key from Step 1
  5. The IR registers with your ADF workspace

Step 3: Verify Connection

Back in ADF Studio, go to Manage > Integration runtimes. Your Self-Hosted IR should show status “Running” with a green indicator.

High Availability

For production, install Self-Hosted IR on multiple machines (nodes) for high availability:

  1. Install on Machine 1 using either authentication key (Key 1 or Key 2)
  2. Install on Machine 2 using either key from the same IR
  3. Both nodes register with the same logical IR
  4. If Machine 1 goes down, Machine 2 handles all traffic

You can have up to 4 nodes per Self-Hosted IR.

Activities Supported by Self-Hosted IR

| Activity | Supported | Notes |
|---|---|---|
| Copy Activity | Yes | On-prem to cloud and cloud to on-prem |
| Lookup | Yes | Reading from on-prem databases |
| Get Metadata | Yes | File system information |
| Stored Procedure | Yes | On on-prem SQL Server |
| Data Flow | No | Data Flows require Azure IR (Spark cluster) |

Important: Data Flows are NOT supported on Self-Hosted IR. If you need to transform on-premises data, first copy it to ADLS Gen2 using Self-Hosted IR, then run a Data Flow using Azure IR.

Real-World Use Cases

1. Daily ETL from On-Premises SQL Server to Azure Data Lake

Source: SQL Server 2019 (on-premises, corporate network)
IR: Self-Hosted IR on a Windows Server in the same network
Sink: ADLS Gen2 (Azure)

Pipeline: Lookup (read metadata) > ForEach > Copy (SQL Server to ADLS)

2. Migrating Oracle Data Warehouse to Azure Synapse

Source: Oracle 19c (private data center)
IR: Self-Hosted IR installed on a jump box with Oracle client
Sink: Azure Synapse Dedicated SQL Pool

Pipeline: Full load of 200+ tables using parameterized datasets

3. Processing CSV Files from a Network File Share

Source: \\fileserver\reports\daily\ (Windows file share)
IR: Self-Hosted IR on a machine with access to the share
Sink: Azure Blob Storage

Pipeline: Get Metadata (list files) > ForEach > Copy (file share to blob)

4. Hybrid Cloud: Some Data in Azure, Some On-Premises

Pipeline activities:
  1. Copy from on-prem SQL (Self-Hosted IR) to ADLS staging
  2. Copy from Azure SQL (Azure IR) to ADLS staging
  3. Data Flow (Azure IR) to join and transform both datasets
  4. Write results to Synapse SQL Pool (Azure IR)

A single pipeline can use multiple IRs for different activities.

Azure-SSIS Integration Runtime

What It Is

Azure-SSIS IR is a managed cluster of VMs that runs SQL Server Integration Services (SSIS) packages in the cloud. It is designed for organizations that have existing SSIS workloads and want to migrate them to Azure without rewriting them.

When to Use It

  • You have existing SSIS packages (.dtsx files) that you need to run in the cloud
  • You are doing a lift-and-shift migration from on-premises SSIS to Azure
  • Your team has SSIS expertise and does not want to rewrite everything in ADF
  • You need SSIS-specific features (complex transformations, custom components)

When NOT to Use It

  • You are building new pipelines from scratch — use ADF native activities instead
  • You do not have existing SSIS packages — there is no reason to learn SSIS in 2026
  • You want serverless — Azure-SSIS IR provisions dedicated VMs, which cost money even when idle

How It Works

  1. Create an Azure-SSIS IR in ADF (provisions a cluster of Windows VMs)
  2. Deploy your .dtsx packages to the IR (via SSISDB catalog or file system)
  3. Use the Execute SSIS Package activity in your ADF pipeline to run packages
  4. The IR executes the package on the provisioned cluster
  5. Results flow through the ADF pipeline like any other activity

Cluster Configuration

Node size: Standard_D2_v3 (2 cores) to Standard_E64i_v3 (64 cores)
Node count: 1 to 10 nodes
SSISDB: Azure SQL Database or Managed Instance (stores package catalog)
VNet: Optional (for accessing private resources)

Cost Consideration

Unlike Azure IR (serverless, pay-per-use), Azure-SSIS IR provisions dedicated VMs that cost money as long as they are running — even if no packages are executing. Always stop the IR when not in use.

Example: Standard_D2_v3 (2 cores) x 2 nodes
Cost: approximately $0.84/hour = $20/day = $600/month (if running 24/7)
Tip: Start before your scheduled runs, stop after completion
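
The start/stop tip is easy to quantify. The sketch below reuses the approximate $0.84/hour figure from the example above; the 3-hour nightly window is an assumption, and actual rates depend on region and licensing, so treat the numbers as illustrative.

```python
HOURLY_RATE = 0.84  # approximate figure from the example above, in USD


def monthly_cost(hours_per_day, days=30, hourly_rate=HOURLY_RATE):
    """Cost of keeping the Azure-SSIS IR running for a given number of hours each day."""
    return round(hours_per_day * days * hourly_rate, 2)


always_on = monthly_cost(24)  # never stopped
nightly = monthly_cost(3)     # started before a 3-hour batch window, stopped afterwards
print(f"24/7: ${always_on:.2f}, nightly window: ${nightly:.2f}, saved: ${always_on - nightly:.2f}")
```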

Comparison Table: All Three IR Types

| Feature | Azure IR | Self-Hosted IR | Azure-SSIS IR |
|---|---|---|---|
| Managed by | Microsoft (fully managed) | You (install and maintain) | Microsoft (managed cluster) |
| Compute | Serverless (auto-scale) | Your machine (fixed) | Dedicated VMs (configurable) |
| Cost model | Pay per activity run | No license fee (you pay for your machine, plus ADF activity charges) | Pay per VM hour (even when idle) |
| Region | Auto-resolve or specified | Where your machine is | Specified |
| On-premises access | No | Yes (primary use case) | Yes (with VNet) |
| Azure VNet access | Yes (with managed VNet) | Yes | Yes |
| Public cloud access | Yes | Yes (outbound) | Yes |
| Copy Activity | Yes | Yes | Via SSIS package |
| Data Flow | Yes | No | No |
| SSIS packages | No | No | Yes (primary use case) |
| Lookup | Yes | Yes | No |
| Get Metadata | Yes | Yes | No |
| Setup time | Instant | 15-30 minutes | 20-30 minutes (cluster provisioning) |
| High availability | Built-in | Multi-node (up to 4) | Multi-node cluster |
| Auto-update | Automatic | Manual or auto-update | Automatic |

How to Check Which IR Your Pipeline Uses

Method 1: Check the Linked Service

Every Linked Service specifies which IR to use:

  1. Go to Manage > Linked services
  2. Click on your linked service
  3. Look for “Connect via integration runtime” — it shows the IR name
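
If you export your linked services as JSON (for example from a Git-connected factory), the same check can be scripted. In the sketch below the linked service names are hypothetical; the connectVia reference is how a linked service JSON points at an IR, and a definition without it falls back to the default Auto-Resolve IR.

```python
import json

DEFAULT_IR = "AutoResolveIntegrationRuntime"


def ir_for_linked_service(definition: dict) -> str:
    """Return the IR a linked service connects through, or the default if none is set."""
    connect_via = definition.get("properties", {}).get("connectVia")
    return connect_via["referenceName"] if connect_via else DEFAULT_IR


# Hypothetical exported definitions, trimmed to the fields that matter here.
linked_services = json.loads("""
[
  {"name": "LS_OnPremSql",
   "properties": {"type": "SqlServer",
                  "connectVia": {"referenceName": "IR_SelfHosted_OnPrem",
                                 "type": "IntegrationRuntimeReference"}}},
  {"name": "LS_DataLake",
   "properties": {"type": "AzureBlobFS"}}
]
""")

for ls in linked_services:
    print(f"{ls['name']}: {ir_for_linked_service(ls)}")
```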

Method 2: Check the Pipeline Run

After running a pipeline:

  1. Go to Monitor > Pipeline runs
  2. Click on the run > click on the Copy activity
  3. In the output, look for effectiveIntegrationRuntime — this tells you which IR was used

{
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime",
    "executionDetails": [{
        "source": {"type": "AzureSqlDatabase"},
        "sink": {"type": "AzureBlobFS"},
        "status": "Succeeded"
    }]
}
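
When you need this for many runs, the activity output can be summarized in code. This minimal sketch parses the sample output shown above; how you fetch the JSON (Monitor UI copy-paste, SDK, REST) is left out, and real outputs carry many more fields than this trimmed sample.

```python
import json

activity_output = json.loads("""
{
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime",
    "executionDetails": [{
        "source": {"type": "AzureSqlDatabase"},
        "sink": {"type": "AzureBlobFS"},
        "status": "Succeeded"
    }]
}
""")


def summarize(output: dict) -> dict:
    """Pull out the IR name and the source/sink/status of each execution detail."""
    return {
        "ir": output.get("effectiveIntegrationRuntime", "unknown"),
        "hops": [
            (d["source"]["type"], d["sink"]["type"], d["status"])
            for d in output.get("executionDetails", [])
        ],
    }


print(summarize(activity_output))
```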

Method 3: List All IRs

Go to Manage > Integration runtimes to see all IRs in your workspace, their types, status, and which linked services reference them.

Creating and Managing Integration Runtimes

Create a Custom Azure IR

  1. Manage > Integration runtimes > + New
  2. Select Azure, Self-Hosted > Continue
  3. Select Azure > Continue
  4. Configure:
       • Name: IR_Azure_EastUS
       • Region: East US (or your preferred region)
       • Data Flow compute: General Purpose, 8 cores
       • Time to live: 10 minutes
  5. Click Create
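
The same settings can be captured as a JSON definition for source control. The sketch below builds one in Python; the property layout reflects my understanding of how a managed (Azure) IR is serialized, so compare it against the JSON your own factory exports before relying on it. The helper name is made up, and the core count check uses the list given earlier in this post.

```python
import json

ALLOWED_CORE_COUNTS = {8, 16, 32, 48, 80, 144, 272}  # values listed earlier in this post


def azure_ir_definition(name, region, core_count=8, ttl_minutes=10, compute_type="General"):
    """Build a dict describing a custom Azure IR (layout is an assumption, verify before use)."""
    if core_count not in ALLOWED_CORE_COUNTS:
        raise ValueError(f"unsupported core count: {core_count}")
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": region,
                    "dataFlowProperties": {
                        "computeType": compute_type,
                        "coreCount": core_count,
                        "timeToLive": ttl_minutes,
                    },
                }
            },
        },
    }


print(json.dumps(azure_ir_definition("IR_Azure_EastUS", "East US"), indent=2))
```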

Create a Self-Hosted IR

  1. Manage > Integration runtimes > + New
  2. Select Azure, Self-Hosted > Continue
  3. Select Self-Hosted > Continue
  4. Name: IR_SelfHosted_OnPrem
  5. Click Create
  6. Copy the authentication key
  7. Download and install on your Windows machine
  8. Register using the key

Create an Azure-SSIS IR

  1. Manage > Integration runtimes > + New
  2. Select Azure-SSIS > Continue
  3. Configure General settings:
       • Name: IR_SSIS_Production
       • Region: East US
       • Node size: Standard_D2_v3
       • Node count: 2
  4. Configure SQL settings:
       • SSISDB server: your Azure SQL server
       • Admin credentials
  5. Configure Advanced settings (VNet, proxy, custom setup)
  6. Click Create (takes 20-30 minutes to provision)

Real-World Architecture Scenarios

Scenario 1: Pure Cloud (Azure IR Only)

Azure SQL Database --[Azure IR]--> ADLS Gen2 --[Azure IR]--> Synapse SQL Pool

All resources are in Azure with public endpoints. Default Auto-Resolve IR handles everything. Simplest setup.

Scenario 2: Hybrid (Azure IR + Self-Hosted IR)

On-Prem SQL Server --[Self-Hosted IR]--> ADLS Gen2 (staging)
Azure SQL Database --[Azure IR]--------> ADLS Gen2 (staging)
ADLS Gen2 (staging) --[Azure IR/Data Flow]--> Synapse SQL Pool

On-premises data uses Self-Hosted IR. Cloud data uses Azure IR. Transformation uses Azure IR with Data Flows.

Scenario 3: SSIS Migration (All Three IRs)

On-Prem Oracle --[Self-Hosted IR]--> ADLS Gen2 (raw)
ADLS Gen2 (raw) --[Azure-SSIS IR]--> Execute SSIS packages for transformation
SSIS output --[Azure IR]--> Synapse SQL Pool

Legacy SSIS packages handle complex transformation logic that has not been migrated to ADF yet.

Scenario 4: Multi-Region Compliance

EU Region:
  Azure SQL (West Europe) --[Azure IR West Europe]--> ADLS Gen2 (West Europe)

NA Region:
  Azure SQL (East US) --[Azure IR East US]--> ADLS Gen2 (East US)

Custom Azure IRs in specific regions ensure data never leaves the geographic boundary — required for GDPR and data residency compliance.
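
A residency rule like this can be checked mechanically before deployment. The sketch below assumes you keep a list of (source, IR, sink) regions per copy step; the region-to-geography map and the function name are illustrative, not something ADF provides.

```python
# Illustrative region-to-geography map; extend it for the regions you actually use.
GEOGRAPHY = {
    "West Europe": "EU",
    "North Europe": "EU",
    "East US": "NA",
    "West US": "NA",
}


def residency_violations(hops, required_geography):
    """Return the hops where the source, IR, or sink sits outside the required geography.

    Each hop is a (source_region, ir_region, sink_region) tuple.
    """
    return [
        hop for hop in hops
        if any(GEOGRAPHY.get(region) != required_geography for region in hop)
    ]


eu_hops = [
    ("West Europe", "West Europe", "West Europe"),  # compliant
    ("West Europe", "East US", "West Europe"),      # IR would process the data in the US
]
print(residency_violations(eu_hops, "EU"))
```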

Performance Optimization with IR

1. Use the Right Region

Place your Azure IR in the same region as your source and sink. Cross-region data transfer is slower and may incur egress charges.

SLOW: Source (East US) --[IR in West Europe]--> Sink (East US)
FAST: Source (East US) --[IR in East US]--> Sink (East US)

2. Increase DIUs for Copy Activity

Data Integration Units (DIUs) control Copy activity parallelism on Azure IR:

DIU = Auto (4): Default, good for small datasets
DIU = 16: Medium datasets (10 GB+)
DIU = 32: Large datasets (100 GB+)
DIU = 256: Maximum, for very large bulk copies

More DIUs = more parallel threads reading and writing data.

3. Use TTL for Data Flows

Set Time to Live on your Azure IR to avoid cold start delays:

TTL = 0: 3-5 minute startup every time
TTL = 10: Reuses cluster if next run is within 10 minutes
TTL = 60: Keeps cluster warm for an hour

4. Optimize Self-Hosted IR Machine

For Self-Hosted IR, the machine specs directly impact performance:

Minimum: 4 cores, 8 GB RAM
Recommended: 8 cores, 16 GB RAM
For heavy workloads: 16+ cores, 32+ GB RAM, SSD storage

Also ensure the machine has fast network connectivity to both the source database and the internet.

5. Enable Parallel Copy

For Copy activities using Self-Hosted IR, set parallelCopies in the Copy activity settings to read multiple partitions simultaneously.
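
Both the DIU tiers from tip 2 and parallelCopies end up as properties on the Copy activity. As a rough sketch, the helper below turns the rule-of-thumb tiers from this post into the dataIntegrationUnits and parallelCopies settings; the function names and thresholds are illustrative, and DIUs are only set for Azure IR because they do not apply to Self-Hosted IR.

```python
def suggest_diu(size_gb):
    """Map dataset size to the rule-of-thumb DIU tiers above; None means leave it on Auto."""
    if size_gb >= 100:
        return 32
    if size_gb >= 10:
        return 16
    return None


def copy_tuning(ir_kind, size_gb, parallel_copies=None):
    """Build the tuning fragment of a Copy activity's typeProperties."""
    props = {}
    if ir_kind == "azure":
        diu = suggest_diu(size_gb)
        if diu is not None:
            props["dataIntegrationUnits"] = diu
    if parallel_copies is not None:
        props["parallelCopies"] = parallel_copies
    return props


print(copy_tuning("azure", 250))                           # {'dataIntegrationUnits': 32}
print(copy_tuning("self-hosted", 250, parallel_copies=8))  # {'parallelCopies': 8}
```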

IR and Network Security

Managed Virtual Network (Azure IR)

For scenarios where you need Azure IR to access resources in an Azure VNet (without public endpoints):

  1. Create a custom Azure IR with Managed Virtual Network enabled
  2. Create Managed Private Endpoints for each service you need to access
  3. Approve the private endpoint connections in the target service

This provides private, secure connectivity from Azure IR to your Azure resources without exposing them to the public internet.

Self-Hosted IR Security

  • Only makes outbound connections (port 443 HTTPS) — no inbound firewall rules needed
  • Communication with ADF is encrypted
  • Credentials stored locally are encrypted with Windows DPAPI on the IR machine
  • Supports proxy servers for internet access
  • Authentication keys can be rotated without downtime

Network Requirements for Self-Hosted IR

| Direction | Port | Protocol | Purpose |
|---|---|---|---|
| Outbound | 443 | HTTPS | Communication with ADF service |
| Outbound | 443 | HTTPS | Azure Service Bus (for IR management) |
| Inbound | None | None | No inbound rules required |
| Local | Varies (SQL port 1433, Oracle 1521, etc.) | Varies | Access to on-premises data sources |

Common IR Errors and Troubleshooting

“The Integration Runtime is offline”

Cause: Self-Hosted IR machine is turned off, the IR service is stopped, or the machine lost internet connectivity.

Fix: Check the machine status. Start the IR service (Microsoft Integration Runtime service in Windows Services). Verify outbound port 443 access.

“Unable to connect to the data source”

Cause: The IR cannot reach the source database. For Azure IR, the source might require private network access. For Self-Hosted IR, the machine might not have network access to the database.

Fix: For Azure IR — ensure the source allows public access or use Managed VNet. For Self-Hosted IR — test connectivity from the IR machine (telnet to the database port).
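
If telnet is not installed on the IR machine, a few lines of Python do the same TCP check. This sketch only proves that the port is reachable, not that credentials or drivers work. The demo probes a throwaway local listener so it runs anywhere; on a real IR machine you would call can_connect with your database host and port (for example 1433 for SQL Server).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Offline demo: open a throwaway local listener, probe it, then close it and probe again.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
print(can_connect("127.0.0.1", port))  # reachable while the listener is open
listener.close()
print(can_connect("127.0.0.1", port))  # unreachable once nothing is listening
```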

“Data Flow timeout: cluster startup exceeded 10 minutes”

Cause: The Spark cluster for Data Flows took too long to provision.

Fix: Enable TTL on your Azure IR so the cluster stays warm between runs. Use a smaller cluster size for faster startup.

“Copy activity slow: low throughput”

Cause: Undersized IR or wrong region.

Fix: Increase DIUs in the Copy activity. Place Azure IR in the same region as source and sink. For Self-Hosted IR, upgrade the machine specs.

“Self-Hosted IR: high CPU/memory usage”

Cause: Too many concurrent activities running on the same IR machine.

Fix: Add more nodes (up to 4). Reduce concurrent Copy activities. Upgrade machine specs. Use multiple Self-Hosted IRs for different workloads.

Cost Implications

| IR Type | How You Pay | Typical Cost |
|---|---|---|
| Azure IR (Copy) | Per DIU-hour | ~$0.25/DIU-hour. 4 DIU for 5 minutes = $0.08 |
| Azure IR (Data Flow) | Per vCore-hour | ~$0.27/vCore-hour. 8 cores for 10 minutes = $0.36 |
| Azure IR (Pipeline) | Per activity run | $1.00 per 1,000 runs |
| Self-Hosted IR | No license fee for the IR software | You pay for the machine (VM or physical); ADF also bills activity runs and data movement time on Self-Hosted IR at its own rates |
| Azure-SSIS IR | Per VM-hour | ~$0.84/hour for D2v3. Runs 24/7 = $600/month |
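
The arithmetic behind the Azure IR rows is simple enough to script for your own workloads. The rates below are the approximate figures quoted in the table, used as assumptions; check the current pricing page for your region before budgeting.

```python
def copy_cost(dius, minutes, rate_per_diu_hour=0.25):
    """Copy activity cost: DIUs x hours x rate."""
    return round(dius * minutes / 60 * rate_per_diu_hour, 2)


def data_flow_cost(vcores, minutes, rate_per_vcore_hour=0.27):
    """Data Flow cost: vCores x hours x rate."""
    return round(vcores * minutes / 60 * rate_per_vcore_hour, 2)


def orchestration_cost(activity_runs, rate_per_thousand=1.00):
    """Pipeline orchestration cost: billed per 1,000 activity runs."""
    return round(activity_runs / 1000 * rate_per_thousand, 2)


print(copy_cost(4, 5))           # 0.08
print(data_flow_cost(8, 10))     # 0.36
print(orchestration_cost(5000))  # 5.0
```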

Cost optimization tips:

  1. Stop Azure-SSIS IR when not in use (biggest cost saver)
  2. Use Auto-Resolve Azure IR when possible (no cluster to manage)
  3. Right-size Data Flow clusters (do not default to 272 cores)
  4. Set appropriate TTL (too high = waste, too low = cold starts)
  5. Self-Hosted IR is “free” but the machine costs money — right-size it

Interview Questions

Q: What is Integration Runtime in ADF?
A: Integration Runtime is the compute infrastructure that executes pipeline activities. It handles data movement, transformation, and activity dispatch. There are three types: Azure IR (managed cloud), Self-Hosted IR (on-premises bridge), and Azure-SSIS IR (runs SSIS packages).

Q: When do you need a Self-Hosted Integration Runtime?
A: When your source or destination is behind a firewall, in a private network, or on-premises. The Self-Hosted IR runs on a machine inside your network and creates an outbound connection to ADF, acting as a bridge between private resources and the cloud.

Q: Can a single pipeline use multiple Integration Runtimes?
A: Yes. Different activities in the same pipeline can use different IRs. For example, a Copy activity might use Self-Hosted IR to read from on-premises SQL, while a Data Flow uses Azure IR for transformation.

Q: What is the difference between Auto-Resolve and a custom Azure IR?
A: Auto-Resolve automatically selects the optimal region based on source and sink locations. A custom Azure IR lets you specify a fixed region, configure Data Flow cluster settings, and optionally enable Managed Virtual Network for private connectivity.

Q: How do you make Self-Hosted IR highly available?
A: Install the IR on multiple machines (up to 4 nodes) using the same authentication key. If one node goes down, others continue processing. All nodes register with the same logical IR in ADF.

Q: What is TTL in Azure IR?
A: Time to Live keeps the Spark cluster warm after a Data Flow execution. Setting TTL to 10 minutes means a second Data Flow within 10 minutes reuses the same cluster without the 3-5 minute cold start. The trade-off is cost: the cluster runs during the TTL period even if idle.

Q: Does Self-Hosted IR support Data Flows?
A: No. Data Flows require a Spark cluster, which only Azure IR provides. To transform on-premises data, first copy it to ADLS Gen2 using Self-Hosted IR, then run a Data Flow using Azure IR.

Q: How does Azure-SSIS IR differ from the other two?
A: Azure-SSIS IR provisions dedicated Windows VMs that run SSIS packages. It is designed for lift-and-shift migration of existing SSIS workloads. Unlike Azure IR (serverless) and Self-Hosted IR (your machine), Azure-SSIS IR has dedicated infrastructure costs that accrue even when idle.

Wrapping Up

Integration Runtime is the engine behind every ADF/Synapse pipeline. Choosing the right IR for each scenario ensures your pipelines are fast, secure, cost-effective, and able to reach all your data sources.

The decision framework:

  • Data is in Azure or public cloud? Use Azure IR (default)
  • Data is on-premises or in a private network? Use Self-Hosted IR
  • Running existing SSIS packages? Use Azure-SSIS IR
  • Need to process data in a specific region? Create a custom Azure IR
  • Need private Azure-to-Azure connectivity? Use Azure IR with Managed Virtual Network

Master IR, and you unlock the ability to build pipelines that reach any data source, anywhere.

If this guide helped you understand Integration Runtime, share it with your team. Questions? Drop a comment below.


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
