Fabric Data Factory: Activities, Pipelines, Dataflow Gen2, Notebooks, and Building Production ETL in Microsoft Fabric

You know Azure Data Factory. You have built metadata-driven pipelines, incremental loads, SCD pipelines, and CI/CD deployments. Now you open Fabric Data Factory and think: “This looks familiar… but different.”

It IS familiar — about 90% of what you know transfers directly. But Fabric Data Factory removes some complexity (no more datasets, no more linked services), adds new capabilities (Teams notifications, semantic model refresh, Dataflow Gen2 as a pipeline activity), and integrates with OneLake so tightly that connecting to storage is no longer a configuration exercise.

This post covers everything: every activity available, how to build and schedule pipelines, what is new compared to ADF/Synapse, and three complete pipeline examples including combining Dataflow Gen2 and notebooks inside pipelines. Think of it as the bridge between your ADF knowledge and Fabric.

Think of Fabric Data Factory like moving from a manual-transmission car (ADF) to an automatic (Fabric). The driving fundamentals are identical — steering, braking, accelerating. But the gear shifting (dataset configuration, linked service management, integration runtime setup) is now automatic. You focus on WHERE to drive (pipeline logic), not HOW the transmission works (infrastructure plumbing).

What Is Fabric Data Factory?
What Changed from ADF to Fabric Data Factory
No More Datasets (The Biggest Change)
No More Linked Services (Connections Instead)
Creating Your First Pipeline
All Pipeline Activities in Fabric Data Factory
Data Movement Activities
Transformation Activities
Control Flow Activities
Notification Activities (NEW in Fabric)
Fabric-Specific Activities (NEW)
Pipeline Parameters and Variables
Expressions and Dynamic Content
Scheduling Pipelines
Pipeline Example 1: Copy from SQL to Lakehouse
Pipeline Example 2: Metadata-Driven Multi-Table Load
Pipeline Example 3: Full ETL with Dataflow Gen2 + Notebook
Dataflow Gen2: What It Is and When to Use It
Dataflow Gen2 vs ADF Mapping Data Flows
Using Dataflow Gen2 Inside a Pipeline
Using Notebooks Inside a Pipeline
Combining Dataflow Gen2 + Notebook in One Pipeline
Monitoring Pipelines
Error Handling Patterns
Fabric Data Factory vs ADF Feature Mapping
When to Use Pipeline vs Dataflow vs Notebook
Common Mistakes
Interview Questions
Wrapping Up

What Is Fabric Data Factory?

Fabric Data Factory is the pipeline orchestration and data integration service inside Microsoft Fabric. It handles data movement (copying data from sources to destinations) and data orchestration (running activities in sequence, parallel, or conditionally).

Fabric Data Factory
  │
  ├── Pipelines — Orchestrate activities (Copy, ForEach, If, Notebook, Dataflow)
  │
  ├── Dataflow Gen2 — Visual no-code transformations (Power Query based)
  │
  └── Both write to OneLake natively (no linked service needed)

What Changed from ADF to Fabric Data Factory

Feature	ADF / Synapse	Fabric Data Factory
Datasets	Required (define table schema + connection)	Removed — defined inline in Copy activity
Linked Services	Required (connection strings, auth)	Replaced by Connections (simpler, reusable)
Mapping Data Flows	Visual Spark-based transformations	Replaced by Dataflow Gen2 (Power Query based)
Integration Runtime	Azure IR, SHIR, SSIS IR	Simplified — managed by Fabric capacity
Storage connection	Manual (access key, SAS, MI on ADLS)	Automatic for OneLake (zero config)
Notifications	External (Logic Apps, Azure Functions)	Built-in (Teams, Outlook activities)
Power BI refresh	External (REST API call)	Built-in (Semantic Model Refresh activity)
Monitoring	ADF Monitor hub (per factory)	Fabric Monitoring Hub (cross-workspace)
Billing	Per activity run + DIU hours	Included in Fabric capacity (CU based)
Git/CI/CD	ADF Git integration → ARM templates	Fabric deployment pipelines + Git
SSIS support	Azure-SSIS IR	Not available yet

No More Datasets (The Biggest Change)

In ADF, you needed to create a Dataset for every source and sink — defining the table, schema, format, and linked service. For 20 tables, that meant 40 datasets (20 source + 20 sink).

ADF (old way):
  Step 1: Create Linked Service → Azure SQL Database (connection string)
  Step 2: Create Dataset → DS_SQL_Customer (table=Customer, linked service=above)
  Step 3: Create Linked Service → ADLS Gen2 (access key)
  Step 4: Create Dataset → DS_ADLS_Customer (path=/bronze/customer/, format=parquet)
  Step 5: Create Copy Activity → source=DS_SQL_Customer, sink=DS_ADLS_Customer

  For 20 tables: 2 linked services + 40 datasets + 20 copy activities = 62 objects

Fabric (new way):
  Step 1: Create Connection → Azure SQL (once, reusable)
  Step 2: Create Copy Activity → source=SQL table (inline), sink=Lakehouse table (inline)

  For 20 tables: 1 connection + 20 copy activities = 21 objects
  No datasets at all. Table, schema, format defined INSIDE the Copy activity.
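The object counts above follow from simple arithmetic. As a rough sketch (the function names here are illustrative, not part of any ADF or Fabric API), the comparison can be expressed as:

```python
def adf_object_count(tables: int, linked_services: int = 2) -> int:
    """Linked services + a source and a sink dataset per table + one Copy activity per table."""
    datasets = 2 * tables
    copy_activities = tables
    return linked_services + datasets + copy_activities


def fabric_object_count(tables: int, connections: int = 1) -> int:
    """Connections + one Copy activity per table (no datasets to create)."""
    return connections + tables


print(adf_object_count(20))     # 62
print(fabric_object_count(20))  # 21
```

The gap widens linearly: every extra table costs three objects in the ADF layout and one in the Fabric layout.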

Real-life analogy: In ADF, ordering food required filling out a form for each dish (dataset): “Form #1: Dish=Pizza, Size=Large, Kitchen=Italian, Delivery=Table 5.” In Fabric, you just tell the waiter directly: “Large pizza to table 5.” Same outcome, less paperwork.

No More Linked Services (Connections Instead)

Linked Services are replaced by Connections, which are simpler to set up and reusable across items:

ADF Linked Service (old):
  Name: LS_AzureSqlDatabase_Dev
  Type: Azure SQL Database
  Connection string: Server=tcp:server.database.windows.net,1433;Database=AdventureWorksLT;...
  Authentication: SQL Auth / Managed Identity
  Integration Runtime: AutoResolveIR

Fabric Connection (new):
  Name: SQL_AdventureWorks
  Type: Azure SQL Database
  Server: server.database.windows.net
  Database: AdventureWorksLT
  Auth: Organizational account / Service Principal
  (No integration runtime selection — managed by Fabric)

Connections are managed under Settings > Manage connections and gateways, or created inline when you configure a Copy activity.

Creating Your First Pipeline

Step by Step

Open your Fabric workspace
Click + New item → Data pipeline
Name it: PL_Copy_Customers
The pipeline canvas opens (looks very similar to ADF)

The Canvas

┌─────────────────────────────────────────────────────────┐
│  Pipeline: PL_Copy_Customers                             │
│                                                          │
│  ┌──────────┐    ┌──────────┐    ┌──────────────────┐   │
│  │ Copy     │───>│ Notebook │───>│ Semantic Model   │   │
│  │ Activity │    │ Activity │    │ Refresh          │   │
│  └──────────┘    └──────────┘    └──────────────────┘   │
│                                                          │
│  Activities panel (left): Copy, ForEach, If, etc.        │
│  Properties panel (bottom): Source, Sink, Mapping        │
└─────────────────────────────────────────────────────────┘

All Pipeline Activities in Fabric Data Factory

Data Movement Activities

Activity	What It Does	ADF Equivalent
Copy Data	Move data from source to destination	Copy Activity (identical concept)

The Copy activity is the workhorse — same as ADF. Configure source (SQL, ADLS, REST API, files) and sink (Lakehouse, Warehouse, ADLS, SQL).

Transformation Activities

Activity	What It Does	ADF Equivalent
Dataflow Gen2	Visual Power Query transformations inside pipeline	Mapping Data Flow
Notebook	Run a Fabric Spark notebook	Databricks Notebook activity
Stored Procedure	Execute SQL stored procedure	Stored Procedure activity
Script	Run inline SQL script	Script activity
Spark Job Definition	Run a Spark job definition (batch Spark job)	Spark job definition activity (Synapse only)

Control Flow Activities

Activity	What It Does	ADF Equivalent
ForEach	Loop over a collection	ForEach (identical)
If Condition	Branch based on true/false expression	If Condition (identical)
Switch	Branch based on multiple values	Switch (identical)
Until	Loop until condition is true	Until (identical)
Wait	Pause for specified duration	Wait (identical)
Set Variable	Set a pipeline variable value	Set Variable (identical)
Append Variable	Add value to an array variable	Append Variable (identical)
Filter	Filter items in an array	Filter (identical)
Lookup	Query a data source and return results	Lookup (identical)
Get Metadata	Get file/folder metadata (size, count, exists)	Get Metadata (identical)
Fail	Intentionally fail the pipeline with a message	Fail (identical)
Invoke Pipeline	Call another pipeline	Execute Pipeline (same concept, renamed)
Web	Make HTTP REST API calls	Web activity (identical)
Webhook	Call a webhook and wait for callback	Webhook (identical)

Notification Activities (NEW in Fabric)

Activity	What It Does	ADF Equivalent
Office 365 Outlook	Send email from your Outlook account	Not in ADF — required Logic Apps
Teams	Post message to a Teams channel	Not in ADF — required webhooks

These are game-changers. In ADF, sending a pipeline failure email required a Logic App, a webhook, or a custom Azure Function. In Fabric, it is a drag-and-drop activity.

Fabric-Specific Activities (NEW)

Activity	What It Does	ADF Equivalent
Semantic Model Refresh	Trigger a Power BI semantic model refresh	Not in ADF — required REST API
KQL	Run a KQL query against an Eventhouse	N/A

Pipeline Parameters and Variables

Parameters (Input values — set when pipeline runs)

Pipeline Parameters:
  Name: source_table     Type: String    Default: SalesLT.Customer
  Name: target_folder    Type: String    Default: bronze/customers
  Name: load_type        Type: String    Default: FULL

Access in expressions: @pipeline().parameters.source_table

Variables (Internal values — change during execution)

Pipeline Variables:
  Name: row_count         Type: String    Default: 0
  Name: error_message     Type: String    Default: 
  Name: table_list        Type: Array     Default: []

Set with Set Variable activity (row_count is a String, so convert the number first): @string(activity('Lookup_Config').output.count)

Expressions and Dynamic Content

Fabric uses the same expression language as ADF:

# Pipeline parameter
@pipeline().parameters.source_table

# Activity output
@activity('Lookup_Config').output.value
@activity('Copy_Data').output.rowsCopied

# Current item in ForEach
@item().TableName

# System variables
@pipeline().RunId
@pipeline().Pipeline
@utcNow()

# String functions
@concat('bronze/', pipeline().parameters.source_table, '/')
@replace(item().TableName, ' ', '_')
@toLower(item().SchemaName)

# Date functions
@formatDateTime(utcNow(), 'yyyy/MM/dd')
@addDays(utcNow(), -7)

# Conditional
@if(equals(item().LoadType, 'FULL'), 'Full Load', 'Incremental')

If you know ADF expressions, you know Fabric expressions — they are identical.
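To make the expressions above concrete, here is a plain-Python analogue of what each one evaluates to. This is only an illustration: the `params` and `item` dictionaries and the fixed timestamp are hypothetical stand-ins for `pipeline().parameters`, `item()`, and `utcNow()`, and nothing here calls a Fabric API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-ins for pipeline().parameters and item()
params = {"source_table": "SalesLT.Customer"}
item = {"TableName": "Sales Order", "SchemaName": "SalesLT", "LoadType": "FULL"}
now = datetime(2026, 5, 25, 2, 0, tzinfo=timezone.utc)  # fixed stand-in for utcNow()

# @concat('bronze/', pipeline().parameters.source_table, '/')
path = "bronze/" + params["source_table"] + "/"

# @replace(item().TableName, ' ', '_')
safe_name = item["TableName"].replace(" ", "_")

# @toLower(item().SchemaName)
schema = item["SchemaName"].lower()

# @formatDateTime(utcNow(), 'yyyy/MM/dd')
date_folder = now.strftime("%Y/%m/%d")

# @addDays(utcNow(), -7)
week_ago = now + timedelta(days=-7)

# @if(equals(item().LoadType, 'FULL'), 'Full Load', 'Incremental')
label = "Full Load" if item["LoadType"] == "FULL" else "Incremental"

print(path, safe_name, schema, date_folder, week_ago.date(), label)
```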

Scheduling Pipelines

Schedule Trigger

Open your pipeline
Click Schedule in the toolbar
Configure:
Start date and time: 2026-05-20 02:00 AM
Repeat: Every 1 day / Every 1 hour / Custom cron
Time zone: Eastern Standard Time
End date: Optional
Click Apply

Event-Based Trigger

Fabric supports file arrival triggers natively:

Pipeline settings → Add trigger
Type: File event
Configure: OneLake path, file pattern, debounce time

When a new file lands in the specified OneLake path, the pipeline runs automatically.

Pipeline Example 1: Copy from SQL to Lakehouse

The simplest pipeline — copy one table from Azure SQL to a Fabric Lakehouse:

Pipeline: PL_Copy_Customers
  │
  Copy Activity: Copy_Customers
    Source: Azure SQL Database → SalesLT.Customer (inline, no dataset)
    Sink: Lakehouse → Tables → customers (Delta format, auto)
    Mapping: Auto-map columns

Step by Step

Drag Copy Data activity onto the canvas
Source tab:
Connection: Select or create Azure SQL connection
Table: SalesLT.Customer (browse or type)
Destination tab:
Data store: Lakehouse (select your lakehouse)
Table: customers
Table action: Overwrite or Append
Mapping tab: Click Import schemas → auto-maps all columns
Click Run to test

That is it. No dataset to create. No linked service to configure. No integration runtime to select. The Copy activity defines everything inline.

Pipeline Example 2: Metadata-Driven Multi-Table Load

Our classic pattern — load multiple tables from a config table:

Pipeline: PL_Metadata_Load
  │
  Lookup: Lookup_Config
    Query: SELECT * FROM CONFIGTABLE_V2
    │
  ForEach: ForEach_Table
    Items: @activity('Lookup_Config').output.value
    │
    ├── Copy Activity: Copy_Table
    │     Source: Azure SQL → @concat(item().SchemaName, '.', item().TableName)
    │     Sink: Lakehouse → Tables → @item().TableName
    │
    └── Notebook Activity: Log_Activity (optional)
          Notebook: /Notebooks/Log_Pipeline_Run
          Parameters: {"table": "@item().TableName", "rows": "@activity('Copy_Table').output.rowsCopied"}

The Key Difference from ADF

In ADF, you needed parameterized datasets: DS_SourceTable_Dynamic with @dataset().SchemaName and @dataset().TableName parameters. In Fabric, you configure the table dynamically INSIDE the Copy activity using expressions — no datasets needed.

Copy Activity Source Configuration (Dynamic)

Source:
  Connection: SQL_AdventureWorks
  Use query: Table
  Schema: @item().SchemaName          ← Dynamic from ForEach
  Table: @item().TableName            ← Dynamic from ForEach

Copy Activity Sink Configuration

Destination:
  Data store: Lakehouse
  Lakehouse: bronze_lakehouse
  Table: @item().TableName            ← Dynamic table name
  Table action: Overwrite
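The Lookup plus ForEach pattern can be simulated in a few lines of Python to show how each config row resolves into one Copy activity's settings. The rows below are hypothetical sample data shaped like `@activity('Lookup_Config').output.value`; the dictionary keys in the result are illustrative labels, not Fabric property names.

```python
# Hypothetical config rows, shaped like the Lookup activity's output.value array
config_rows = [
    {"SchemaName": "SalesLT", "TableName": "Customer"},
    {"SchemaName": "SalesLT", "TableName": "Product"},
    {"SchemaName": "SalesLT", "TableName": "SalesOrderHeader"},
]


def resolve_copy_settings(item: dict, lakehouse: str = "bronze_lakehouse") -> dict:
    """Mimic what the Copy activity sees for one ForEach iteration."""
    return {
        "source_schema": item["SchemaName"],   # @item().SchemaName
        "source_table": item["TableName"],     # @item().TableName
        "sink_lakehouse": lakehouse,
        "sink_table": item["TableName"],       # @item().TableName
        "table_action": "Overwrite",
    }


# ForEach: one resolved Copy configuration per config row
plan = [resolve_copy_settings(row) for row in config_rows]
for step in plan:
    print(f'{step["source_schema"]}.{step["source_table"]} -> '
          f'{step["sink_lakehouse"]}.{step["sink_table"]}')
```

Adding a table to the load then means adding a row to the config table, not editing the pipeline.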

Pipeline Example 3: Full ETL with Dataflow Gen2 + Notebook

This is the production pattern — a complete Medallion pipeline:

Pipeline: PL_Daily_ETL
  │
  ├── Stage 1: INGEST (Copy Activities)
  │     Copy_Customers: SQL → Lakehouse bronze/customers
  │     Copy_Products: SQL → Lakehouse bronze/products
  │     Copy_Orders: SQL → Lakehouse bronze/orders
  │     (all 3 run in PARALLEL using ForEach with sequential=false)
  │
  ├── Stage 2: TRANSFORM (Dataflow Gen2)
  │     Dataflow_Bronze_to_Silver:
  │       Read bronze/customers → trim, initcap, dedup → write silver/customers
  │       Read bronze/products → filter, cast types → write silver/products
  │       Read bronze/orders → validate, fill nulls → write silver/orders
  │
  ├── Stage 3: ENRICH (Notebook)
  │     Notebook_Build_Gold:
  │       Read silver tables → SCD Type 2 MERGE → gold/dim_customer
  │       Read silver tables → build fact table → gold/fact_orders
  │       Read silver tables → aggregate → gold/agg_daily_revenue
  │
  ├── Stage 4: REFRESH (Semantic Model)
  │     Refresh_PowerBI_Model:
  │       Trigger semantic model refresh → Power BI Direct Lake updates
  │
  └── Stage 5: NOTIFY
        ├── (Success) → Teams: "Daily ETL completed. X rows processed."
        └── (Failure) → Outlook: "ETL FAILED. Check pipeline run ID: @pipeline().RunId"

The DAG

Copy_Customers ──┐
Copy_Products  ──┼──► Dataflow_Bronze_to_Silver ──► Notebook_Build_Gold ──► Refresh_PowerBI
Copy_Orders   ───┘                                                              │
                                                                          ┌─────┴─────┐
                                                                     (Success)    (Failure)
                                                                     Teams msg    Outlook email
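One way to reason about this DAG is as a dependency map, where activities whose predecessors have all succeeded can run together. The sketch below uses Python's standard `graphlib` module purely as an illustration of the ordering; Fabric resolves these dependencies itself from the arrows on the canvas, and the notification branch is left out for brevity.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of activities it depends on (its predecessors)
dependencies = {
    "Copy_Customers": set(),
    "Copy_Products": set(),
    "Copy_Orders": set(),
    "Dataflow_Bronze_to_Silver": {"Copy_Customers", "Copy_Products", "Copy_Orders"},
    "Notebook_Build_Gold": {"Dataflow_Bronze_to_Silver"},
    "Refresh_PowerBI": {"Notebook_Build_Gold"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

waves = []  # each wave holds activities that could run in parallel
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)

for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The first wave contains the three Copy activities, which is why they can run in parallel, while every later stage waits on exactly one predecessor.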

Dataflow Gen2: What It Is and When to Use It

Dataflow Gen2 is the no-code visual transformation tool in Fabric, built on Power Query (the same engine used in Power BI and Excel). You connect to data, apply transformations visually (click, not code), and write results to a Fabric destination.

Dataflow Gen2 Canvas:
  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐
  │  Source   │───>│ Clean    │───>│ Merge    │───>│ Destination  │
  │ (SQL,CSV) │    │ (trim,   │    │ (join    │    │ (Lakehouse,  │
  │           │    │  filter) │    │  tables) │    │  Warehouse)  │
  └──────────┘    └──────────┘    └──────────┘    └──────────────┘

Dataflow Gen2 Supported Destinations

Fabric Lakehouse (Delta tables)
Fabric Warehouse
Azure SQL Database
Azure Data Explorer (KQL)
Fabric KQL Database

When to Use Dataflow Gen2

Simple to medium transformations (filter, rename, merge, pivot)
Business users or analysts building their own ETL
Quick data cleaning without writing PySpark code
Power Query-familiar teams

When NOT to Use Dataflow Gen2

Complex transformations (window functions, UDFs, MERGE/SCD)
Large-scale processing (billions of rows — use Spark notebook)
Custom Python libraries needed
Advanced Delta Lake operations (OPTIMIZE, VACUUM, schema evolution)

Dataflow Gen2 vs ADF Mapping Data Flows

Feature	ADF Mapping Data Flows	Fabric Dataflow Gen2
Engine	Spark (behind the scenes)	Power Query (M language)
UI	ADF Studio (visual Spark)	Power Query Editor (Excel-like)
Learning curve	Moderate (Spark concepts)	Low (Excel/Power BI users know it)
Debug	Debug cluster required (slow startup)	Instant preview (no cluster)
Performance	High (Spark)	Medium (optimized for medium data)
Destinations	ADLS, SQL, Cosmos DB	Lakehouse, Warehouse, SQL
CI/CD	ARM templates	Fabric deployment pipelines + Git
Cost	Separate billing (Data Flow hours)	Included in Fabric capacity (CU)
Availability	ADF only	Fabric only

Using Dataflow Gen2 Inside a Pipeline

In your pipeline canvas, drag Dataflow activity
Select an existing Dataflow Gen2 or create a new one
Configure Parameters (optional — pass pipeline values to the dataflow)
Connect with arrows for sequencing (runs after previous activity succeeds)

Pipeline: PL_Daily_ETL
  │
  Copy Activity: Copy_Raw_Data
    │
  Dataflow Activity: DF_Clean_and_Transform
    Dataflow: Clean_Customer_Data (your Dataflow Gen2 item)
    Parameters: {"load_date": "@formatDateTime(utcNow(), 'yyyy-MM-dd')"}
    │
  Notebook Activity: Build_Gold_Tables

Using Notebooks Inside a Pipeline

Drag Notebook activity onto the canvas
Select the notebook from your workspace
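Configure Base parameters (each one overrides the variable of the same name in the notebook's parameters cell):

{
    "source_table": "customers",
    "target_path": "gold/dim_customer",
    "run_date": "@formatDateTime(utcNow(), 'yyyy-MM-dd')"
}

The notebook declares defaults in a cell marked as the parameters cell (use Toggle parameter cell on that cell). Fabric notebooks do not use Databricks-style dbutils.widgets; the pipeline's values replace these defaults at run time:

# Parameters cell: defaults used for interactive runs, overridden by the pipeline
source_table = "customers"            # "customers"
target_path = "gold/dim_customer"     # "gold/dim_customer"
run_date = "2026-05-25"               # replaced by the pipeline's run date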

Notebook Output

Notebooks can return values to the pipeline using the notebook exit value:

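# At the end of the notebook (df is the DataFrame built earlier in the notebook)
import json
result = {"rows_processed": df.count(), "status": "SUCCESS"}
# notebookutils is built into Fabric notebooks (older name: mssparkutils)
notebookutils.notebook.exit(json.dumps(result))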

Pipeline reads it with: @activity('Notebook_Activity').output.result.exitValue
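The exit value reaches the pipeline as a string, so a JSON payload has to be parsed before individual fields can be used. The sketch below shows the round trip in plain Python; the helper names are hypothetical, and in the pipeline itself the equivalent step would likely be an expression such as `@json(activity('Notebook_Activity').output.result.exitValue).rows_processed` (treat that exact path as an assumption to verify against your run output).

```python
import json


def build_exit_value(rows_processed: int, status: str) -> str:
    """What the notebook would hand to its exit call: a JSON string."""
    return json.dumps({"rows_processed": rows_processed, "status": status})


def parse_exit_value(exit_value: str) -> dict:
    """What a downstream consumer does with the string it receives."""
    return json.loads(exit_value)


exit_value = build_exit_value(1250, "SUCCESS")
parsed = parse_exit_value(exit_value)

print(type(exit_value).__name__)   # str
print(parsed["rows_processed"])    # 1250
print(parsed["status"])            # SUCCESS
```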

Combining Dataflow Gen2 + Notebook in One Pipeline

This is the recommended production pattern — Dataflow for simple cleaning, Notebook for complex logic:

Pipeline: PL_Complete_Medallion

  Stage 1 — INGEST (Copy Activities):
    ForEach → Copy from SQL sources to bronze lakehouse
    (parallel, fast, raw data)

  Stage 2 — CLEAN (Dataflow Gen2):
    DF_Clean_Customers:
      Source: bronze/customers
      Steps: trim names, lowercase emails, remove nulls, cast dates
      Destination: silver/customers (lakehouse table)

    DF_Clean_Products:
      Source: bronze/products
      Steps: standardize categories, fill missing prices, dedup
      Destination: silver/products

  Stage 3 — ENRICH (Notebook):
    NB_Build_Dimensions:
      Read silver/customers → SCD Type 2 MERGE → gold/dim_customer
      Read silver/products → SCD Type 1 MERGE → gold/dim_product
      (Complex Delta MERGE logic that Dataflow cannot do)

    NB_Build_Facts:
      Read silver/orders + dim tables → build fact_orders
      Aggregate → agg_daily_revenue
      (Window functions, complex joins)

  Stage 4 — SERVE (Semantic Model Refresh):
    Refresh Power BI semantic model (Direct Lake)

  Stage 5 — NOTIFY:
    Success → Teams channel message
    Failure → Email via Outlook activity
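The SCD Type 2 step in Stage 3 is the part that does not fit a simple cleaning tool, so it helps to see the logic spelled out. The following is a minimal plain-Python sketch of the idea on lists of dictionaries, not the Delta MERGE a real notebook would run; the column names (`valid_from`, `valid_to`, `is_current`) and the sample rows are assumptions for illustration.

```python
from datetime import date


def scd2_merge(dimension: list, incoming: list, key: str, tracked: list, as_of: date) -> list:
    """Return a new dimension after applying SCD Type 2 rules (inputs are not mutated)."""
    current = {row[key]: row for row in dimension if row["is_current"]}
    result = [dict(row) for row in dimension]

    for new in incoming:
        old = current.get(new[key])
        changed = old is not None and any(old[col] != new[col] for col in tracked)

        if old is not None and not changed:
            continue  # unchanged record: keep the existing current version

        if changed:
            # Close out the existing current version
            for row in result:
                if row[key] == new[key] and row["is_current"]:
                    row["is_current"] = False
                    row["valid_to"] = as_of

        # Insert a new current version (new key, or changed attributes)
        version = {key: new[key]}
        version.update({col: new[col] for col in tracked})
        version.update({"valid_from": as_of, "valid_to": None, "is_current": True})
        result.append(version)

    return result


dim = [{"customer_id": 1, "city": "Toronto",
        "valid_from": date(2026, 1, 1), "valid_to": None, "is_current": True}]
incoming = [{"customer_id": 1, "city": "Ottawa"},
            {"customer_id": 2, "city": "Montreal"}]

merged = scd2_merge(dim, incoming, key="customer_id", tracked=["city"], as_of=date(2026, 5, 25))
for row in merged:
    print(row)
```

Customer 1 ends up with two rows (the closed Toronto version and the current Ottawa version), and customer 2 is inserted as a new current row. History tracking of this kind is what the notebook stage is for.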

Why This Pattern Works

Layer	Tool	Why
Ingest	Copy Activity	Fastest way to move data. Zero transformation.
Clean	Dataflow Gen2	Simple transforms (trim, filter, dedup). Visual. Business users can maintain.
Enrich	Notebook	Complex logic (SCD MERGE, window functions, custom Python). Full PySpark power.
Serve	Semantic Model Refresh	One activity triggers Power BI Direct Lake update.
Notify	Teams / Outlook	Built-in. No Logic Apps needed.

The rule: Use Dataflow Gen2 for what Power Query does well (simple cleaning). Use Notebooks for what PySpark does well (complex transformations). Never force complex logic into Dataflow. Never over-engineer simple cleaning with Spark.

Monitoring Pipelines

Fabric Monitoring Hub

Click Monitor in the left sidebar
See ALL pipeline runs across ALL workspaces (unlike ADF which is per factory)
Filter by: status, date, pipeline name, workspace

Pipeline Run Details

Click a specific run to see:
Each activity's status (green/red)
Duration per activity
Rows read and written (for Copy activities)
Error messages (for failed activities)
Input and output for each activity

Alerting

Configure alerts in workspace settings or use the Teams/Outlook activities within the pipeline itself for real-time notifications.

Error Handling Patterns

Pattern 1: Activity-Level Retry

Copy Activity Settings:
  Retry: 3
  Retry interval: 30 seconds

Pattern 2: Red Arrow (On Failure)

Copy_Data ──(green)──► Log_Success
    │
    └──(red)──► Log_Failure ──► Send_Alert_Email

Pattern 3: Try-Catch with Set Variable

Copy_Data ──(green)──► Set_Variable: status = "SUCCESS"
    │
    └──(red)──► Set_Variable: status = "FAILED"
                    │
                    └──► Set_Variable: error = @activity('Copy_Data').error.message

If Condition: @equals(variables('status'), 'FAILED')
  True → Send failure alert
  False → Continue pipeline
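Pattern 3 is the pipeline equivalent of a try/except block that records the outcome in variables and branches on it afterwards. A rough Python analogue, with hypothetical function names and a dictionary standing in for pipeline variables:

```python
def run_copy_with_status(copy_action) -> dict:
    """Run the activity and capture the outcome in 'pipeline variables'."""
    variables = {"status": "", "error_message": ""}
    try:
        copy_action()
        variables["status"] = "SUCCESS"           # green arrow
    except Exception as exc:
        variables["status"] = "FAILED"            # red arrow
        variables["error_message"] = str(exc)
    return variables


def next_step(variables: dict) -> str:
    """If Condition: equals(variables('status'), 'FAILED')"""
    if variables["status"] == "FAILED":
        return "Send failure alert"
    return "Continue pipeline"


def failing_copy():
    raise RuntimeError("Login failed for user 'etl_user'")


failed = run_copy_with_status(failing_copy)
succeeded = run_copy_with_status(lambda: None)

print(next_step(failed), "|", failed["error_message"])
print(next_step(succeeded))
```

Because the failure is captured in a variable instead of stopping the run, later activities can decide what to do with it, for example sending the alert and still writing an audit record.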

Fabric Data Factory vs ADF Feature Mapping

ADF Feature	Fabric Equivalent	Key Difference
Linked Service	Connection	Managed centrally, shareable across items
Dataset	Not needed	Fabric connects directly — no dataset abstraction
Integration Runtime (Azure)	Not needed	Fabric manages compute automatically
Integration Runtime (Self-Hosted)	On-Premises Data Gateway	Same concept, different name
Copy Activity	Copy Activity	Nearly identical syntax and configuration
Data Flow	Dataflow Gen2	Power Query engine replaces Spark-based Data Flows
Mapping Data Flow	Dataflow Gen2 + Notebooks	Complex transforms moved to notebooks
Pipeline Expressions	Pipeline Expressions	Identical syntax (@pipeline, @activity, @formatDateTime)
Triggers (Schedule/Event/Tumbling)	Schedule and event triggers	Schedule and file-event triggers carry over; confirm tumbling-window support before migrating
Pipeline Parameters & Variables	Pipeline Parameters & Variables	Identical
ForEach / If Condition / Lookup	ForEach / If Condition / Lookup	Identical
Web Activity	Web Activity	Identical
Stored Procedure Activity	Stored Procedure Activity	Identical
Execute Pipeline	Invoke Pipeline	Same concept, slightly different name
Notebook Activity	Notebook Activity	Fabric notebooks (not Databricks)
ARM Template Deployment	Deployment Pipelines + Git	Git + deployment pipelines instead of ARM
Monitor Hub	Monitoring Hub	Same concept — centralized run monitoring
Alerts (Teams/Email)	Teams/Outlook + Data Activator	Data Activator adds data-level alerts

The bottom line: If you know ADF, you know about 90% of Fabric Data Factory. Copy Activity, ForEach, If Condition, Lookup, and expressions carry over almost unchanged. The main differences: Linked Services became Connections, Data Flows became Dataflow Gen2, ARM deployments became Git + Deployment Pipelines, and Fabric adds Notebook Activities natively.

When to Use Pipeline vs Dataflow vs Notebook

Scenario	Best Tool	Why
Move data from SQL to Lakehouse	Pipeline (Copy)	Fastest data movement, zero transformation
Simple cleaning (trim, filter, dedup)	Dataflow Gen2	Visual, no code, business users can maintain
Complex transforms (SCD, MERGE, windows)	Notebook	Full PySpark/SQL power
Orchestrate multiple steps	Pipeline	ForEach, If Condition, sequencing
Schedule everything	Pipeline	Built-in scheduler
Ad-hoc data exploration	Notebook	Interactive, cell-by-cell
Power BI refresh after ETL	Pipeline	Semantic Model Refresh activity
Alert on failure	Pipeline	Teams/Outlook activity

Common Mistakes

Trying to do complex transforms in Dataflow Gen2 — SCD MERGE, window functions, and Delta operations belong in Notebooks. Dataflow Gen2 is for simple cleaning.
Creating ADF-style datasets in Fabric — datasets do not exist in Fabric. Define source/sink inline in the Copy activity.
Not using the Teams/Outlook activities — in ADF, email notifications required Logic Apps. In Fabric, drag-and-drop. Use them.
Running Dataflow Gen2 on huge datasets — Dataflow Gen2 is optimized for medium data (millions of rows). For billions, use a Spark notebook.
Forgetting to add error handling — every Copy activity should have a red arrow path to a failure handler. Silent failures are production nightmares.
Not parameterizing pipelines — hardcoded table names, paths, and dates make pipelines single-use. Parameterize everything.

Interview Questions

Q: What is the difference between Fabric Data Factory and Azure Data Factory?
A: Fabric Data Factory is the SaaS version inside Microsoft Fabric. Key differences: no datasets (inline configuration), connections instead of linked services, Dataflow Gen2 instead of Mapping Data Flows, built-in Teams and Outlook notification activities, semantic model refresh activity, and billing included in Fabric capacity. About 90% of ADF activities are available, with the notable exception of SSIS support.

Q: What is Dataflow Gen2 and when should you use it?
A: Dataflow Gen2 is the visual, no-code transformation tool built on Power Query. Use it for simple to medium transformations like trimming, filtering, deduplication, and type casting. Do not use it for complex operations like SCD MERGE, window functions, or Delta Lake operations; use Spark notebooks for those.

Q: How do you combine Dataflow Gen2 and Notebooks in a pipeline?
A: Use Dataflow Gen2 for Bronze-to-Silver cleaning (simple transforms), then Notebook for Silver-to-Gold enrichment (complex MERGE, aggregations). The pipeline sequences them: Copy → Dataflow Gen2 → Notebook → Semantic Model Refresh → Teams notification. Each tool handles what it does best.

Q: What notification options are available in Fabric Data Factory?
A: Fabric has built-in Teams and Outlook activities that you drag and drop into your pipeline. In ADF, notifications required external services like Logic Apps or Azure Functions. This is one of Fabric's key improvements over ADF.

Q: Why did Fabric remove Datasets?
A: Datasets in ADF were an extra layer of configuration that added complexity without much value. In Fabric, source and sink properties (table name, schema, format) are defined inline within the Copy activity itself. This reduces the number of objects to manage and simplifies pipeline design.

Wrapping Up

Fabric Data Factory is ADF evolved — same concepts, less plumbing, more built-in capabilities. The pipeline canvas looks familiar. The expressions are identical. The control flow activities are the same. What changed is the removal of unnecessary complexity (datasets, linked services) and the addition of Fabric-native capabilities (Teams notifications, semantic model refresh, Dataflow Gen2 as a pipeline activity).

The production pattern is clear: Copy for ingestion, Dataflow Gen2 for simple cleaning, Notebooks for complex transformations, Semantic Model Refresh for Power BI, and Teams/Outlook for notifications. One pipeline, five stages, zero external services.


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
