Fabric Data Factory: Activities, Pipelines, Dataflow Gen2, Notebooks, and Building Production ETL in Microsoft Fabric
You know Azure Data Factory. You have built metadata-driven pipelines, incremental loads, SCD pipelines, and CI/CD deployments. Now you open Fabric Data Factory and think: “This looks familiar… but different.”
It IS familiar — about 90% of what you know transfers directly. But Fabric Data Factory removes some complexity (no more datasets, no more linked services), adds new capabilities (Teams notifications, semantic model refresh, Dataflow Gen2 as a pipeline activity), and integrates with OneLake so tightly that connecting to storage is no longer a configuration exercise.
This post covers everything: every activity available, how to build and schedule pipelines, what is new compared to ADF/Synapse, and three complete pipeline examples including combining Dataflow Gen2 and notebooks inside pipelines. Think of it as the bridge between your ADF knowledge and Fabric.
The move is like switching from a manual-transmission car (ADF) to an automatic (Fabric). The driving fundamentals are identical: steering, braking, accelerating. But the gear shifting (dataset configuration, linked service management, integration runtime setup) is now automatic. You focus on WHERE to drive (pipeline logic), not HOW the transmission works (infrastructure plumbing).
Table of Contents
- What Is Fabric Data Factory?
- What Changed from ADF to Fabric Data Factory
- No More Datasets (The Biggest Change)
- No More Linked Services (Connections Instead)
- Creating Your First Pipeline
- All Pipeline Activities in Fabric Data Factory
- Data Movement Activities
- Transformation Activities
- Control Flow Activities
- Notification Activities (NEW in Fabric)
- Fabric-Specific Activities (NEW)
- Pipeline Parameters and Variables
- Expressions and Dynamic Content
- Scheduling Pipelines
- Pipeline Example 1: Copy from SQL to Lakehouse
- Pipeline Example 2: Metadata-Driven Multi-Table Load
- Pipeline Example 3: Full ETL with Dataflow Gen2 + Notebook
- Dataflow Gen2: What It Is and When to Use It
- Dataflow Gen2 vs ADF Mapping Data Flows
- Using Dataflow Gen2 Inside a Pipeline
- Using Notebooks Inside a Pipeline
- Combining Dataflow Gen2 + Notebook in One Pipeline
- Monitoring Pipelines
- Error Handling Patterns
- Fabric Data Factory vs ADF Feature Mapping
- When to Use Pipeline vs Dataflow vs Notebook
- Common Mistakes
- Interview Questions
- Wrapping Up
What Is Fabric Data Factory?
Fabric Data Factory is the pipeline orchestration and data integration service inside Microsoft Fabric. It handles data movement (copying data from sources to destinations) and data orchestration (running activities in sequence, parallel, or conditionally).
Fabric Data Factory
│
├── Pipelines — Orchestrate activities (Copy, ForEach, If, Notebook, Dataflow)
│
├── Dataflow Gen2 — Visual no-code transformations (Power Query based)
│
└── Both write to OneLake natively (no linked service needed)
What Changed from ADF to Fabric Data Factory
| Feature | ADF / Synapse | Fabric Data Factory |
|---|---|---|
| Datasets | Required (define table schema + connection) | Removed — defined inline in Copy activity |
| Linked Services | Required (connection strings, auth) | Replaced by Connections (simpler, reusable) |
| Mapping Data Flows | Visual Spark-based transformations | Replaced by Dataflow Gen2 (Power Query based) |
| Integration Runtime | Azure IR, SHIR, SSIS IR | Simplified — managed by Fabric capacity |
| Storage connection | Manual (access key, SAS, MI on ADLS) | Automatic for OneLake (zero config) |
| Notifications | External (Logic Apps, Azure Functions) | Built-in (Teams, Outlook activities) |
| Power BI refresh | External (REST API call) | Built-in (Semantic Model Refresh activity) |
| Monitoring | ADF Monitor hub (per factory) | Fabric Monitoring Hub (cross-workspace) |
| Billing | Per activity run + DIU hours | Included in Fabric capacity (CU based) |
| Git/CI/CD | ADF Git integration → ARM templates | Fabric deployment pipelines + Git |
| SSIS support | Azure-SSIS IR | Not available yet |
No More Datasets (The Biggest Change)
In ADF, you needed to create a Dataset for every source and sink — defining the table, schema, format, and linked service. For 20 tables, that meant 40 datasets (20 source + 20 sink).
ADF (old way):
Step 1: Create Linked Service → Azure SQL Database (connection string)
Step 2: Create Dataset → DS_SQL_Customer (table=Customer, linked service=above)
Step 3: Create Linked Service → ADLS Gen2 (access key)
Step 4: Create Dataset → DS_ADLS_Customer (path=/bronze/customer/, format=parquet)
Step 5: Create Copy Activity → source=DS_SQL_Customer, sink=DS_ADLS_Customer
For 20 tables: 2 linked services + 40 datasets + 20 copy activities = 62 objects
Fabric (new way):
Step 1: Create Connection → Azure SQL (once, reusable)
Step 2: Create Copy Activity → source=SQL table (inline), sink=Lakehouse table (inline)
For 20 tables: 1 connection + 20 copy activities = 21 objects
No datasets at all. Table, schema, format defined INSIDE the Copy activity.
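The object counts above follow simple arithmetic. Here is a small sketch of it, assuming one source dataset and one sink dataset per table in ADF and one Copy activity per table in both services. The function names are illustrative only and not part of any SDK.

```python
def adf_object_count(tables: int, linked_services: int = 2) -> int:
    """ADF: linked services + two datasets per table + one Copy activity per table."""
    datasets = 2 * tables      # one source dataset and one sink dataset per table
    copy_activities = tables
    return linked_services + datasets + copy_activities


def fabric_object_count(tables: int, connections: int = 1) -> int:
    """Fabric: connections + one Copy activity per table (no datasets)."""
    return connections + tables


print(adf_object_count(20))     # 62
print(fabric_object_count(20))  # 21
```

The gap grows linearly: every additional table costs three objects in ADF and one in Fabric under these assumptions.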
Real-life analogy: In ADF, ordering food required filling out a form for each dish (dataset): “Form #1: Dish=Pizza, Size=Large, Kitchen=Italian, Delivery=Table 5.” In Fabric, you just tell the waiter directly: “Large pizza to table 5.” Same outcome, less paperwork.
No More Linked Services (Connections Instead)
Linked Services are replaced by Connections, which are simpler and reusable across items:
ADF Linked Service (old):
Name: LS_AzureSqlDatabase_Dev
Type: Azure SQL Database
Connection string: Server=tcp:server.database.windows.net,1433;Database=AdventureWorksLT;...
Authentication: SQL Auth / Managed Identity
Integration Runtime: AutoResolveIR
Fabric Connection (new):
Name: SQL_AdventureWorks
Type: Azure SQL Database
Server: server.database.windows.net
Database: AdventureWorksLT
Auth: Organizational account / Service Principal
(No integration runtime selection — managed by Fabric)
Connections are managed under Settings > Manage connections and gateways, or created inline when you configure a Copy activity.
Creating Your First Pipeline
Step by Step
- Open your Fabric workspace
- Click + New item → Data pipeline
- Name it: PL_Copy_Customers
- The pipeline canvas opens (looks very similar to ADF)
The Canvas
┌─────────────────────────────────────────────────────────┐
│ Pipeline: PL_Copy_Customers │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Copy │───>│ Notebook │───>│ Semantic Model │ │
│ │ Activity │ │ Activity │ │ Refresh │ │
│ └──────────┘ └──────────┘ └──────────────────┘ │
│ │
│ Activities panel (left): Copy, ForEach, If, etc. │
│ Properties panel (bottom): Source, Sink, Mapping │
└─────────────────────────────────────────────────────────┘
All Pipeline Activities in Fabric Data Factory
Data Movement Activities
| Activity | What It Does | ADF Equivalent |
|---|---|---|
| Copy Data | Move data from source to destination | Copy Activity (identical concept) |
The Copy activity is the workhorse — same as ADF. Configure source (SQL, ADLS, REST API, files) and sink (Lakehouse, Warehouse, ADLS, SQL).
Transformation Activities
| Activity | What It Does | ADF Equivalent |
|---|---|---|
| Dataflow Gen2 | Visual Power Query transformations inside pipeline | Mapping Data Flow |
| Notebook | Run a Fabric Spark notebook | Synapse Notebook / Databricks Notebook activity |
| Stored Procedure | Execute SQL stored procedure | Stored Procedure activity |
| Script | Run inline SQL script | Script activity |
| Spark Job Definition | Run a Spark job definition (batch Spark job) | N/A (new) |
Control Flow Activities
| Activity | What It Does | ADF Equivalent |
|---|---|---|
| ForEach | Loop over a collection | ForEach (identical) |
| If Condition | Branch based on true/false expression | If Condition (identical) |
| Switch | Branch based on multiple values | Switch (identical) |
| Until | Loop until condition is true | Until (identical) |
| Wait | Pause for specified duration | Wait (identical) |
| Set Variable | Set a pipeline variable value | Set Variable (identical) |
| Append Variable | Add value to an array variable | Append Variable (identical) |
| Filter | Filter items in an array | Filter (identical) |
| Lookup | Query a data source and return results | Lookup (identical) |
| Get Metadata | Get file/folder metadata (size, count, exists) | Get Metadata (identical) |
| Fail | Intentionally fail the pipeline with a message | Fail (identical) |
| Execute Pipeline | Call another pipeline | Execute Pipeline (identical) |
| Web | Make HTTP REST API calls | Web activity (identical) |
| Webhook | Call a webhook and wait for callback | Webhook (identical) |
Notification Activities (NEW in Fabric)
| Activity | What It Does | ADF Equivalent |
|---|---|---|
| Office 365 Outlook | Send email from your Outlook account | Not in ADF — required Logic Apps |
| Teams | Post message to a Teams channel | Not in ADF — required webhooks |
These are game-changers. In ADF, sending a pipeline failure email required a Logic App, a webhook, or a custom Azure Function. In Fabric, it is a drag-and-drop activity.
Fabric-Specific Activities (NEW)
| Activity | What It Does | ADF Equivalent |
|---|---|---|
| Semantic Model Refresh | Trigger a Power BI semantic model refresh | Not in ADF — required REST API |
| KQL | Run a KQL query against an Eventhouse | N/A |
Pipeline Parameters and Variables
Parameters (Input values — set when pipeline runs)
Pipeline Parameters:
Name: source_table Type: String Default: SalesLT.Customer
Name: target_folder Type: String Default: bronze/customers
Name: load_type Type: String Default: FULL
Access in expressions: @pipeline().parameters.source_table
Variables (Internal values — change during execution)
Pipeline Variables:
Name: row_count Type: String Default: 0
Name: error_message Type: String Default:
Name: table_list Type: Array Default: []
Set with Set Variable activity: @string(activity('Lookup_Config').output.count) (row_count is a String variable, so the integer count must be converted)
Expressions and Dynamic Content
Fabric uses the same expression language as ADF:
# Pipeline parameter
@pipeline().parameters.source_table
# Activity output
@activity('Lookup_Config').output.value
@activity('Copy_Data').output.rowsCopied
# Current item in ForEach
@item().TableName
# System variables
@pipeline().RunId
@pipeline().Pipeline
@utcNow()
# String functions
@concat('bronze/', pipeline().parameters.source_table, '/')
@replace(item().TableName, ' ', '_')
@toLower(item().SchemaName)
# Date functions
@formatDateTime(utcNow(), 'yyyy/MM/dd')
@addDays(utcNow(), -7)
# Conditional
@if(equals(item().LoadType, 'FULL'), 'Full Load', 'Incremental')
If you know ADF expressions, you know Fabric expressions — they are identical.
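To sanity-check what an expression should produce before wiring it into an activity, it can help to mirror it in plain Python. This is only a sketch of the equivalent logic. The helper names are made up for illustration, and the pipeline engine itself does not run Python.

```python
from datetime import datetime, timedelta, timezone


def bronze_path(source_table: str) -> str:
    # Mirrors @concat('bronze/', pipeline().parameters.source_table, '/')
    return "bronze/" + source_table + "/"


def clean_table_name(table_name: str) -> str:
    # Mirrors @replace(item().TableName, ' ', '_')
    return table_name.replace(" ", "_")


def date_folder(now: datetime) -> str:
    # Mirrors @formatDateTime(utcNow(), 'yyyy/MM/dd')
    return now.strftime("%Y/%m/%d")


def week_ago(now: datetime) -> datetime:
    # Mirrors @addDays(utcNow(), -7); the pipeline returns a timestamp string instead
    return now - timedelta(days=7)


def load_label(load_type: str) -> str:
    # Mirrors @if(equals(item().LoadType, 'FULL'), 'Full Load', 'Incremental')
    return "Full Load" if load_type == "FULL" else "Incremental"


now = datetime(2026, 5, 25, 2, 0, tzinfo=timezone.utc)  # fixed instant for a repeatable example
print(bronze_path("SalesLT.Customer"))   # bronze/SalesLT.Customer/
print(clean_table_name("Order Detail"))  # Order_Detail
print(date_folder(now))                  # 2026/05/25
print(week_ago(now).date())              # 2026-05-18
print(load_label("FULL"))                # Full Load
```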
Scheduling Pipelines
Schedule Trigger
- Open your pipeline
- Click Schedule in the toolbar
- Configure:
- Start date and time: 2026-05-20 02:00 AM
- Repeat: By the minute / Hourly / Daily / Weekly / Monthly
- Time zone: Eastern Standard Time
- End date: Optional
- Click Apply
Event-Based Trigger
Fabric supports file arrival triggers natively:
- Pipeline settings → Add trigger
- Type: File event
- Configure: OneLake path, file pattern, debounce time
When a new file lands in the specified OneLake path, the pipeline runs automatically.
Pipeline Example 1: Copy from SQL to Lakehouse
The simplest pipeline — copy one table from Azure SQL to a Fabric Lakehouse:
Pipeline: PL_Copy_Customers
│
Copy Activity: Copy_Customers
Source: Azure SQL Database → SalesLT.Customer (inline, no dataset)
Sink: Lakehouse → Tables → customers (Delta format, auto)
Mapping: Auto-map columns
Step by Step
- Drag Copy Data activity onto the canvas
- Source tab:
- Connection: Select or create Azure SQL connection
- Table: SalesLT.Customer (browse or type)
- Destination tab:
- Data store: Lakehouse (select your lakehouse)
- Table: customers
- Table action: Overwrite or Append
- Mapping tab: Click Import schemas → auto-maps all columns
- Click Run to test
That is it. No dataset to create. No linked service to configure. No integration runtime to select. The Copy activity defines everything inline.
Pipeline Example 2: Metadata-Driven Multi-Table Load
Our classic pattern — load multiple tables from a config table:
Pipeline: PL_Metadata_Load
│
Lookup: Lookup_Config
Query: SELECT * FROM CONFIGTABLE_V2
│
ForEach: ForEach_Table
Items: @activity('Lookup_Config').output.value
│
├── Copy Activity: Copy_Table
│ Source: Azure SQL → @{item().SchemaName}.@{item().TableName}
│ Sink: Lakehouse → Tables → @item().FolderName
│
└── Notebook Activity: Log_Activity (optional)
Notebook: /Notebooks/Log_Pipeline_Run
Parameters: {"table": "@item().TableName", "rows": "@activity('Copy_Table').output.rowsCopied"}
The Key Difference from ADF
In ADF, you needed parameterized datasets: DS_SourceTable_Dynamic with @dataset().SchemaName and @dataset().TableName parameters. In Fabric, you configure the table dynamically INSIDE the Copy activity using expressions — no datasets needed.
Copy Activity Source Configuration (Dynamic)
Source:
Connection: SQL_AdventureWorks
Use query: Table
Schema: @item().SchemaName ← Dynamic from ForEach
Table: @item().TableName ← Dynamic from ForEach
Copy Activity Sink Configuration
Destination:
Data store: Lakehouse
Lakehouse: bronze_lakehouse
Table: @item().TableName ← Dynamic table name
Table action: Overwrite
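The Lookup plus ForEach pattern is easier to reason about when you picture what each iteration resolves to. Below is a rough Python simulation, assuming the config table returns rows with SchemaName, TableName, and LoadType columns. The sample rows, the plan_copies helper, and the LoadType-to-table-action mapping are hypothetical and not taken from the actual config table.

```python
from typing import Dict, List

# Stand-in for @activity('Lookup_Config').output.value (hypothetical sample rows)
config_rows: List[Dict[str, str]] = [
    {"SchemaName": "SalesLT", "TableName": "Customer", "LoadType": "FULL"},
    {"SchemaName": "SalesLT", "TableName": "Product", "LoadType": "FULL"},
    {"SchemaName": "SalesLT", "TableName": "SalesOrderHeader", "LoadType": "INCREMENTAL"},
]


def plan_copies(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Resolve per-iteration source and sink settings, one entry per ForEach item."""
    plan = []
    for item in rows:  # each `item` plays the role of @item()
        plan.append({
            "source": f"{item['SchemaName']}.{item['TableName']}",
            "sink_table": item["TableName"],
            # Assumption for illustration: full loads overwrite, incremental loads append
            "table_action": "Overwrite" if item["LoadType"] == "FULL" else "Append",
        })
    return plan


for step in plan_copies(config_rows):
    print(step)
```

Adding a table to the load then means adding a row to the config table, not editing the pipeline.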
Pipeline Example 3: Full ETL with Dataflow Gen2 + Notebook
This is the production pattern — a complete Medallion pipeline:
Pipeline: PL_Daily_ETL
│
├── Stage 1: INGEST (Copy Activities)
│ Copy_Customers: SQL → Lakehouse bronze/customers
│ Copy_Products: SQL → Lakehouse bronze/products
│ Copy_Orders: SQL → Lakehouse bronze/orders
│ (all 3 run in PARALLEL using ForEach with sequential=false)
│
├── Stage 2: TRANSFORM (Dataflow Gen2)
│ Dataflow_Bronze_to_Silver:
│ Read bronze/customers → trim, initcap, dedup → write silver/customers
│ Read bronze/products → filter, cast types → write silver/products
│ Read bronze/orders → validate, fill nulls → write silver/orders
│
├── Stage 3: ENRICH (Notebook)
│ Notebook_Build_Gold:
│ Read silver tables → SCD Type 2 MERGE → gold/dim_customer
│ Read silver tables → build fact table → gold/fact_orders
│ Read silver tables → aggregate → gold/agg_daily_revenue
│
├── Stage 4: REFRESH (Semantic Model)
│ Refresh_PowerBI_Model:
│ Trigger semantic model refresh → Power BI Direct Lake updates
│
└── Stage 5: NOTIFY
├── (Success) → Teams: "Daily ETL completed. X rows processed."
└── (Failure) → Outlook: "ETL FAILED. Check pipeline run ID: @pipeline().RunId"
The DAG
Copy_Customers ──┐
Copy_Products ──┼──► Dataflow_Bronze_to_Silver ──► Notebook_Build_Gold ──► Refresh_PowerBI
Copy_Orders ───┘ │
┌─────┴─────┐
(Success) (Failure)
Teams msg Outlook email
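The dependency arrows in the diagram determine which activities can run at the same time. As a sketch, the same graph can be expressed with Python's standard graphlib module to list the execution "waves". This models only the ordering, not how Fabric actually schedules work, and it omits the two notification branches.

```python
from graphlib import TopologicalSorter

# Each key depends on the activities in its set (names taken from the diagram above).
dependencies = {
    "Copy_Customers": set(),
    "Copy_Products": set(),
    "Copy_Orders": set(),
    "Dataflow_Bronze_to_Silver": {"Copy_Customers", "Copy_Products", "Copy_Orders"},
    "Notebook_Build_Gold": {"Dataflow_Bronze_to_Silver"},
    "Refresh_PowerBI": {"Notebook_Build_Gold"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # activities whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)

for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The first wave contains all three Copy activities, which is why they can run in parallel. Every later wave holds a single activity.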
Dataflow Gen2: What It Is and When to Use It
Dataflow Gen2 is the no-code visual transformation tool in Fabric, built on Power Query (the same engine used in Power BI and Excel). You connect to data, apply transformations visually (click, not code), and write results to a Fabric destination.
Dataflow Gen2 Canvas:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐
│ Source │───>│ Clean │───>│ Merge │───>│ Destination │
│ (SQL,CSV) │ │ (trim, │ │ (join │ │ (Lakehouse, │
│ │ │ filter) │ │ tables) │ │ Warehouse) │
└──────────┘ └──────────┘ └──────────┘ └──────────────┘
Dataflow Gen2 Supported Destinations
- Fabric Lakehouse (Delta tables)
- Fabric Warehouse
- Azure SQL Database
- Azure Data Explorer (KQL)
- Fabric KQL database
- Fabric SQL database
When to Use Dataflow Gen2
- Simple to medium transformations (filter, rename, merge, pivot)
- Business users or analysts building their own ETL
- Quick data cleaning without writing PySpark code
- Power Query-familiar teams
When NOT to Use Dataflow Gen2
- Complex transformations (window functions, UDFs, MERGE/SCD)
- Large-scale processing (billions of rows — use Spark notebook)
- Custom Python libraries needed
- Advanced Delta Lake operations (OPTIMIZE, VACUUM, schema evolution)
Dataflow Gen2 vs ADF Mapping Data Flows
| Feature | ADF Mapping Data Flows | Fabric Dataflow Gen2 |
|---|---|---|
| Engine | Spark (behind the scenes) | Power Query (M language) |
| UI | ADF Studio (visual Spark) | Power Query Editor (Excel-like) |
| Learning curve | Moderate (Spark concepts) | Low (Excel/Power BI users know it) |
| Debug | Debug cluster required (slow startup) | Instant preview (no cluster) |
| Performance | High (Spark) | Medium (optimized for medium data) |
| Destinations | ADLS, SQL, Cosmos DB | Lakehouse, Warehouse, SQL |
| CI/CD | ARM templates | Fabric deployment pipelines + Git |
| Cost | Separate billing (Data Flow hours) | Included in Fabric capacity (CU) |
| Availability | ADF only | Fabric only |
Using Dataflow Gen2 Inside a Pipeline
- In your pipeline canvas, drag Dataflow activity
- Select an existing Dataflow Gen2 or create a new one
- Configure Parameters (optional — pass pipeline values to the dataflow)
- Connect with arrows for sequencing (runs after previous activity succeeds)
Pipeline: PL_Daily_ETL
│
Copy Activity: Copy_Raw_Data
│
Dataflow Activity: DF_Clean_and_Transform
Dataflow: Clean_Customer_Data (your Dataflow Gen2 item)
Parameters: {"load_date": "@formatDateTime(utcNow(), 'yyyy-MM-dd')"}
│
Notebook Activity: Build_Gold_Tables
Using Notebooks Inside a Pipeline
- Drag Notebook activity onto the canvas
- Select the notebook from your workspace
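- Configure Base parameters (each one overrides a variable of the same name in the notebook's parameter cell):
{
"source_table": "customers",
"target_path": "gold/dim_customer",
"run_date": "@formatDateTime(utcNow(), 'yyyy-MM-dd')"
}
The notebook declares defaults in a cell toggled as the parameter cell. Fabric notebooks do not use dbutils.widgets (that is the Databricks API); at run time the pipeline injects the base parameter values over these defaults:
# Parameter cell: defaults for interactive runs, overridden by the pipeline
source_table = "customers" # pipeline passes "customers"
target_path = "gold/dim_customer" # pipeline passes "gold/dim_customer"
run_date = "2026-05-25" # pipeline passes the formatted UTC date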
Notebook Output
Notebooks can return values to the pipeline using the notebook exit value:
# At the end of the notebook
import json
result = {"rows_processed": df.count(), "status": "SUCCESS"}
# Fabric notebooks use notebookutils (formerly mssparkutils), not dbutils
notebookutils.notebook.exit(json.dumps(result))
Pipeline reads it with: @activity('Notebook_Activity').output.result.exitValue
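The exit value reaches the pipeline as a single string, so structured results have to be serialized on the way out and parsed on the way in. Here is a minimal sketch of that round trip in plain Python. The two helper names are hypothetical, and no notebook runtime is involved.

```python
import json


def build_exit_value(rows_processed: int, status: str = "SUCCESS") -> str:
    """Serialize the notebook result; the exit value travels to the pipeline as a string."""
    return json.dumps({"rows_processed": rows_processed, "status": status})


def parse_exit_value(exit_value: str) -> dict:
    """What a downstream consumer does with the string it receives."""
    return json.loads(exit_value)


exit_value = build_exit_value(1250)
print(exit_value)                                      # {"rows_processed": 1250, "status": "SUCCESS"}
print(parse_exit_value(exit_value)["rows_processed"])  # 1250
```

Inside the pipeline, the parsing step would most likely be an expression such as @json(activity('Notebook_Activity').output.result.exitValue).rows_processed. Treat that as an assumption to verify in your own workspace.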
Combining Dataflow Gen2 + Notebook in One Pipeline
This is the recommended production pattern — Dataflow for simple cleaning, Notebook for complex logic:
Pipeline: PL_Complete_Medallion
Stage 1 — INGEST (Copy Activities):
ForEach → Copy from SQL sources to bronze lakehouse
(parallel, fast, raw data)
Stage 2 — CLEAN (Dataflow Gen2):
DF_Clean_Customers:
Source: bronze/customers
Steps: trim names, lowercase emails, remove nulls, cast dates
Destination: silver/customers (lakehouse table)
DF_Clean_Products:
Source: bronze/products
Steps: standardize categories, fill missing prices, dedup
Destination: silver/products
Stage 3 — ENRICH (Notebook):
NB_Build_Dimensions:
Read silver/customers → SCD Type 2 MERGE → gold/dim_customer
Read silver/products → SCD Type 1 MERGE → gold/dim_product
(Complex Delta MERGE logic that Dataflow cannot do)
NB_Build_Facts:
Read silver/orders + dim tables → build fact_orders
Aggregate → agg_daily_revenue
(Window functions, complex joins)
Stage 4 — SERVE (Semantic Model Refresh):
Refresh Power BI semantic model (Direct Lake)
Stage 5 — NOTIFY:
Success → Teams channel message
Failure → Email via Outlook activity
Why This Pattern Works
| Layer | Tool | Why |
|---|---|---|
| Ingest | Copy Activity | Fastest way to move data. Zero transformation. |
| Clean | Dataflow Gen2 | Simple transforms (trim, filter, dedup). Visual. Business users can maintain. |
| Enrich | Notebook | Complex logic (SCD MERGE, window functions, custom Python). Full PySpark power. |
| Serve | Semantic Model Refresh | One activity triggers Power BI Direct Lake update. |
| Notify | Teams / Outlook | Built-in. No Logic Apps needed. |
The rule: Use Dataflow Gen2 for what Power Query does well (simple cleaning). Use Notebooks for what PySpark does well (complex transformations). Never force complex logic into Dataflow. Never over-engineer simple cleaning with Spark.
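To make the "complex logic" in Stage 3 concrete, here is a plain-Python sketch of the row-level rules behind an SCD Type 2 merge. In a real notebook this would be a Delta MERGE over Spark DataFrames. The list-of-dicts version below only illustrates the logic, and the column names (valid_from, valid_to, is_current) are assumptions.

```python
from datetime import date
from typing import Dict, List, Optional


def scd2_merge(dim: List[dict], incoming: List[dict], key: str,
               tracked: List[str], run_date: date) -> List[dict]:
    """Return a new dimension with SCD Type 2 history applied."""
    result = [dict(row) for row in dim]  # copy so the input is not mutated
    current: Dict[object, dict] = {r[key]: r for r in result if r["is_current"]}

    for new in incoming:
        old: Optional[dict] = current.get(new[key])
        changed = old is not None and any(old[c] != new[c] for c in tracked)
        if changed:
            # Close the existing version instead of overwriting it
            old["is_current"] = False
            old["valid_to"] = run_date
        if old is None or changed:
            # Insert a new current version (new key or changed attributes)
            result.append({**new, "valid_from": run_date,
                           "valid_to": None, "is_current": True})
    return result


dim = [{"customer_id": 1, "city": "Toronto", "valid_from": date(2026, 1, 1),
        "valid_to": None, "is_current": True}]
incoming = [{"customer_id": 1, "city": "Ottawa"},
            {"customer_id": 2, "city": "Calgary"}]

for row in scd2_merge(dim, incoming, "customer_id", ["city"], date(2026, 5, 25)):
    print(row)
```

Closing one row and inserting another in the same pass is the kind of step that is awkward to express in Power Query and natural in a notebook.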
Monitoring Pipelines
Fabric Monitoring Hub
- Click Monitor in the left sidebar
- See ALL pipeline runs across ALL workspaces (unlike ADF which is per factory)
- Filter by: status, date, pipeline name, workspace
Pipeline Run Details
Click a specific run to see:
- Each activity's status (green/red)
- Duration per activity
- Rows read and written (for Copy activities)
- Error messages (for failed activities)
- Input and output for each activity
Alerting
Configure alerts in workspace settings or use the Teams/Outlook activities within the pipeline itself for real-time notifications.
Error Handling Patterns
Pattern 1: Activity-Level Retry
Copy Activity Settings:
Retry: 3
Retry interval: 30 seconds
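The retry settings behave roughly like the loop below: one initial attempt, then up to Retry further attempts with a pause between them. This is a sketch of the semantics, not Fabric's implementation. The flaky_copy function is a hypothetical stand-in for a Copy activity, and the sleep function is injectable so the example runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(action: Callable[[], T], retries: int = 3,
                   interval_seconds: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`; on failure retry up to `retries` more times, waiting between attempts."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: the activity is marked failed
            attempt += 1
            sleep(interval_seconds)


calls = {"count": 0}


def flaky_copy() -> str:
    # Hypothetical stand-in for a Copy activity that fails twice, then succeeds
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network error")
    return "Succeeded"


print(run_with_retry(flaky_copy, sleep=lambda s: None))  # Succeeded
print(calls["count"])                                    # 3
```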
Pattern 2: Red Arrow (On Failure)
Copy_Data ──(green)──► Log_Success
│
└──(red)──► Log_Failure ──► Send_Alert_Email
Pattern 3: Try-Catch with Set Variable
Copy_Data ──(green)──► Set_Variable: status = "SUCCESS"
│
└──(red)──► Set_Variable: status = "FAILED"
│
└──► Set_Variable: error = @activity('Copy_Data').error.message
If Condition: @equals(variables('status'), 'FAILED')
True → Send failure alert
False → Continue pipeline
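Pattern 3 is the pipeline equivalent of a try/except block that records a status for a later decision. Here is a small Python analogy, with hypothetical function names and error text. The variables dictionary plays the role of the pipeline variables.

```python
from typing import Callable, Dict


def run_copy_with_status(copy: Callable[[], int]) -> Dict[str, str]:
    """Mimic the green/red paths: record a status variable instead of stopping immediately."""
    variables = {"status": "", "error": ""}
    try:
        copy()
        variables["status"] = "SUCCESS"   # green path
    except Exception as exc:
        variables["status"] = "FAILED"    # red path
        variables["error"] = str(exc)
    return variables


def next_step(variables: Dict[str, str]) -> str:
    # Mirrors the If Condition on @equals(variables('status'), 'FAILED')
    if variables["status"] == "FAILED":
        return "Send failure alert: " + variables["error"]
    return "Continue pipeline"


def failing_copy() -> int:
    raise RuntimeError("Login failed for user")  # hypothetical error text


print(next_step(run_copy_with_status(lambda: 100)))   # Continue pipeline
print(next_step(run_copy_with_status(failing_copy)))  # Send failure alert: Login failed for user
```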
When to Use Pipeline vs Dataflow vs Notebook
| Scenario | Best Tool | Why |
|---|---|---|
| Move data from SQL to Lakehouse | Pipeline (Copy) | Fastest data movement, zero transformation |
| Simple cleaning (trim, filter, dedup) | Dataflow Gen2 | Visual, no code, business users can maintain |
| Complex transforms (SCD, MERGE, windows) | Notebook | Full PySpark/SQL power |
| Orchestrate multiple steps | Pipeline | ForEach, If Condition, sequencing |
| Schedule everything | Pipeline | Built-in scheduler |
| Ad-hoc data exploration | Notebook | Interactive, cell-by-cell |
| Power BI refresh after ETL | Pipeline | Semantic Model Refresh activity |
| Alert on failure | Pipeline | Teams/Outlook activity |
Common Mistakes
- Trying to do complex transforms in Dataflow Gen2: SCD MERGE, window functions, and Delta operations belong in Notebooks. Dataflow Gen2 is for simple cleaning.
- Creating ADF-style datasets in Fabric: datasets do not exist in Fabric. Define source/sink inline in the Copy activity.
- Not using the Teams/Outlook activities: in ADF, email notifications required Logic Apps. In Fabric, they are drag-and-drop activities. Use them.
- Running Dataflow Gen2 on huge datasets: Dataflow Gen2 is optimized for medium data (millions of rows). For billions, use a Spark notebook.
- Forgetting to add error handling: every Copy activity should have a red arrow path to a failure handler. Silent failures are production nightmares.
- Not parameterizing pipelines: hardcoded table names, paths, and dates make pipelines single-use. Parameterize everything.
Interview Questions
Q: What is the difference between Fabric Data Factory and Azure Data Factory?
A: Fabric Data Factory is the SaaS version inside Microsoft Fabric. Key differences: no datasets (inline configuration), connections instead of linked services, Dataflow Gen2 instead of Mapping Data Flows, built-in Teams and Outlook notification activities, semantic model refresh activity, and billing included in Fabric capacity. About 90% of ADF activities are available, with the notable exception of SSIS support.
Q: What is Dataflow Gen2 and when should you use it?
A: Dataflow Gen2 is the visual, no-code transformation tool built on Power Query. Use it for simple to medium transformations like trimming, filtering, deduplication, and type casting. Do not use it for complex operations like SCD MERGE, window functions, or Delta Lake operations; use Spark notebooks for those.
Q: How do you combine Dataflow Gen2 and Notebooks in a pipeline?
A: Use Dataflow Gen2 for Bronze-to-Silver cleaning (simple transforms), then Notebook for Silver-to-Gold enrichment (complex MERGE, aggregations). The pipeline sequences them: Copy → Dataflow Gen2 → Notebook → Semantic Model Refresh → Teams notification. Each tool handles what it does best.
Q: What notification options are available in Fabric Data Factory?
A: Fabric has built-in Teams and Outlook activities that you drag and drop into your pipeline. In ADF, notifications required external services like Logic Apps or Azure Functions. This is one of Fabric's key improvements over ADF.
Q: Why did Fabric remove Datasets?
A: Datasets in ADF were an extra layer of configuration that added complexity without much value. In Fabric, source and sink properties (table name, schema, format) are defined inline within the Copy activity itself. This reduces the number of objects to manage and simplifies pipeline design.
Wrapping Up
Fabric Data Factory is ADF evolved — same concepts, less plumbing, more built-in capabilities. The pipeline canvas looks familiar. The expressions are identical. The control flow activities are the same. What changed is the removal of unnecessary complexity (datasets, linked services) and the addition of Fabric-native capabilities (Teams notifications, semantic model refresh, Dataflow Gen2 as a pipeline activity).
The production pattern is clear: Copy for ingestion, Dataflow Gen2 for simple cleaning, Notebooks for complex transformations, Semantic Model Refresh for Power BI, and Teams/Outlook for notifications. One pipeline, five stages, zero external services.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.