Dataflow Gen2 in Microsoft Fabric: Introduction, Power Query Basics, Connecting to Sources, and Your First No-Code ETL
You have been writing PySpark code in notebooks to clean data — trim strings, filter rows, rename columns, handle NULLs. It works. But what if a business analyst on your team needs to do the same thing without writing a single line of code?
That is exactly what Dataflow Gen2 solves. It is the no-code/low-code transformation tool inside Microsoft Fabric, built on Power Query — the same engine that powers transformations in Power BI and Excel. You connect to a source, apply transformations by clicking buttons, and write results to a Fabric destination. No PySpark. No SparkSQL. No notebooks. Just a visual interface.
But do not mistake “no-code” for “simple.” Dataflow Gen2 handles joins, pivots, conditional logic, data type conversions, error handling, and complex multi-step transformations. Many production Bronze-to-Silver pipelines run entirely on Dataflow Gen2.
Think of Dataflow Gen2 like a food processor in the kitchen. A Spark notebook is cooking from scratch with individual tools — knife, pan, whisk. You have full control but need to know how to use each tool. A Dataflow Gen2 is a food processor — load the ingredients, press the button for chop/blend/puree, and the machine does the work. Same result (clean data), different tool, different skill level required.
Table of Contents
- What Is Dataflow Gen2?
- Dataflow Gen2 vs ADF Mapping Data Flows
- Dataflow Gen2 vs Spark Notebooks
- Power Query: The Engine Behind Dataflow Gen2
- M Language: What Runs Behind the Clicks
- Creating Your First Dataflow Gen2
- The Dataflow Gen2 UI Walkthrough
- Connecting to Data Sources
- All Supported Sources
- Connecting to a Lakehouse Table
- Connecting to Azure SQL Database
- Connecting to CSV/Excel Files
- Connecting via OneLake Shortcuts
- Basic Transformations (The Essentials)
- Choose Columns (Remove Unwanted Columns)
- Remove Rows (Filter)
- Rename Columns
- Change Data Type
- Replace Values
- Add Column from Examples
- Trim, Clean, Uppercase, Lowercase
- Split Column
- Fill Down / Fill Up
- Remove Duplicates
- Sort Rows
- Writing to Destinations
- Lakehouse Table Destination
- Warehouse Table Destination
- Update Methods: Replace, Append, Upsert
- Understanding the Data Destination Mapping
- Monitoring Dataflow Gen2 Runs
- Common Mistakes
- Interview Questions
- Wrapping Up
What Is Dataflow Gen2?
Dataflow Gen2 is a visual, no-code data transformation tool in Microsoft Fabric. It uses the Power Query engine (the same one in Power BI Desktop and Excel) to connect to sources, apply transformations, and write results to Fabric destinations.
SOURCE DATAFLOW GEN2 DESTINATION
┌──────────┐ ┌───────────────────────┐ ┌──────────┐
│ SQL DB │──────► │ Power Query Editor │ │ Lakehouse│
│ Lakehouse│──────► │ │────────► │ Table │
│ CSV/Excel│──────► │ Transform steps: │ │ │
│ REST API │──────► │ 1. Remove columns │ │ Warehouse│
│ SharePoint│─────► │ 2. Filter rows │────────► │ Table │
│ Dataverse│──────► │ 3. Change types │ │ │
│ 100+ more│──────► │ 4. Rename columns │ │ Azure SQL│
└──────────┘ │ 5. Handle nulls │ └──────────┘
└───────────────────────┘
Key Characteristics
- No code required — transformations applied by clicking UI buttons
- Power Query engine — same engine used in Power BI and Excel (familiar to millions)
- M language underneath — every click generates M code (viewable and editable)
- 150+ connectors — connects to Azure, AWS, databases, files, APIs, SaaS apps
- Multiple destinations — writes to Lakehouse, Warehouse, Azure SQL, KQL Database
- Pipeline integration — can be called as an activity inside Fabric Pipelines
Dataflow Gen2 vs ADF Mapping Data Flows
| Feature | ADF Mapping Data Flows | Fabric Dataflow Gen2 |
|---|---|---|
| Engine | Apache Spark | Power Query (Mashup engine) |
| Language | Spark expressions | M language (Power Query) |
| Startup time | 3-5 minutes (Spark cluster) | Near-instant (no cluster spin-up) |
| Debug preview | Requires debug cluster | Instant preview (no cluster) |
| Learning curve | Moderate (Spark concepts) | Low (Excel/Power BI users know it) |
| Best data size | Medium to large (millions-billions) | Small to medium (thousands-millions) |
| Destinations | ADLS, SQL, Cosmos DB | Lakehouse, Warehouse, Azure SQL |
| Cost | Separate billing (Data Flow hours) | Included in Fabric capacity (CUs) |
| Available in | Azure Data Factory | Microsoft Fabric |
Dataflow Gen2 vs Spark Notebooks
| Feature | Dataflow Gen2 | Spark Notebook |
|---|---|---|
| Who uses it | Analysts, citizen data engineers | Data engineers, data scientists |
| Interface | Visual (point and click) | Code (PySpark, SQL) |
| Transformations | Filter, rename, merge, pivot, type change | Everything (window functions, UDFs, ML) |
| Delta MERGE | ❌ Not supported | ✅ Full support |
| SCD Type 2 | ❌ Not supported | ✅ Full support |
| Window functions | Limited | ✅ Full support |
| Custom Python | ❌ No | ✅ Any Python library |
| Performance on large data | Moderate | High (distributed Spark) |
| Ease of use | Very easy (no code) | Requires coding skills |
| Best for | Simple Bronze→Silver cleaning | Complex transformations, ML |
The rule: Use Dataflow Gen2 for simple to medium transformations that an analyst could maintain. Use notebooks for complex logic that requires code.
Power Query: The Engine Behind Dataflow Gen2
Power Query is Microsoft’s data transformation engine, used in: – Excel — Get & Transform Data (Power Query tab) – Power BI Desktop — Transform Data button – Dataflows in Power BI Service — cloud Power Query – Dataflow Gen2 in Fabric — the latest, most powerful version
If you have ever used “Get Data → Transform” in Power BI or Excel, you already know Power Query.
M Language: What Runs Behind the Clicks
Every transformation you apply in the UI generates M language code behind the scenes:
// When you click "Remove Columns" → this M code is generated:
let
Source = Lakehouse.Tables("bronze_lakehouse"),
customers = Source{[Name="customers"]}[Data],
#"Removed Columns" = Table.RemoveColumns(customers, {"middle_name", "suffix"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each [salary] > 50000),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows", {{"first_name", "FirstName"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"salary", type number}})
in
#"Changed Type"
You never NEED to write M code — the UI generates it. But understanding that it exists helps with debugging and advanced scenarios.
Creating Your First Dataflow Gen2
Step by Step
- Open your Fabric workspace
- Click + New item → Dataflow Gen2
- Name it:
DF_Clean_Customers - The Power Query Editor opens
The Dataflow Gen2 UI Walkthrough
┌─────────────────────────────────────────────────────────────────┐
│ Dataflow Gen2: DF_Clean_Customers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌──────────────────────────────────────────┐ │
│ │ Queries │ │ DATA PREVIEW │ │
│ │ panel │ │ (shows current state of the data) │ │
│ │ │ │ │ │
│ │ customers│ │ Name │ Email │ Salary │ City │ │
│ │ products │ │ Naveen │ nav@... │ 105000 │ Toronto │ │
│ │ orders │ │ Shrey │ shr@... │ 95000 │ Toronto │ │
│ │ │ │ Vrushab │ vru@... │ 92000 │ Vancouver │ │
│ └─────────┘ └──────────────────────────────────────────┘ │
│ │
│ APPLIED STEPS (right panel): │
│ 1. Source │
│ 2. Navigation │
│ 3. Removed Columns ← Click to see state here │
│ 4. Filtered Rows │
│ 5. Renamed Columns │
│ 6. Changed Type │
│ │
│ RIBBON: Home │ Transform │ Add Column │ View │
│ [Choose Columns] [Remove Rows] [Replace Values] [Change Type] │
└─────────────────────────────────────────────────────────────────┘
Three panels: – Queries (left): List of data sources / tables you are working with – Data Preview (center): Shows the data at the current transformation step – Applied Steps (right): Shows every transformation in order — click any step to see the data at that point
Real-life analogy: Applied Steps are like undo history in Photoshop. Each step is a transformation applied to the data. Click any step to see what the data looked like BEFORE or AFTER that step. You can delete, reorder, or insert steps.
Connecting to Data Sources
All Supported Sources (Key Categories)
| Category | Sources |
|---|---|
| Fabric | Lakehouse, Warehouse, KQL Database, Semantic Model |
| Azure | Azure SQL, Azure Blob, ADLS Gen2, Cosmos DB, Synapse |
| Databases | SQL Server, PostgreSQL, MySQL, Oracle, SAP HANA |
| Files | CSV, Excel, JSON, XML, Parquet, PDF |
| Online Services | SharePoint, Dataverse, Salesforce, Google Analytics |
| Other | REST API (Web), OData, ODBC, Blank Query |
Connecting to a Lakehouse Table
- In the Power Query Editor, click Get Data → More…
- Select Microsoft Fabric → Lakehouse
- Select your workspace and lakehouse
- Choose the table (e.g.,
customers) - Click Create → data preview loads
Connecting to Azure SQL Database
- Get Data → Azure → Azure SQL Database
- Enter server:
server-name.database.windows.net - Enter database:
AdventureWorksLT - Authentication: Organizational account or SQL auth
- Select tables → Create
Connecting to CSV/Excel Files
- Get Data → Text/CSV or Excel
- Browse to file location (OneLake, ADLS, local upload)
- Preview the data → adjust delimiter, header row, encoding
- Click Create
Basic Transformations (The Essentials)
Choose Columns (Remove Unwanted Columns)
Use case: Source table has 50 columns, you only need 10.
Steps:
1. Home tab → Choose Columns
2. Uncheck columns you do not need
3. Click OK
Or: Right-click a column header → Remove Column
Or: Select multiple columns → Right-click → Remove Other Columns
Remove Rows (Filter)
Use case: Remove test data, filter to specific dates, exclude NULLs.
Steps:
1. Click the dropdown arrow (▼) on any column header
2. Uncheck values you want to exclude
OR select "Does Not Equal", "Greater Than", "Contains", etc.
3. Click OK
Common filters:
- Remove nulls: ▼ → uncheck (null)
- Remove blanks: ▼ → uncheck (blank)
- Date range: ▼ → Date Filters → After → 2025-01-01
- Text filter: ▼ → Text Filters → Contains → "Engineering"
- Number filter: ▼ → Number Filters → Greater Than → 50000
Rename Columns
Steps:
1. Double-click the column header
2. Type the new name
3. Press Enter
Or: Right-click → Rename
Change Data Type
Use case: salary loaded as text, needs to be number. date loaded as string.
Steps:
1. Click the data type icon (left of column header)
ABC = Text, 123 = Number, 📅 = Date
2. Select the correct type from the dropdown
Or: Select column → Transform tab → Data Type → select type
Replace Values
Use case: Replace "N/A" with null, fix typos, standardize values.
Steps:
1. Select the column
2. Transform tab → Replace Values
3. Value to find: "N/A"
4. Replace with: (leave blank for null)
5. Click OK
Examples:
"Male" → "M", "Female" → "F"
"United States" → "USA"
" " (spaces) → "" (empty) → then remove blanks
Add Column from Examples
This is one of Power Query’s most powerful features — you type EXAMPLES of what you want, and it figures out the pattern:
Use case: Extract first name from full name "Naveen Vuppula"
Steps:
1. Add Column tab → Column from Examples → From Selection
2. Select the "full_name" column
3. In the new column, type: "Naveen" (next to "Naveen Vuppula")
4. Power Query auto-fills: "Shrey" (next to "Shrey Patil"), etc.
5. If correct, click OK
It figured out: "Take everything before the first space"
No regex needed. No code. Just an example.
Trim, Clean, Uppercase, Lowercase
Steps:
1. Select the column
2. Transform tab → Format → choose:
- Trim: removes leading/trailing spaces
- Clean: removes non-printable characters
- UPPERCASE: "naveen" → "NAVEEN"
- lowercase: "NAVEEN" → "naveen"
- Capitalize Each Word: "naveen vuppula" → "Naveen Vuppula"
Split Column
Use case: Split "Toronto, ON, Canada" into City, Province, Country.
Steps:
1. Select the column
2. Transform tab → Split Column → By Delimiter
3. Delimiter: Comma (,)
4. Split at: Each occurrence
5. Click OK
6. Rename the resulting columns: City, Province, Country
Fill Down / Fill Up
Use case: Source data has merged cells where the category only appears once:
| Category | Product |
|-------------|-----------|
| Electronics | Laptop |
| null | Monitor |
| null | Keyboard |
| Clothing | T-Shirt |
| null | Jeans |
Steps:
1. Select the Category column
2. Transform tab → Fill → Down
Result:
| Category | Product |
|-------------|-----------|
| Electronics | Laptop |
| Electronics | Monitor |
| Electronics | Keyboard |
| Clothing | T-Shirt |
| Clothing | Jeans |
Remove Duplicates
Steps:
1. Select the column(s) that define uniqueness
2. Home tab → Remove Rows → Remove Duplicates
For composite uniqueness (customer_id + product):
1. Select both columns (Ctrl+click)
2. Remove Duplicates — keeps the first occurrence
Sort Rows
Steps:
1. Click the column header dropdown (▼)
2. Select "Sort Ascending" or "Sort Descending"
Or: Right-click column header → Sort Ascending/Descending
Writing to Destinations
After transforming data, you must configure WHERE the results go.
Lakehouse Table Destination
- In the bottom-right corner, click Add data destination (or the + icon)
- Select Lakehouse
- Choose your workspace → lakehouse → table name
- Configure the update method (see below)
- Map source columns to destination columns
Warehouse Table Destination
- Add data destination → Warehouse
- Choose workspace → warehouse → schema → table
- Map columns
Update Methods: Replace, Append, Upsert
| Method | What It Does | When to Use |
|---|---|---|
| Replace | Drops existing data, loads new data (full refresh) | Full loads, reference tables, small tables |
| Append | Adds new rows to existing data | Incremental loads, log tables, event data |
Note: Dataflow Gen2 does NOT support MERGE/Upsert natively as a built-in destination update method. For upsert (SCD) patterns, use a Spark notebook or Warehouse T-SQL MERGE.
Understanding the Data Destination Mapping
Column Mapping Screen:
Source Column → Destination Column
───────────── ──────────────────
FirstName → first_name
LastName → last_name
EmailAddress → email
AnnualSalary → salary
(auto-mapped by name match, manually adjustable)
If the destination table does not exist, Dataflow Gen2 can CREATE it automatically based on the source schema.
Monitoring Dataflow Gen2 Runs
From the Workspace
- Open your workspace
- Click the refresh icon next to the Dataflow Gen2 name
- Or: right-click → Refresh history → see past runs
From Monitoring Hub
- Click Monitor in the left sidebar
- Filter by Item type: Dataflow Gen2
- See status, duration, start/end time for each run
Inside a Pipeline
When Dataflow Gen2 runs as a pipeline activity, its status shows in the pipeline run details — green (success), red (failed), with duration and error messages.
Common Mistakes
-
Not setting a data destination — if you skip the destination configuration, the dataflow transforms data but writes it NOWHERE. Always configure a destination before publishing.
-
Using Replace when Append is needed — Replace deletes all existing data. If you are loading daily incremental data, use Append. Replace is for full refresh only.
-
Not handling data type mismatches — source column is text, destination expects number. Change the type BEFORE mapping to the destination.
-
Too many transformations on large data — Dataflow Gen2 is optimized for medium data. If you have 100 million rows and 50 transformation steps, use a Spark notebook instead.
-
Forgetting to Publish — after building the dataflow, you must click Publish (bottom-right). Without publishing, the dataflow is in draft mode and cannot be run or called from pipelines.
-
Not previewing before destination — always scroll through the data preview to verify transformations before configuring the destination. Catching errors early saves pipeline failures later.
Interview Questions
Q: What is Dataflow Gen2 in Microsoft Fabric? A: A no-code visual data transformation tool built on the Power Query engine. It connects to 150+ sources, applies transformations via a point-and-click UI, and writes results to Fabric destinations (Lakehouse, Warehouse). It generates M language code behind the scenes. Ideal for simple to medium transformations that analysts can build and maintain.
Q: How does Dataflow Gen2 compare to Spark Notebooks? A: Dataflow Gen2 is no-code (Power Query), good for simple cleaning (filter, rename, type change, dedup). Notebooks are code-based (PySpark), required for complex logic (window functions, Delta MERGE, SCD Type 2, ML). Use Dataflow Gen2 for Bronze-to-Silver cleaning. Use notebooks for Silver-to-Gold enrichment.
Q: What destinations does Dataflow Gen2 support? A: Lakehouse tables, Warehouse tables, Azure SQL Database, and KQL Database. The two update methods are Replace (full refresh) and Append (incremental). Dataflow Gen2 does not support MERGE/upsert natively — use Warehouse T-SQL MERGE or Spark notebook for that.
Wrapping Up
Dataflow Gen2 is the accessible entry point to data transformation in Fabric. No PySpark knowledge needed. No cluster startup delays. Just connect to a source, click transformations, and write to a destination.
In the next post, we cover the advanced transformations: Merge queries (joins), Append queries (unions), Pivot/Unpivot, Group By, conditional columns, custom columns, and error handling.
Related posts: – Fabric Data Factory: Pipelines and Activities – Fabric Foundations: Capacity, Workspaces, Items – Lakehouse vs Warehouse – PySpark Transformations Cookbook
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.