Dataflow Gen2 in Microsoft Fabric: Introduction, Power Query Basics, Connecting to Sources, and Your First No-Code ETL

You have been writing PySpark code in notebooks to clean data — trim strings, filter rows, rename columns, handle NULLs. It works. But what if a business analyst on your team needs to do the same thing without writing a single line of code?

That is exactly what Dataflow Gen2 solves. It is the no-code/low-code transformation tool inside Microsoft Fabric, built on Power Query — the same engine that powers transformations in Power BI and Excel. You connect to a source, apply transformations by clicking buttons, and write results to a Fabric destination. No PySpark. No SparkSQL. No notebooks. Just a visual interface.

But do not mistake “no-code” for “simple.” Dataflow Gen2 handles joins, pivots, conditional logic, data type conversions, error handling, and complex multi-step transformations. Many production Bronze-to-Silver pipelines run entirely on Dataflow Gen2.

Think of Dataflow Gen2 like a food processor in the kitchen. A Spark notebook is cooking from scratch with individual tools — knife, pan, whisk. You have full control but need to know how to use each tool. A Dataflow Gen2 is a food processor — load the ingredients, press the button for chop/blend/puree, and the machine does the work. Same result (clean data), different tool, different skill level required.

What Is Dataflow Gen2?
Dataflow Gen2 vs ADF Mapping Data Flows
Dataflow Gen2 vs Spark Notebooks
Power Query: The Engine Behind Dataflow Gen2
M Language: What Runs Behind the Clicks
Creating Your First Dataflow Gen2
The Dataflow Gen2 UI Walkthrough
Connecting to Data Sources
All Supported Sources
Connecting to a Lakehouse Table
Connecting to Azure SQL Database
Connecting to CSV/Excel Files
Connecting via OneLake Shortcuts
Basic Transformations (The Essentials)
Choose Columns (Remove Unwanted Columns)
Remove Rows (Filter)
Rename Columns
Change Data Type
Replace Values
Add Column from Examples
Trim, Clean, Uppercase, Lowercase
Split Column
Fill Down / Fill Up
Remove Duplicates
Sort Rows
Writing to Destinations
Lakehouse Table Destination
Warehouse Table Destination
Update Methods: Replace, Append, Upsert
Understanding the Data Destination Mapping
Monitoring Dataflow Gen2 Runs
Common Mistakes
Interview Questions
Wrapping Up

What Is Dataflow Gen2?

Dataflow Gen2 is a visual, no-code data transformation tool in Microsoft Fabric. It uses the Power Query engine (the same one in Power BI Desktop and Excel) to connect to sources, apply transformations, and write results to Fabric destinations.

SOURCE                          DATAFLOW GEN2                        DESTINATION
┌──────────┐                ┌───────────────────────┐              ┌──────────┐
│ SQL DB   │──────►         │  Power Query Editor    │              │ Lakehouse│
│ Lakehouse│──────►         │                        │────────►     │ Table    │
│ CSV/Excel│──────►         │  Transform steps:      │              │          │
│ REST API │──────►         │  1. Remove columns     │              │ Warehouse│
│ SharePoint│─────►         │  2. Filter rows        │────────►     │ Table    │
│ Dataverse│──────►         │  3. Change types       │              │          │
│ 100+ more│──────►         │  4. Rename columns     │              │ Azure SQL│
└──────────┘                │  5. Handle nulls       │              └──────────┘
                            └───────────────────────┘

Key Characteristics

No code required — transformations applied by clicking UI buttons
Power Query engine — same engine used in Power BI and Excel (familiar to millions)
M language underneath — every click generates M code (viewable and editable)
150+ connectors — connects to Azure, AWS, databases, files, APIs, SaaS apps
Multiple destinations — writes to Lakehouse, Warehouse, Azure SQL, KQL Database
Pipeline integration — can be called as an activity inside Fabric Pipelines

Dataflow Gen2 vs ADF Mapping Data Flows

Feature	ADF Mapping Data Flows	Fabric Dataflow Gen2
Engine	Apache Spark	Power Query (Mashup engine)
Language	Spark expressions	M language (Power Query)
Startup time	3-5 minutes (Spark cluster)	Near-instant (no cluster spin-up)
Debug preview	Requires debug cluster	Instant preview (no cluster)
Learning curve	Moderate (Spark concepts)	Low (Excel/Power BI users know it)
Best data size	Medium to large (millions-billions)	Small to medium (thousands-millions)
Destinations	ADLS, SQL, Cosmos DB	Lakehouse, Warehouse, Azure SQL
Cost	Separate billing (Data Flow hours)	Included in Fabric capacity (CUs)
Available in	Azure Data Factory	Microsoft Fabric

Dataflow Gen2 vs Spark Notebooks

Feature	Dataflow Gen2	Spark Notebook
Who uses it	Analysts, citizen data engineers	Data engineers, data scientists
Interface	Visual (point and click)	Code (PySpark, SQL)
Transformations	Filter, rename, merge, pivot, type change	Everything (window functions, UDFs, ML)
Delta MERGE	❌ Not supported	✅ Full support
SCD Type 2	❌ Not supported	✅ Full support
Window functions	Limited	✅ Full support
Custom Python	❌ No	✅ Any Python library
Performance on large data	Moderate	High (distributed Spark)
Ease of use	Very easy (no code)	Requires coding skills
Best for	Simple Bronze→Silver cleaning	Complex transformations, ML

The rule: Use Dataflow Gen2 for simple to medium transformations that an analyst could maintain. Use notebooks for complex logic that requires code.

Power Query: The Engine Behind Dataflow Gen2

Power Query is Microsoft’s data transformation engine, used in: – Excel — Get & Transform Data (Power Query tab) – Power BI Desktop — Transform Data button – Dataflows in Power BI Service — cloud Power Query – Dataflow Gen2 in Fabric — the latest, most powerful version

If you have ever used “Get Data → Transform” in Power BI or Excel, you already know Power Query.

M Language: What Runs Behind the Clicks

Every transformation you apply in the UI generates M language code behind the scenes:

// When you click "Remove Columns" → this M code is generated:
let
    Source = Lakehouse.Tables("bronze_lakehouse"),
    customers = Source{[Name="customers"]}[Data],
    #"Removed Columns" = Table.RemoveColumns(customers, {"middle_name", "suffix"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each [salary] > 50000),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows", {{"first_name", "FirstName"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"salary", type number}})
in
    #"Changed Type"

You never NEED to write M code — the UI generates it. But understanding that it exists helps with debugging and advanced scenarios.

Creating Your First Dataflow Gen2

Step by Step

Open your Fabric workspace
Click + New item → Dataflow Gen2
Name it: DF_Clean_Customers
The Power Query Editor opens

The Dataflow Gen2 UI Walkthrough

┌─────────────────────────────────────────────────────────────────┐
│  Dataflow Gen2: DF_Clean_Customers                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────┐   ┌──────────────────────────────────────────┐     │
│  │ Queries │   │  DATA PREVIEW                             │     │
│  │ panel   │   │  (shows current state of the data)        │     │
│  │         │   │                                           │     │
│  │ customers│  │  Name    │ Email    │ Salary │ City       │     │
│  │ products │  │  Naveen  │ nav@... │ 105000 │ Toronto    │     │
│  │ orders   │  │  Shrey   │ shr@... │ 95000  │ Toronto    │     │
│  │         │   │  Vrushab │ vru@... │ 92000  │ Vancouver  │     │
│  └─────────┘   └──────────────────────────────────────────┘     │
│                                                                  │
│  APPLIED STEPS (right panel):                                    │
│  1. Source                                                       │
│  2. Navigation                                                   │
│  3. Removed Columns                   ← Click to see state here  │
│  4. Filtered Rows                                                │
│  5. Renamed Columns                                              │
│  6. Changed Type                                                 │
│                                                                  │
│  RIBBON: Home │ Transform │ Add Column │ View                    │
│  [Choose Columns] [Remove Rows] [Replace Values] [Change Type]  │
└─────────────────────────────────────────────────────────────────┘

Three panels: – Queries (left): List of data sources / tables you are working with – Data Preview (center): Shows the data at the current transformation step – Applied Steps (right): Shows every transformation in order — click any step to see the data at that point

Real-life analogy: Applied Steps are like undo history in Photoshop. Each step is a transformation applied to the data. Click any step to see what the data looked like BEFORE or AFTER that step. You can delete, reorder, or insert steps.

Connecting to Data Sources

All Supported Sources (Key Categories)

Category	Sources
Fabric	Lakehouse, Warehouse, KQL Database, Semantic Model
Azure	Azure SQL, Azure Blob, ADLS Gen2, Cosmos DB, Synapse
Databases	SQL Server, PostgreSQL, MySQL, Oracle, SAP HANA
Files	CSV, Excel, JSON, XML, Parquet, PDF
Online Services	SharePoint, Dataverse, Salesforce, Google Analytics
Other	REST API (Web), OData, ODBC, Blank Query

Connecting to a Lakehouse Table

In the Power Query Editor, click Get Data → More…
Select Microsoft Fabric → Lakehouse
Select your workspace and lakehouse
Choose the table (e.g., customers)
Click Create → data preview loads

Connecting to Azure SQL Database

Get Data → Azure → Azure SQL Database
Enter server: server-name.database.windows.net
Enter database: AdventureWorksLT
Authentication: Organizational account or SQL auth
Select tables → Create

Connecting to CSV/Excel Files

Get Data → Text/CSV or Excel
Browse to file location (OneLake, ADLS, local upload)
Preview the data → adjust delimiter, header row, encoding
Click Create

All Supported Sources

Dataflow Gen2 connects to 150+ data sources through Power Query connectors. The most commonly used sources for data engineering:

Category	Sources
Azure	Azure SQL Database, Azure Synapse, ADLS Gen2, Cosmos DB, Azure Blob
Databases	SQL Server, Oracle, MySQL, PostgreSQL, Snowflake, Databricks
Files	CSV, Excel, JSON, Parquet, XML (from OneLake, ADLS, S3, SharePoint)
SaaS	Salesforce, Dynamics 365, SharePoint Lists, Dataverse, SAP
OneLake	Lakehouse tables, Warehouse tables, KQL databases
Web	REST APIs (JSON/XML), Web pages, OData feeds

Connecting via OneLake Shortcuts

If your data is accessible via a OneLake shortcut (ADLS Gen2, S3, or another Lakehouse), you can use the Lakehouse connector in Dataflow Gen2 to read shortcut tables directly — no separate connection needed. The shortcut makes external data appear as a local Lakehouse table, and Dataflow Gen2 reads it as if it were native data.

Basic Transformations (The Essentials)

Choose Columns (Remove Unwanted Columns)

Use case: Source table has 50 columns, you only need 10.

Steps:
1. Home tab → Choose Columns
2. Uncheck columns you do not need
3. Click OK

Or: Right-click a column header → Remove Column
Or: Select multiple columns → Right-click → Remove Other Columns

Remove Rows (Filter)

Use case: Remove test data, filter to specific dates, exclude NULLs.

Steps:
1. Click the dropdown arrow (▼) on any column header
2. Uncheck values you want to exclude
   OR select "Does Not Equal", "Greater Than", "Contains", etc.
3. Click OK

Common filters:
  - Remove nulls: ▼ → uncheck (null)
  - Remove blanks: ▼ → uncheck (blank)
  - Date range: ▼ → Date Filters → After → 2025-01-01
  - Text filter: ▼ → Text Filters → Contains → "Engineering"
  - Number filter: ▼ → Number Filters → Greater Than → 50000

Rename Columns

Steps:
1. Double-click the column header
2. Type the new name
3. Press Enter

Or: Right-click → Rename

Change Data Type

Use case: salary loaded as text, needs to be number. date loaded as string.

Steps:
1. Click the data type icon (left of column header)
   ABC = Text, 123 = Number, 📅 = Date
2. Select the correct type from the dropdown

Or: Select column → Transform tab → Data Type → select type

Replace Values

Use case: Replace "N/A" with null, fix typos, standardize values.

Steps:
1. Select the column
2. Transform tab → Replace Values
3. Value to find: "N/A"
4. Replace with: (leave blank for null)
5. Click OK

Examples:
  "Male" → "M", "Female" → "F"
  "United States" → "USA"
  "  " (spaces) → "" (empty) → then remove blanks

Add Column from Examples

This is one of Power Query’s most powerful features — you type EXAMPLES of what you want, and it figures out the pattern:

Use case: Extract first name from full name "Naveen Vuppula"

Steps:
1. Add Column tab → Column from Examples → From Selection
2. Select the "full_name" column
3. In the new column, type: "Naveen" (next to "Naveen Vuppula")
4. Power Query auto-fills: "Shrey" (next to "Shrey Patil"), etc.
5. If correct, click OK

It figured out: "Take everything before the first space"
No regex needed. No code. Just an example.

Trim, Clean, Uppercase, Lowercase

Steps:
1. Select the column
2. Transform tab → Format → choose:
   - Trim: removes leading/trailing spaces
   - Clean: removes non-printable characters
   - UPPERCASE: "naveen" → "NAVEEN"
   - lowercase: "NAVEEN" → "naveen"
   - Capitalize Each Word: "naveen vuppula" → "Naveen Vuppula"

Split Column

Use case: Split "Toronto, ON, Canada" into City, Province, Country.

Steps:
1. Select the column
2. Transform tab → Split Column → By Delimiter
3. Delimiter: Comma (,)
4. Split at: Each occurrence
5. Click OK
6. Rename the resulting columns: City, Province, Country

Fill Down / Fill Up

Use case: Source data has merged cells where the category only appears once:

| Category    | Product    |
|-------------|-----------|
| Electronics | Laptop    |
| null        | Monitor   |
| null        | Keyboard  |
| Clothing    | T-Shirt   |
| null        | Jeans     |

Steps:
1. Select the Category column
2. Transform tab → Fill → Down

Result:
| Category    | Product    |
|-------------|-----------|
| Electronics | Laptop    |
| Electronics | Monitor   |
| Electronics | Keyboard  |
| Clothing    | T-Shirt   |
| Clothing    | Jeans     |

Remove Duplicates

Steps:
1. Select the column(s) that define uniqueness
2. Home tab → Remove Rows → Remove Duplicates

For composite uniqueness (customer_id + product):
1. Select both columns (Ctrl+click)
2. Remove Duplicates — keeps the first occurrence

Sort Rows

Steps:
1. Click the column header dropdown (▼)
2. Select "Sort Ascending" or "Sort Descending"

Or: Right-click column header → Sort Ascending/Descending

Writing to Destinations

After transforming data, you must configure WHERE the results go.

Lakehouse Table Destination

In the bottom-right corner, click Add data destination (or the + icon)
Select Lakehouse
Choose your workspace → lakehouse → table name
Configure the update method (see below)
Map source columns to destination columns

Warehouse Table Destination

Add data destination → Warehouse
Choose workspace → warehouse → schema → table
Map columns

Update Methods: Replace, Append, Upsert

Method	What It Does	When to Use
Replace	Drops existing data, loads new data (full refresh)	Full loads, reference tables, small tables
Append	Adds new rows to existing data	Incremental loads, log tables, event data

Note: Dataflow Gen2 does NOT support MERGE/Upsert natively as a built-in destination update method. For upsert (SCD) patterns, use a Spark notebook or Warehouse T-SQL MERGE.

Understanding the Data Destination Mapping

Column Mapping Screen:
  Source Column      →    Destination Column
  ─────────────           ──────────────────
  FirstName          →    first_name
  LastName           →    last_name
  EmailAddress       →    email
  AnnualSalary       →    salary
  (auto-mapped by name match, manually adjustable)

If the destination table does not exist, Dataflow Gen2 can CREATE it automatically based on the source schema.

Monitoring Dataflow Gen2 Runs

From the Workspace

Open your workspace
Click the refresh icon next to the Dataflow Gen2 name
Or: right-click → Refresh history → see past runs

From Monitoring Hub

Click Monitor in the left sidebar
Filter by Item type: Dataflow Gen2
See status, duration, start/end time for each run

Inside a Pipeline

When Dataflow Gen2 runs as a pipeline activity, its status shows in the pipeline run details — green (success), red (failed), with duration and error messages.

Common Mistakes

Not setting a data destination — if you skip the destination configuration, the dataflow transforms data but writes it NOWHERE. Always configure a destination before publishing.
Using Replace when Append is needed — Replace deletes all existing data. If you are loading daily incremental data, use Append. Replace is for full refresh only.
Not handling data type mismatches — source column is text, destination expects number. Change the type BEFORE mapping to the destination.
Too many transformations on large data — Dataflow Gen2 is optimized for medium data. If you have 100 million rows and 50 transformation steps, use a Spark notebook instead.
Forgetting to Publish — after building the dataflow, you must click Publish (bottom-right). Without publishing, the dataflow is in draft mode and cannot be run or called from pipelines.
Not previewing before destination — always scroll through the data preview to verify transformations before configuring the destination. Catching errors early saves pipeline failures later.

Interview Questions

Q: What is Dataflow Gen2 in Microsoft Fabric? A: A no-code visual data transformation tool built on the Power Query engine. It connects to 150+ sources, applies transformations via a point-and-click UI, and writes results to Fabric destinations (Lakehouse, Warehouse). It generates M language code behind the scenes. Ideal for simple to medium transformations that analysts can build and maintain.

Q: How does Dataflow Gen2 compare to Spark Notebooks? A: Dataflow Gen2 is no-code (Power Query), good for simple cleaning (filter, rename, type change, dedup). Notebooks are code-based (PySpark), required for complex logic (window functions, Delta MERGE, SCD Type 2, ML). Use Dataflow Gen2 for Bronze-to-Silver cleaning. Use notebooks for Silver-to-Gold enrichment.

Q: What destinations does Dataflow Gen2 support? A: Lakehouse tables, Warehouse tables, Azure SQL Database, and KQL Database. The two update methods are Replace (full refresh) and Append (incremental). Dataflow Gen2 does not support MERGE/upsert natively — use Warehouse T-SQL MERGE or Spark notebook for that.

Wrapping Up

Dataflow Gen2 is the accessible entry point to data transformation in Fabric. No PySpark knowledge needed. No cluster startup delays. Just connect to a source, click transformations, and write to a destination.

In the next post, we cover the advanced transformations: Merge queries (joins), Append queries (unions), Pivot/Unpivot, Group By, conditional columns, custom columns, and error handling.

← Previous: Triggers, Scheduling & Orchestration

Fabric (13/34)

Next: Dataflow Gen2: Advanced Transforms →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

← Triggers, Scheduling & Orchestration Dataflow Gen2: Advanced Transforms →