Dataflow Gen2 in Microsoft Fabric: Introduction, Power Query Basics, Connecting to Sources, and Your First No-Code ETL

Dataflow Gen2 in Microsoft Fabric: Introduction, Power Query Basics, Connecting to Sources, and Your First No-Code ETL

You have been writing PySpark code in notebooks to clean data — trim strings, filter rows, rename columns, handle NULLs. It works. But what if a business analyst on your team needs to do the same thing without writing a single line of code?

That is exactly what Dataflow Gen2 solves. It is the no-code/low-code transformation tool inside Microsoft Fabric, built on Power Query — the same engine that powers transformations in Power BI and Excel. You connect to a source, apply transformations by clicking buttons, and write results to a Fabric destination. No PySpark. No SparkSQL. No notebooks. Just a visual interface.

But do not mistake “no-code” for “simple.” Dataflow Gen2 handles joins, pivots, conditional logic, data type conversions, error handling, and complex multi-step transformations. Many production Bronze-to-Silver pipelines run entirely on Dataflow Gen2.

Think of Dataflow Gen2 like a food processor in the kitchen. A Spark notebook is cooking from scratch with individual tools — knife, pan, whisk. You have full control but need to know how to use each tool. A Dataflow Gen2 is a food processor — load the ingredients, press the button for chop/blend/puree, and the machine does the work. Same result (clean data), different tool, different skill level required.

Table of Contents

  • What Is Dataflow Gen2?
  • Dataflow Gen2 vs ADF Mapping Data Flows
  • Dataflow Gen2 vs Spark Notebooks
  • Power Query: The Engine Behind Dataflow Gen2
  • M Language: What Runs Behind the Clicks
  • Creating Your First Dataflow Gen2
  • The Dataflow Gen2 UI Walkthrough
  • Connecting to Data Sources
  • All Supported Sources
  • Connecting to a Lakehouse Table
  • Connecting to Azure SQL Database
  • Connecting to CSV/Excel Files
  • Connecting via OneLake Shortcuts
  • Basic Transformations (The Essentials)
  • Choose Columns (Remove Unwanted Columns)
  • Remove Rows (Filter)
  • Rename Columns
  • Change Data Type
  • Replace Values
  • Add Column from Examples
  • Trim, Clean, Uppercase, Lowercase
  • Split Column
  • Fill Down / Fill Up
  • Remove Duplicates
  • Sort Rows
  • Writing to Destinations
  • Lakehouse Table Destination
  • Warehouse Table Destination
  • Update Methods: Replace, Append, Upsert
  • Understanding the Data Destination Mapping
  • Monitoring Dataflow Gen2 Runs
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

What Is Dataflow Gen2?

Dataflow Gen2 is a visual, no-code data transformation tool in Microsoft Fabric. It uses the Power Query engine (the same one in Power BI Desktop and Excel) to connect to sources, apply transformations, and write results to Fabric destinations.

SOURCE                          DATAFLOW GEN2                        DESTINATION
┌──────────┐                ┌───────────────────────┐              ┌──────────┐
│ SQL DB   │──────►         │  Power Query Editor    │              │ Lakehouse│
│ Lakehouse│──────►         │                        │────────►     │ Table    │
│ CSV/Excel│──────►         │  Transform steps:      │              │          │
│ REST API │──────►         │  1. Remove columns     │              │ Warehouse│
│ SharePoint│─────►         │  2. Filter rows        │────────►     │ Table    │
│ Dataverse│──────►         │  3. Change types       │              │          │
│ 100+ more│──────►         │  4. Rename columns     │              │ Azure SQL│
└──────────┘                │  5. Handle nulls       │              └──────────┘
                            └───────────────────────┘

Key Characteristics

  • No code required — transformations applied by clicking UI buttons
  • Power Query engine — same engine used in Power BI and Excel (familiar to millions)
  • M language underneath — every click generates M code (viewable and editable)
  • 150+ connectors — connects to Azure, AWS, databases, files, APIs, SaaS apps
  • Multiple destinations — writes to Lakehouse, Warehouse, Azure SQL, KQL Database
  • Pipeline integration — can be called as an activity inside Fabric Pipelines

Dataflow Gen2 vs ADF Mapping Data Flows

Feature ADF Mapping Data Flows Fabric Dataflow Gen2
Engine Apache Spark Power Query (Mashup engine)
Language Spark expressions M language (Power Query)
Startup time 3-5 minutes (Spark cluster) Near-instant (no cluster spin-up)
Debug preview Requires debug cluster Instant preview (no cluster)
Learning curve Moderate (Spark concepts) Low (Excel/Power BI users know it)
Best data size Medium to large (millions-billions) Small to medium (thousands-millions)
Destinations ADLS, SQL, Cosmos DB Lakehouse, Warehouse, Azure SQL
Cost Separate billing (Data Flow hours) Included in Fabric capacity (CUs)
Available in Azure Data Factory Microsoft Fabric

Dataflow Gen2 vs Spark Notebooks

Feature Dataflow Gen2 Spark Notebook
Who uses it Analysts, citizen data engineers Data engineers, data scientists
Interface Visual (point and click) Code (PySpark, SQL)
Transformations Filter, rename, merge, pivot, type change Everything (window functions, UDFs, ML)
Delta MERGE ❌ Not supported ✅ Full support
SCD Type 2 ❌ Not supported ✅ Full support
Window functions Limited ✅ Full support
Custom Python ❌ No ✅ Any Python library
Performance on large data Moderate High (distributed Spark)
Ease of use Very easy (no code) Requires coding skills
Best for Simple Bronze→Silver cleaning Complex transformations, ML

The rule: Use Dataflow Gen2 for simple to medium transformations that an analyst could maintain. Use notebooks for complex logic that requires code.

Power Query: The Engine Behind Dataflow Gen2

Power Query is Microsoft’s data transformation engine, used in: – Excel — Get & Transform Data (Power Query tab) – Power BI Desktop — Transform Data button – Dataflows in Power BI Service — cloud Power Query – Dataflow Gen2 in Fabric — the latest, most powerful version

If you have ever used “Get Data → Transform” in Power BI or Excel, you already know Power Query.

M Language: What Runs Behind the Clicks

Every transformation you apply in the UI generates M language code behind the scenes:

// When you click "Remove Columns" → this M code is generated:
let
    Source = Lakehouse.Tables("bronze_lakehouse"),
    customers = Source{[Name="customers"]}[Data],
    #"Removed Columns" = Table.RemoveColumns(customers, {"middle_name", "suffix"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each [salary] > 50000),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows", {{"first_name", "FirstName"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"salary", type number}})
in
    #"Changed Type"

You never NEED to write M code — the UI generates it. But understanding that it exists helps with debugging and advanced scenarios.

Creating Your First Dataflow Gen2

Step by Step

  1. Open your Fabric workspace
  2. Click + New itemDataflow Gen2
  3. Name it: DF_Clean_Customers
  4. The Power Query Editor opens

The Dataflow Gen2 UI Walkthrough

┌─────────────────────────────────────────────────────────────────┐
│  Dataflow Gen2: DF_Clean_Customers                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────┐   ┌──────────────────────────────────────────┐     │
│  │ Queries │   │  DATA PREVIEW                             │     │
│  │ panel   │   │  (shows current state of the data)        │     │
│  │         │   │                                           │     │
│  │ customers│  │  Name    │ Email    │ Salary │ City       │     │
│  │ products │  │  Naveen  │ nav@... │ 105000 │ Toronto    │     │
│  │ orders   │  │  Shrey   │ shr@... │ 95000  │ Toronto    │     │
│  │         │   │  Vrushab │ vru@... │ 92000  │ Vancouver  │     │
│  └─────────┘   └──────────────────────────────────────────┘     │
│                                                                  │
│  APPLIED STEPS (right panel):                                    │
│  1. Source                                                       │
│  2. Navigation                                                   │
│  3. Removed Columns                   ← Click to see state here  │
│  4. Filtered Rows                                                │
│  5. Renamed Columns                                              │
│  6. Changed Type                                                 │
│                                                                  │
│  RIBBON: Home │ Transform │ Add Column │ View                    │
│  [Choose Columns] [Remove Rows] [Replace Values] [Change Type]  │
└─────────────────────────────────────────────────────────────────┘

Three panels:Queries (left): List of data sources / tables you are working with – Data Preview (center): Shows the data at the current transformation step – Applied Steps (right): Shows every transformation in order — click any step to see the data at that point

Real-life analogy: Applied Steps are like undo history in Photoshop. Each step is a transformation applied to the data. Click any step to see what the data looked like BEFORE or AFTER that step. You can delete, reorder, or insert steps.

Connecting to Data Sources

All Supported Sources (Key Categories)

Category Sources
Fabric Lakehouse, Warehouse, KQL Database, Semantic Model
Azure Azure SQL, Azure Blob, ADLS Gen2, Cosmos DB, Synapse
Databases SQL Server, PostgreSQL, MySQL, Oracle, SAP HANA
Files CSV, Excel, JSON, XML, Parquet, PDF
Online Services SharePoint, Dataverse, Salesforce, Google Analytics
Other REST API (Web), OData, ODBC, Blank Query

Connecting to a Lakehouse Table

  1. In the Power Query Editor, click Get DataMore…
  2. Select Microsoft FabricLakehouse
  3. Select your workspace and lakehouse
  4. Choose the table (e.g., customers)
  5. Click Create → data preview loads

Connecting to Azure SQL Database

  1. Get DataAzureAzure SQL Database
  2. Enter server: server-name.database.windows.net
  3. Enter database: AdventureWorksLT
  4. Authentication: Organizational account or SQL auth
  5. Select tables → Create

Connecting to CSV/Excel Files

  1. Get DataText/CSV or Excel
  2. Browse to file location (OneLake, ADLS, local upload)
  3. Preview the data → adjust delimiter, header row, encoding
  4. Click Create

Basic Transformations (The Essentials)

Choose Columns (Remove Unwanted Columns)

Use case: Source table has 50 columns, you only need 10.

Steps:
1. Home tab → Choose Columns
2. Uncheck columns you do not need
3. Click OK

Or: Right-click a column header → Remove Column
Or: Select multiple columns → Right-click → Remove Other Columns

Remove Rows (Filter)

Use case: Remove test data, filter to specific dates, exclude NULLs.

Steps:
1. Click the dropdown arrow (▼) on any column header
2. Uncheck values you want to exclude
   OR select "Does Not Equal", "Greater Than", "Contains", etc.
3. Click OK

Common filters:
  - Remove nulls: ▼ → uncheck (null)
  - Remove blanks: ▼ → uncheck (blank)
  - Date range: ▼ → Date Filters → After → 2025-01-01
  - Text filter: ▼ → Text Filters → Contains → "Engineering"
  - Number filter: ▼ → Number Filters → Greater Than → 50000

Rename Columns

Steps:
1. Double-click the column header
2. Type the new name
3. Press Enter

Or: Right-click → Rename

Change Data Type

Use case: salary loaded as text, needs to be number. date loaded as string.

Steps:
1. Click the data type icon (left of column header)
   ABC = Text, 123 = Number, 📅 = Date
2. Select the correct type from the dropdown

Or: Select column → Transform tab → Data Type → select type

Replace Values

Use case: Replace "N/A" with null, fix typos, standardize values.

Steps:
1. Select the column
2. Transform tab → Replace Values
3. Value to find: "N/A"
4. Replace with: (leave blank for null)
5. Click OK

Examples:
  "Male" → "M", "Female" → "F"
  "United States" → "USA"
  "  " (spaces) → "" (empty) → then remove blanks

Add Column from Examples

This is one of Power Query’s most powerful features — you type EXAMPLES of what you want, and it figures out the pattern:

Use case: Extract first name from full name "Naveen Vuppula"

Steps:
1. Add Column tab → Column from Examples → From Selection
2. Select the "full_name" column
3. In the new column, type: "Naveen" (next to "Naveen Vuppula")
4. Power Query auto-fills: "Shrey" (next to "Shrey Patil"), etc.
5. If correct, click OK

It figured out: "Take everything before the first space"
No regex needed. No code. Just an example.

Trim, Clean, Uppercase, Lowercase

Steps:
1. Select the column
2. Transform tab → Format → choose:
   - Trim: removes leading/trailing spaces
   - Clean: removes non-printable characters
   - UPPERCASE: "naveen" → "NAVEEN"
   - lowercase: "NAVEEN" → "naveen"
   - Capitalize Each Word: "naveen vuppula" → "Naveen Vuppula"

Split Column

Use case: Split "Toronto, ON, Canada" into City, Province, Country.

Steps:
1. Select the column
2. Transform tab → Split Column → By Delimiter
3. Delimiter: Comma (,)
4. Split at: Each occurrence
5. Click OK
6. Rename the resulting columns: City, Province, Country

Fill Down / Fill Up

Use case: Source data has merged cells where the category only appears once:

| Category    | Product    |
|-------------|-----------|
| Electronics | Laptop    |
| null        | Monitor   |
| null        | Keyboard  |
| Clothing    | T-Shirt   |
| null        | Jeans     |

Steps:
1. Select the Category column
2. Transform tab → Fill → Down

Result:
| Category    | Product    |
|-------------|-----------|
| Electronics | Laptop    |
| Electronics | Monitor   |
| Electronics | Keyboard  |
| Clothing    | T-Shirt   |
| Clothing    | Jeans     |

Remove Duplicates

Steps:
1. Select the column(s) that define uniqueness
2. Home tab → Remove Rows → Remove Duplicates

For composite uniqueness (customer_id + product):
1. Select both columns (Ctrl+click)
2. Remove Duplicates — keeps the first occurrence

Sort Rows

Steps:
1. Click the column header dropdown (▼)
2. Select "Sort Ascending" or "Sort Descending"

Or: Right-click column header → Sort Ascending/Descending

Writing to Destinations

After transforming data, you must configure WHERE the results go.

Lakehouse Table Destination

  1. In the bottom-right corner, click Add data destination (or the + icon)
  2. Select Lakehouse
  3. Choose your workspace → lakehouse → table name
  4. Configure the update method (see below)
  5. Map source columns to destination columns

Warehouse Table Destination

  1. Add data destinationWarehouse
  2. Choose workspace → warehouse → schema → table
  3. Map columns

Update Methods: Replace, Append, Upsert

Method What It Does When to Use
Replace Drops existing data, loads new data (full refresh) Full loads, reference tables, small tables
Append Adds new rows to existing data Incremental loads, log tables, event data

Note: Dataflow Gen2 does NOT support MERGE/Upsert natively as a built-in destination update method. For upsert (SCD) patterns, use a Spark notebook or Warehouse T-SQL MERGE.

Understanding the Data Destination Mapping

Column Mapping Screen:
  Source Column      →    Destination Column
  ─────────────           ──────────────────
  FirstName          →    first_name
  LastName           →    last_name
  EmailAddress       →    email
  AnnualSalary       →    salary
  (auto-mapped by name match, manually adjustable)

If the destination table does not exist, Dataflow Gen2 can CREATE it automatically based on the source schema.

Monitoring Dataflow Gen2 Runs

From the Workspace

  1. Open your workspace
  2. Click the refresh icon next to the Dataflow Gen2 name
  3. Or: right-click → Refresh history → see past runs

From Monitoring Hub

  1. Click Monitor in the left sidebar
  2. Filter by Item type: Dataflow Gen2
  3. See status, duration, start/end time for each run

Inside a Pipeline

When Dataflow Gen2 runs as a pipeline activity, its status shows in the pipeline run details — green (success), red (failed), with duration and error messages.

Common Mistakes

  1. Not setting a data destination — if you skip the destination configuration, the dataflow transforms data but writes it NOWHERE. Always configure a destination before publishing.

  2. Using Replace when Append is needed — Replace deletes all existing data. If you are loading daily incremental data, use Append. Replace is for full refresh only.

  3. Not handling data type mismatches — source column is text, destination expects number. Change the type BEFORE mapping to the destination.

  4. Too many transformations on large data — Dataflow Gen2 is optimized for medium data. If you have 100 million rows and 50 transformation steps, use a Spark notebook instead.

  5. Forgetting to Publish — after building the dataflow, you must click Publish (bottom-right). Without publishing, the dataflow is in draft mode and cannot be run or called from pipelines.

  6. Not previewing before destination — always scroll through the data preview to verify transformations before configuring the destination. Catching errors early saves pipeline failures later.

Interview Questions

Q: What is Dataflow Gen2 in Microsoft Fabric? A: A no-code visual data transformation tool built on the Power Query engine. It connects to 150+ sources, applies transformations via a point-and-click UI, and writes results to Fabric destinations (Lakehouse, Warehouse). It generates M language code behind the scenes. Ideal for simple to medium transformations that analysts can build and maintain.

Q: How does Dataflow Gen2 compare to Spark Notebooks? A: Dataflow Gen2 is no-code (Power Query), good for simple cleaning (filter, rename, type change, dedup). Notebooks are code-based (PySpark), required for complex logic (window functions, Delta MERGE, SCD Type 2, ML). Use Dataflow Gen2 for Bronze-to-Silver cleaning. Use notebooks for Silver-to-Gold enrichment.

Q: What destinations does Dataflow Gen2 support? A: Lakehouse tables, Warehouse tables, Azure SQL Database, and KQL Database. The two update methods are Replace (full refresh) and Append (incremental). Dataflow Gen2 does not support MERGE/upsert natively — use Warehouse T-SQL MERGE or Spark notebook for that.

Wrapping Up

Dataflow Gen2 is the accessible entry point to data transformation in Fabric. No PySpark knowledge needed. No cluster startup delays. Just connect to a source, click transformations, and write to a destination.

In the next post, we cover the advanced transformations: Merge queries (joins), Append queries (unions), Pivot/Unpivot, Group By, conditional columns, custom columns, and error handling.

Related posts:Fabric Data Factory: Pipelines and ActivitiesFabric Foundations: Capacity, Workspaces, ItemsLakehouse vs WarehousePySpark Transformations Cookbook


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link