Dataflow Gen2 Advanced Transformations: Merge Queries, Append, Pivot, Group By, Custom Columns, and Error Handling

In the previous post, we covered the basics — connecting to sources, applying simple transformations (filter, rename, trim), and writing to destinations. Now we go deeper.

This post covers the transformations that turn Dataflow Gen2 from a “rename and filter” tool into a real ETL engine: Merge (joining two tables), Append (stacking tables), Pivot/Unpivot (reshaping), Group By (aggregation), Conditional Columns (CASE WHEN equivalent), Custom Columns (calculated fields), and Error Handling (dealing with bad data).

Think of Part 1 as learning to drive — steering, braking, parking. This post is advanced driving — highway merging (Merge queries), lane switching (Append), U-turns (Pivot), and handling bad weather (Error Handling). Same car, same road, more complex maneuvers.

Merge Queries — Joining Two Tables
The Six Join Types in Power Query
Merge Step by Step
Fuzzy Matching
Append Queries — Stacking Tables
Append Two Queries
Append Three or More Queries
Group By — Aggregation
Pivot Column — Rows to Columns
Unpivot Columns — Columns to Rows
Conditional Column — CASE WHEN Without Code
Custom Column — Calculated Fields with M Formulas
Common M Formulas for Custom Columns
Replace Errors — Handling Bad Data
Data Profiling — Understanding Your Data Before Transforming
Column Quality, Distribution, and Profile
Combining Multiple Transformations — A Complete Example
When Dataflow Gen2 Reaches Its Limits
Common Mistakes
Interview Questions
Wrapping Up

Merge Queries — Joining Two Tables

Merge is Power Query’s equivalent of a SQL JOIN. You combine two tables based on a matching key.

BEFORE MERGE:
  Customers table:                 Orders table:
  | cust_id | name    |           | order_id | cust_id | amount |
  |---------|---------|           |----------|---------|--------|
  | 1       | Naveen  |           | 101      | 1       | 999    |
  | 2       | Shrey   |           | 102      | 2       | 29     |
  | 3       | Vrushab |           | 103      | 1       | 79     |

AFTER MERGE (Left Outer Join on cust_id):
  | cust_id | name    | order_id | amount |
  |---------|---------|----------|--------|
  | 1       | Naveen  | 101      | 999    |
  | 1       | Naveen  | 103      | 79     |
  | 2       | Shrey   | 102      | 29     |
  | 3       | Vrushab | null     | null   |

The Six Join Types in Power Query

Power Query Name	SQL Equivalent	What It Returns
Left Outer	LEFT JOIN	All left rows + matching right rows
Right Outer	RIGHT JOIN	All right rows + matching left rows
Full Outer	FULL OUTER JOIN	All rows from both tables
Inner	INNER JOIN	Only rows that match in both tables
Left Anti	LEFT JOIN WHERE right IS NULL	Left rows with NO match in right
Right Anti	RIGHT JOIN WHERE left IS NULL	Right rows with NO match in left

Merge Step by Step

In your Dataflow Gen2, ensure BOTH queries (tables) are loaded in the Queries panel
Select the primary query (e.g., Customers)
Home tab → Merge queries → Merge queries as new (creates a new combined query)
In the Merge dialog:
Top table: Customers
Bottom table: Orders
Click the matching column in EACH table (cust_id in both)
Join Kind: Left Outer (or whichever you need)
Click OK

After merge, the Orders data appears as a collapsed table column (a single column containing a nested table). You need to EXPAND it:

Click the expand icon (↔) on the Orders column header
Select which columns to expand (order_id, amount)
Uncheck “Use original column name as prefix” (unless you want “Orders.amount”)
Click OK

Now you have a flat, joined table.

Real-life analogy: Merge is like combining a class roster (names) with a grade sheet (scores). You match by student ID. After merging, each student row has their name AND their grades. The expand step is like “unfolding” the grade sheet columns next to the name.

Multi-Column Merge

To join on multiple columns (composite key):

In the Merge dialog, Ctrl+click BOTH columns in the top table
Ctrl+click the matching columns in the bottom table (in the same order)
Power Query matches on the combination

Fuzzy Matching

For imperfect data (typos, different formats), Power Query offers fuzzy matching:

In the Merge dialog, check Use fuzzy matching
Set the Similarity threshold (0.0 to 1.0) — 0.8 means 80% similar
Power Query matches “Naveen” with “naveen” or “Naveen V.” with “Naveen Vuppula”

Exact match: "Naveen Vuppula" ↔ "Naveen Vuppula" ✅
Fuzzy match:  "Naveen Vuppula" ↔ "Naveen V."      ✅ (at 0.7 threshold)
Fuzzy match:  "Naveen Vuppula" ↔ "John Smith"      ❌ (too different)

Use case: Matching customer names from two systems where spelling varies.

Append Queries — Stacking Tables

Append is Power Query’s equivalent of SQL UNION ALL. It stacks rows from multiple queries vertically.

BEFORE APPEND:
  Jan_Sales:                    Feb_Sales:
  | product | amount |          | product | amount |
  |---------|--------|          |---------|--------|
  | Laptop  | 999    |          | Mouse   | 29     |
  | Monitor | 399    |          | Laptop  | 999    |

AFTER APPEND:
  | product | amount |
  |---------|--------|
  | Laptop  | 999    |   ← from Jan_Sales
  | Monitor | 399    |   ← from Jan_Sales
  | Mouse   | 29     |   ← from Feb_Sales
  | Laptop  | 999    |   ← from Feb_Sales

Append Two Queries

Select the first query (Jan_Sales)
Home tab → Append queries → Append queries as new
Select the second query (Feb_Sales)
Click OK

Append Three or More Queries

Append queries as new
Select Three or more tables
Add all queries from the Available list to the Tables to append list
Click OK

Rules: All queries must have the same column names and compatible types. If column names differ, rename them first. If one query has extra columns, they show as NULL in the other.

Group By — Aggregation

Group By is the Power Query equivalent of SQL GROUP BY with aggregate functions.

Step by Step

Select your query
Transform tab → Group By
Choose Basic or Advanced

Basic Group By

Group by: department
New column name: employee_count
Operation: Count Rows

Result:

department	employee_count
Engineering	3
Analytics	3
Sales	4

Advanced Group By (Multiple Aggregations)

Click Advanced to add multiple aggregations:

Group by: department
  New column: employee_count     Operation: Count Rows
  New column: total_salary       Operation: Sum        Column: salary
  New column: avg_salary         Operation: Average    Column: salary
  New column: max_salary         Operation: Max        Column: salary
  New column: min_salary         Operation: Min        Column: salary

department	employee_count	total_salary	avg_salary	max_salary	min_salary
Engineering	3	292000	97333	105000	92000
Analytics	3	261000	87000	91000	85000
Sales	4	313000	78250	82000	75000

Available aggregation operations: Count Rows, Sum, Average, Min, Max, Count Distinct, All Rows (returns a nested table).

Pivot Column — Rows to Columns

Pivot transforms row values into column headers — identical to SQL PIVOT:

BEFORE PIVOT:
  | region | month   | revenue |
  |--------|---------|---------|
  | East   | Jan     | 150000  |
  | East   | Feb     | 165000  |
  | West   | Jan     | 120000  |
  | West   | Feb     | 135000  |

AFTER PIVOT (pivot month column, aggregate revenue):
  | region | Jan    | Feb    |
  |--------|--------|--------|
  | East   | 150000 | 165000 |
  | West   | 120000 | 135000 |

Step by Step

Select the column to pivot (month)
Transform tab → Pivot Column
Values Column: revenue (which values fill the cells)
Advanced → Aggregate: Sum (or Count, Min, Max, Don’t Aggregate)
Click OK

Unpivot Columns — Columns to Rows

Unpivot is the reverse of Pivot — turns columns back into rows:

BEFORE UNPIVOT:
  | region | Q1     | Q2     | Q3     |
  |--------|--------|--------|--------|
  | East   | 495000 | 520000 | 480000 |
  | West   | 395000 | 410000 | 390000 |

AFTER UNPIVOT (unpivot Q1, Q2, Q3):
  | region | Attribute | Value  |
  |--------|-----------|--------|
  | East   | Q1        | 495000 |
  | East   | Q2        | 520000 |
  | East   | Q3        | 480000 |
  | West   | Q1        | 395000 |
  | West   | Q2        | 410000 |
  | West   | Q3        | 390000 |

Step by Step

Select the columns to unpivot (Ctrl+click Q1, Q2, Q3)
Transform tab → Unpivot Columns
Two new columns appear: Attribute (column names) and Value (cell values)
Rename them: “Quarter” and “Revenue”

Unpivot Options

Option	What It Does
Unpivot Columns	Unpivot selected columns
Unpivot Other Columns	Keep selected columns fixed, unpivot everything else (more flexible — survives new columns)
Unpivot Only Selected	Unpivot only the selected columns

Best practice: Use Unpivot Other Columns — if new quarter columns are added later (Q4), they are automatically unpivoted.

Conditional Column — CASE WHEN Without Code

A Conditional Column is the Power Query equivalent of SQL’s CASE WHEN:

Step by Step

Add Column tab → Conditional Column
Name: salary_band
Build the conditions:

If [salary] >= 100000    Then "Executive"
Else If [salary] >= 85000   Then "Senior"
Else If [salary] >= 70000   Then "Mid-Level"
Else "Junior"

Result: a new salary_band column with values based on the salary ranges.

Multiple Conditions (AND/OR)

If [department] = "Engineering" AND [salary] >= 100000   Then "Lead Engineer"
Else If [department] = "Engineering"                       Then "Engineer"
Else If [department] = "Sales" AND [salary] >= 80000      Then "Senior Sales"
Else "Staff"

Custom Column — Calculated Fields with M Formulas

For calculations more complex than conditional logic, use Custom Columns with M formulas:

Step by Step

Add Column tab → Custom Column
Name: annual_salary
Formula: [salary] * 12
Click OK

Common M Formulas for Custom Columns

// Arithmetic
[salary] * 12                                    → Annual salary
[amount] * (1 - [discount_pct] / 100)           → Discounted amount
[revenue] - [cost]                                → Profit

// String operations
Text.Upper([name])                                → "NAVEEN"
Text.Lower([email])                               → "naveen@email.com"
Text.Trim([name])                                 → Remove spaces
Text.Start([phone], 3)                            → First 3 characters "416"
Text.End([phone], 4)                              → Last 4 characters "1234"
Text.Combine({[first_name], " ", [last_name]})    → "Naveen Vuppula"
Text.Contains([email], "@")                       → true/false
Text.Replace([phone], "-", "")                    → Remove dashes

// Date operations
Date.Year([hire_date])                            → 2022
Date.Month([hire_date])                           → 1
Date.Day([hire_date])                             → 15
Date.DayOfWeek([hire_date])                       → 6 (Saturday)
Duration.Days(DateTime.LocalNow() - [hire_date])  → Days since hire
Date.AddMonths([hire_date], 6)                    → 6 months after hire
Date.StartOfMonth([hire_date])                    → First day of month
Date.EndOfMonth([hire_date])                      → Last day of month

// Null handling
if [email] = null then "No Email" else [email]    → COALESCE equivalent
[salary] ?? 0                                      → Null coalesce shorthand (if null, use 0)

// Conditional (inline if)
if [salary] >= 100000 then "High" else "Normal"   → Simple condition
if [country] = "Canada" then [salary] * 0.25 
    else if [country] = "India" then [salary] * 0.30
    else [salary] * 0.20                           → Tax calculation by country

// Type conversion
Number.FromText([salary_text])                    → Text to number
Text.From([emp_id])                               → Number to text
Date.FromText([date_string])                      → Text to date

Replace Errors — Handling Bad Data

When a transformation fails for specific rows (type conversion errors, division by zero), Power Query marks them as errors instead of crashing the entire dataflow.

Viewing Errors

After a transformation, look for red cells in the preview — these are errors
Click the red cell to see the error message
Or: click the error count at the bottom of a column header

Replacing Errors

Steps:
1. Select the column with errors
2. Transform tab → Replace Errors
3. Enter the replacement value (e.g., 0, null, "Unknown")
4. Click OK

All error values in that column are replaced with your specified value.

Removing Error Rows

Steps:
1. Select the column with errors
2. Home tab → Remove Rows → Remove Errors
3. All rows with errors in that column are removed

Preventing Errors with try…otherwise

In custom columns, use try...otherwise to handle errors gracefully:

// Safe division (prevents divide by zero)
try [revenue] / [cost] otherwise null

// Safe type conversion
try Number.FromText([salary_text]) otherwise 0

// Safe date parsing
try Date.FromText([date_string]) otherwise null

Real-life analogy: Error handling is like quality control on an assembly line. Some products (rows) have defects (errors). You can fix them (Replace Errors), remove them (Remove Errors), or wrap the risky step in bubble wrap (try…otherwise) so it does not break the line if one product is defective.

Data Profiling — Understanding Your Data Before Transforming

Power Query has built-in data profiling tools that help you understand your data before you start transforming:

Enable Data Profiling

View tab → check Column quality
View tab → check Column distribution
View tab → check Column profile

Column Quality, Distribution, and Profile

These three profiling views work together to give you a complete picture of each column’s health:

Column Quality

Shows the percentage of Valid, Error, and Empty values per column:

Column: email
  Valid: 90%  ████████████████████
  Error: 0%
  Empty: 10%  ██

Column Distribution

Shows unique values and distinct values per column:

Column: department
  Distinct: 3  (Engineering, Analytics, Sales)
  Unique: 0    (no value appears only once)

Column Profile

Select a column to see detailed statistics at the bottom: min, max, average, count, distinct count, null count, value distribution histogram.

Best practice: ALWAYS enable data profiling before building transformations. It reveals NULLs, duplicates, outliers, and data type issues before they cause problems downstream.

Important: By default, profiling analyzes only the first 1,000 rows. Click the status bar at the bottom and change to “Column profiling based on entire data set” for accurate results.

Combining Multiple Transformations — A Complete Example

Here is a real Bronze-to-Silver cleaning dataflow:

Source: Lakehouse bronze_lakehouse / customers table

Step 1: Choose Columns
  Keep: customer_id, first_name, last_name, email, phone, city, country, salary, hire_date
  Remove: middle_name, suffix, internal_notes, temp_flag

Step 2: Remove Rows
  Filter: Remove rows where customer_id is null

Step 3: Trim + Capitalize
  first_name: Trim → Capitalize Each Word
  last_name: Trim → Capitalize Each Word
  email: Trim → Lowercase

Step 4: Replace Values
  country: Replace "United States" with "USA"
  country: Replace "United Kingdom" with "UK"

Step 5: Conditional Column (salary_band)
  If salary >= 100000 Then "Executive"
  Else If salary >= 80000 Then "Senior"
  Else "Standard"

Step 6: Custom Column (full_name)
  Formula: Text.Combine({[first_name], " ", [last_name]})

Step 7: Change Data Types
  salary: Decimal Number
  hire_date: Date
  customer_id: Whole Number

Step 8: Remove Duplicates
  On: customer_id (keep first occurrence)

Step 9: Replace Errors
  salary: Replace errors with 0
  hire_date: Replace errors with null

Destination: Lakehouse silver_lakehouse / customers_clean (Replace mode)

9 steps, zero code, production-ready cleaning.

When Dataflow Gen2 Reaches Its Limits

Use a Spark notebook instead when you need:

Delta MERGE / SCD Type 2 — Dataflow Gen2 cannot do upserts
Window functions (ROW_NUMBER, LAG, LEAD) — not available in Power Query
Processing billions of rows — Power Query is not distributed like Spark
Custom Python libraries — Dataflow Gen2 does not support Python
Complex conditional logic — deeply nested if/then/else is cleaner in PySpark
Writing to multiple destinations from one transformation — each Dataflow query writes to one destination

Common Mistakes

Not expanding merged columns — after Merge, the joined data is a collapsed table column. You MUST click the expand icon to flatten it.
Profiling only first 1,000 rows — the default profiling sample misses issues in later rows. Change to “entire data set.”
Not using Unpivot Other Columns — using regular Unpivot breaks when new columns are added. Unpivot Other Columns is future-proof.
Applying transformations in the wrong order — filter BEFORE type conversion (reduces data volume first). Remove duplicates AFTER cleaning (so cleaned values match correctly).
Not handling errors before destination — error rows fail silently or block the destination load. Always Replace Errors or Remove Errors before the destination step.

Interview Questions

Q: What is the difference between Merge and Append in Dataflow Gen2? A: Merge joins two queries horizontally based on a matching key (like SQL JOIN). Append stacks two queries vertically (like SQL UNION ALL). Merge adds columns. Append adds rows.

Q: How do you handle joins in Dataflow Gen2? A: Using the Merge Queries feature. Select the matching columns in both tables, choose the join type (Left Outer, Inner, Full Outer, Left Anti, Right Anti), then expand the resulting table column to flatten the joined data.

Q: What is Pivot vs Unpivot? A: Pivot transforms row values into column headers (long to wide). Unpivot transforms columns back into rows (wide to long). Use Unpivot Other Columns for future-proof unpivoting that survives new column additions.

Wrapping Up

Dataflow Gen2 is far more than a rename-and-filter tool. Merge gives you JOINs. Append gives you UNION. Group By gives you aggregation. Conditional Columns give you CASE WHEN. Custom Columns give you calculated fields. And error handling gives you data quality control — all without writing code.

In the next post, we cover using Dataflow Gen2 in production: pipeline integration, parameterization, incremental refresh, performance optimization, and the complete decision guide for when to use Dataflow Gen2 vs Notebooks.

← Previous: Dataflow Gen2: Introduction

Fabric (14/34)

Next: Dataflow Gen2: Production Patterns →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

← Dataflow Gen2: Introduction Dataflow Gen2: Production Patterns →

Dataflow Gen2 Advanced Transformations: Merge Queries, Append, Pivot, Group By, Custom Columns, and Error Handling

Table of Contents

Merge Queries — Joining Two Tables

The Six Join Types in Power Query

Merge Step by Step

Multi-Column Merge

Fuzzy Matching

Append Queries — Stacking Tables

Append Two Queries

Append Three or More Queries

Group By — Aggregation

Step by Step

Basic Group By

Advanced Group By (Multiple Aggregations)

Pivot Column — Rows to Columns

Step by Step

Unpivot Columns — Columns to Rows

Step by Step

Unpivot Options

Conditional Column — CASE WHEN Without Code

Step by Step

Multiple Conditions (AND/OR)

Custom Column — Calculated Fields with M Formulas

Step by Step

Common M Formulas for Custom Columns

Replace Errors — Handling Bad Data

Viewing Errors

Replacing Errors

Removing Error Rows

Preventing Errors with try…otherwise

Data Profiling — Understanding Your Data Before Transforming

Enable Data Profiling

Column Quality, Distribution, and Profile

Column Quality

Column Distribution

Column Profile

Combining Multiple Transformations — A Complete Example

When Dataflow Gen2 Reaches Its Limits

Common Mistakes

Interview Questions

Wrapping Up

Related Posts

Leave a Comment Cancel Reply