Dataflow Gen2 Advanced Transformations: Merge Queries, Append, Pivot, Group By, Custom Columns, and Error Handling
In the previous post, we covered the basics — connecting to sources, applying simple transformations (filter, rename, trim), and writing to destinations. Now we go deeper.
This post covers the transformations that turn Dataflow Gen2 from a “rename and filter” tool into a real ETL engine: Merge (joining two tables), Append (stacking tables), Pivot/Unpivot (reshaping), Group By (aggregation), Conditional Columns (CASE WHEN equivalent), Custom Columns (calculated fields), and Error Handling (dealing with bad data).
Think of Part 1 as learning to drive — steering, braking, parking. This post is advanced driving — highway merging (Merge queries), lane switching (Append), U-turns (Pivot), and handling bad weather (Error Handling). Same car, same road, more complex maneuvers.
Table of Contents
- Merge Queries — Joining Two Tables
- The Six Join Types in Power Query
- Merge Step by Step
- Fuzzy Matching
- Append Queries — Stacking Tables
- Append Two Queries
- Append Three or More Queries
- Group By — Aggregation
- Pivot Column — Rows to Columns
- Unpivot Columns — Columns to Rows
- Conditional Column — CASE WHEN Without Code
- Custom Column — Calculated Fields with M Formulas
- Common M Formulas for Custom Columns
- Replace Errors — Handling Bad Data
- Data Profiling — Understanding Your Data Before Transforming
- Column Quality, Distribution, and Profile
- Combining Multiple Transformations — A Complete Example
- When Dataflow Gen2 Reaches Its Limits
- Common Mistakes
- Interview Questions
- Wrapping Up
Merge Queries — Joining Two Tables
Merge is Power Query’s equivalent of a SQL JOIN. You combine two tables based on a matching key.
BEFORE MERGE:
Customers table: Orders table:
| cust_id | name | | order_id | cust_id | amount |
|---------|---------| |----------|---------|--------|
| 1 | Naveen | | 101 | 1 | 999 |
| 2 | Shrey | | 102 | 2 | 29 |
| 3 | Vrushab | | 103 | 1 | 79 |
AFTER MERGE (Left Outer Join on cust_id):
| cust_id | name | order_id | amount |
|---------|---------|----------|--------|
| 1 | Naveen | 101 | 999 |
| 1 | Naveen | 103 | 79 |
| 2 | Shrey | 102 | 29 |
| 3 | Vrushab | null | null |
The Six Join Types in Power Query
| Power Query Name | SQL Equivalent | What It Returns |
|---|---|---|
| Left Outer | LEFT JOIN | All left rows + matching right rows |
| Right Outer | RIGHT JOIN | All right rows + matching left rows |
| Full Outer | FULL OUTER JOIN | All rows from both tables |
| Inner | INNER JOIN | Only rows that match in both tables |
| Left Anti | LEFT JOIN WHERE right IS NULL | Left rows with NO match in right |
| Right Anti | RIGHT JOIN WHERE left IS NULL | Right rows with NO match in left |
Merge Step by Step
- In your Dataflow Gen2, ensure BOTH queries (tables) are loaded in the Queries panel
- Select the primary query (e.g., Customers)
- Home tab → Merge queries → Merge queries as new (creates a new combined query)
- In the Merge dialog:
- Top table: Customers
- Bottom table: Orders
- Click the matching column in EACH table (cust_id in both)
- Join Kind: Left Outer (or whichever you need)
- Click OK
After merge, the Orders data appears as a collapsed table column (a single column containing a nested table). You need to EXPAND it:
- Click the expand icon (↔) on the Orders column header
- Select which columns to expand (order_id, amount)
- Uncheck “Use original column name as prefix” (unless you want “Orders.amount”)
- Click OK
Now you have a flat, joined table.
Real-life analogy: Merge is like combining a class roster (names) with a grade sheet (scores). You match by student ID. After merging, each student row has their name AND their grades. The expand step is like “unfolding” the grade sheet columns next to the name.
Multi-Column Merge
To join on multiple columns (composite key):
- In the Merge dialog, Ctrl+click BOTH columns in the top table
- Ctrl+click the matching columns in the bottom table (in the same order)
- Power Query matches on the combination
Fuzzy Matching
For imperfect data (typos, different formats), Power Query offers fuzzy matching:
- In the Merge dialog, check Use fuzzy matching
- Set the Similarity threshold (0.0 to 1.0) — 0.8 means 80% similar
- Power Query matches “Naveen” with “naveen” or “Naveen V.” with “Naveen Vuppula”
Exact match: "Naveen Vuppula" ↔ "Naveen Vuppula" ✅
Fuzzy match: "Naveen Vuppula" ↔ "Naveen V." ✅ (at 0.7 threshold)
Fuzzy match: "Naveen Vuppula" ↔ "John Smith" ❌ (too different)
Use case: Matching customer names from two systems where spelling varies.
Append Queries — Stacking Tables
Append is Power Query’s equivalent of SQL UNION ALL. It stacks rows from multiple queries vertically.
BEFORE APPEND:
Jan_Sales: Feb_Sales:
| product | amount | | product | amount |
|---------|--------| |---------|--------|
| Laptop | 999 | | Mouse | 29 |
| Monitor | 399 | | Laptop | 999 |
AFTER APPEND:
| product | amount |
|---------|--------|
| Laptop | 999 | ← from Jan_Sales
| Monitor | 399 | ← from Jan_Sales
| Mouse | 29 | ← from Feb_Sales
| Laptop | 999 | ← from Feb_Sales
Append Two Queries
- Select the first query (Jan_Sales)
- Home tab → Append queries → Append queries as new
- Select the second query (Feb_Sales)
- Click OK
Append Three or More Queries
- Append queries as new
- Select Three or more tables
- Add all queries from the Available list to the Tables to append list
- Click OK
Rules: All queries must have the same column names and compatible types. If column names differ, rename them first. If one query has extra columns, they show as NULL in the other.
Group By — Aggregation
Group By is the Power Query equivalent of SQL GROUP BY with aggregate functions.
Step by Step
- Select your query
- Transform tab → Group By
- Choose Basic or Advanced
Basic Group By
Group by: department
New column name: employee_count
Operation: Count Rows
Result:
| department | employee_count |
|---|---|
| Engineering | 3 |
| Analytics | 3 |
| Sales | 4 |
Advanced Group By (Multiple Aggregations)
Click Advanced to add multiple aggregations:
Group by: department
New column: employee_count Operation: Count Rows
New column: total_salary Operation: Sum Column: salary
New column: avg_salary Operation: Average Column: salary
New column: max_salary Operation: Max Column: salary
New column: min_salary Operation: Min Column: salary
| department | employee_count | total_salary | avg_salary | max_salary | min_salary |
|---|---|---|---|---|---|
| Engineering | 3 | 292000 | 97333 | 105000 | 92000 |
| Analytics | 3 | 261000 | 87000 | 91000 | 85000 |
| Sales | 4 | 313000 | 78250 | 82000 | 75000 |
Available aggregation operations: Count Rows, Sum, Average, Min, Max, Count Distinct, All Rows (returns a nested table).
Pivot Column — Rows to Columns
Pivot transforms row values into column headers — identical to SQL PIVOT:
BEFORE PIVOT:
| region | month | revenue |
|--------|---------|---------|
| East | Jan | 150000 |
| East | Feb | 165000 |
| West | Jan | 120000 |
| West | Feb | 135000 |
AFTER PIVOT (pivot month column, aggregate revenue):
| region | Jan | Feb |
|--------|--------|--------|
| East | 150000 | 165000 |
| West | 120000 | 135000 |
Step by Step
- Select the column to pivot (month)
- Transform tab → Pivot Column
- Values Column: revenue (which values fill the cells)
- Advanced → Aggregate: Sum (or Count, Min, Max, Don’t Aggregate)
- Click OK
Unpivot Columns — Columns to Rows
Unpivot is the reverse of Pivot — turns columns back into rows:
BEFORE UNPIVOT:
| region | Q1 | Q2 | Q3 |
|--------|--------|--------|--------|
| East | 495000 | 520000 | 480000 |
| West | 395000 | 410000 | 390000 |
AFTER UNPIVOT (unpivot Q1, Q2, Q3):
| region | Attribute | Value |
|--------|-----------|--------|
| East | Q1 | 495000 |
| East | Q2 | 520000 |
| East | Q3 | 480000 |
| West | Q1 | 395000 |
| West | Q2 | 410000 |
| West | Q3 | 390000 |
Step by Step
- Select the columns to unpivot (Ctrl+click Q1, Q2, Q3)
- Transform tab → Unpivot Columns
- Two new columns appear: Attribute (column names) and Value (cell values)
- Rename them: “Quarter” and “Revenue”
Unpivot Options
| Option | What It Does |
|---|---|
| Unpivot Columns | Unpivot selected columns |
| Unpivot Other Columns | Keep selected columns fixed, unpivot everything else (more flexible — survives new columns) |
| Unpivot Only Selected | Unpivot only the selected columns |
Best practice: Use Unpivot Other Columns — if new quarter columns are added later (Q4), they are automatically unpivoted.
Conditional Column — CASE WHEN Without Code
A Conditional Column is the Power Query equivalent of SQL’s CASE WHEN:
Step by Step
- Add Column tab → Conditional Column
- Name:
salary_band - Build the conditions:
If [salary] >= 100000 Then "Executive"
Else If [salary] >= 85000 Then "Senior"
Else If [salary] >= 70000 Then "Mid-Level"
Else "Junior"
Result: a new salary_band column with values based on the salary ranges.
Multiple Conditions (AND/OR)
If [department] = "Engineering" AND [salary] >= 100000 Then "Lead Engineer"
Else If [department] = "Engineering" Then "Engineer"
Else If [department] = "Sales" AND [salary] >= 80000 Then "Senior Sales"
Else "Staff"
Custom Column — Calculated Fields with M Formulas
For calculations more complex than conditional logic, use Custom Columns with M formulas:
Step by Step
- Add Column tab → Custom Column
- Name:
annual_salary - Formula:
[salary] * 12 - Click OK
Common M Formulas for Custom Columns
// Arithmetic
[salary] * 12 → Annual salary
[amount] * (1 - [discount_pct] / 100) → Discounted amount
[revenue] - [cost] → Profit
// String operations
Text.Upper([name]) → "NAVEEN"
Text.Lower([email]) → "naveen@email.com"
Text.Trim([name]) → Remove spaces
Text.Start([phone], 3) → First 3 characters "416"
Text.End([phone], 4) → Last 4 characters "1234"
Text.Combine({[first_name], " ", [last_name]}) → "Naveen Vuppula"
Text.Contains([email], "@") → true/false
Text.Replace([phone], "-", "") → Remove dashes
// Date operations
Date.Year([hire_date]) → 2022
Date.Month([hire_date]) → 1
Date.Day([hire_date]) → 15
Date.DayOfWeek([hire_date]) → 6 (Saturday)
Duration.Days(DateTime.LocalNow() - [hire_date]) → Days since hire
Date.AddMonths([hire_date], 6) → 6 months after hire
Date.StartOfMonth([hire_date]) → First day of month
Date.EndOfMonth([hire_date]) → Last day of month
// Null handling
if [email] = null then "No Email" else [email] → COALESCE equivalent
[salary] ?? 0 → Null coalesce shorthand (if null, use 0)
// Conditional (inline if)
if [salary] >= 100000 then "High" else "Normal" → Simple condition
if [country] = "Canada" then [salary] * 0.25
else if [country] = "India" then [salary] * 0.30
else [salary] * 0.20 → Tax calculation by country
// Type conversion
Number.FromText([salary_text]) → Text to number
Text.From([emp_id]) → Number to text
Date.FromText([date_string]) → Text to date
Replace Errors — Handling Bad Data
When a transformation fails for specific rows (type conversion errors, division by zero), Power Query marks them as errors instead of crashing the entire dataflow.
Viewing Errors
- After a transformation, look for red cells in the preview — these are errors
- Click the red cell to see the error message
- Or: click the error count at the bottom of a column header
Replacing Errors
Steps:
1. Select the column with errors
2. Transform tab → Replace Errors
3. Enter the replacement value (e.g., 0, null, "Unknown")
4. Click OK
All error values in that column are replaced with your specified value.
Removing Error Rows
Steps:
1. Select the column with errors
2. Home tab → Remove Rows → Remove Errors
3. All rows with errors in that column are removed
Preventing Errors with try…otherwise
In custom columns, use try...otherwise to handle errors gracefully:
// Safe division (prevents divide by zero)
try [revenue] / [cost] otherwise null
// Safe type conversion
try Number.FromText([salary_text]) otherwise 0
// Safe date parsing
try Date.FromText([date_string]) otherwise null
Real-life analogy: Error handling is like quality control on an assembly line. Some products (rows) have defects (errors). You can fix them (Replace Errors), remove them (Remove Errors), or wrap the risky step in bubble wrap (try…otherwise) so it does not break the line if one product is defective.
Data Profiling — Understanding Your Data Before Transforming
Power Query has built-in data profiling tools that help you understand your data before you start transforming:
Enable Data Profiling
- View tab → check Column quality
- View tab → check Column distribution
- View tab → check Column profile
Column Quality
Shows the percentage of Valid, Error, and Empty values per column:
Column: email
Valid: 90% ████████████████████
Error: 0%
Empty: 10% ██
Column Distribution
Shows unique values and distinct values per column:
Column: department
Distinct: 3 (Engineering, Analytics, Sales)
Unique: 0 (no value appears only once)
Column Profile
Select a column to see detailed statistics at the bottom: min, max, average, count, distinct count, null count, value distribution histogram.
Best practice: ALWAYS enable data profiling before building transformations. It reveals NULLs, duplicates, outliers, and data type issues before they cause problems downstream.
Important: By default, profiling analyzes only the first 1,000 rows. Click the status bar at the bottom and change to “Column profiling based on entire data set” for accurate results.
Combining Multiple Transformations — A Complete Example
Here is a real Bronze-to-Silver cleaning dataflow:
Source: Lakehouse bronze_lakehouse / customers table
Step 1: Choose Columns
Keep: customer_id, first_name, last_name, email, phone, city, country, salary, hire_date
Remove: middle_name, suffix, internal_notes, temp_flag
Step 2: Remove Rows
Filter: Remove rows where customer_id is null
Step 3: Trim + Capitalize
first_name: Trim → Capitalize Each Word
last_name: Trim → Capitalize Each Word
email: Trim → Lowercase
Step 4: Replace Values
country: Replace "United States" with "USA"
country: Replace "United Kingdom" with "UK"
Step 5: Conditional Column (salary_band)
If salary >= 100000 Then "Executive"
Else If salary >= 80000 Then "Senior"
Else "Standard"
Step 6: Custom Column (full_name)
Formula: Text.Combine({[first_name], " ", [last_name]})
Step 7: Change Data Types
salary: Decimal Number
hire_date: Date
customer_id: Whole Number
Step 8: Remove Duplicates
On: customer_id (keep first occurrence)
Step 9: Replace Errors
salary: Replace errors with 0
hire_date: Replace errors with null
Destination: Lakehouse silver_lakehouse / customers_clean (Replace mode)
9 steps, zero code, production-ready cleaning.
When Dataflow Gen2 Reaches Its Limits
Use a Spark notebook instead when you need:
- Delta MERGE / SCD Type 2 — Dataflow Gen2 cannot do upserts
- Window functions (ROW_NUMBER, LAG, LEAD) — not available in Power Query
- Processing billions of rows — Power Query is not distributed like Spark
- Custom Python libraries — Dataflow Gen2 does not support Python
- Complex conditional logic — deeply nested if/then/else is cleaner in PySpark
- Writing to multiple destinations from one transformation — each Dataflow query writes to one destination
Common Mistakes
-
Not expanding merged columns — after Merge, the joined data is a collapsed table column. You MUST click the expand icon to flatten it.
-
Profiling only first 1,000 rows — the default profiling sample misses issues in later rows. Change to “entire data set.”
-
Not using Unpivot Other Columns — using regular Unpivot breaks when new columns are added. Unpivot Other Columns is future-proof.
-
Applying transformations in the wrong order — filter BEFORE type conversion (reduces data volume first). Remove duplicates AFTER cleaning (so cleaned values match correctly).
-
Not handling errors before destination — error rows fail silently or block the destination load. Always Replace Errors or Remove Errors before the destination step.
Interview Questions
Q: What is the difference between Merge and Append in Dataflow Gen2? A: Merge joins two queries horizontally based on a matching key (like SQL JOIN). Append stacks two queries vertically (like SQL UNION ALL). Merge adds columns. Append adds rows.
Q: How do you handle joins in Dataflow Gen2? A: Using the Merge Queries feature. Select the matching columns in both tables, choose the join type (Left Outer, Inner, Full Outer, Left Anti, Right Anti), then expand the resulting table column to flatten the joined data.
Q: What is Pivot vs Unpivot? A: Pivot transforms row values into column headers (long to wide). Unpivot transforms columns back into rows (wide to long). Use Unpivot Other Columns for future-proof unpivoting that survives new column additions.
Wrapping Up
Dataflow Gen2 is far more than a rename-and-filter tool. Merge gives you JOINs. Append gives you UNION. Group By gives you aggregation. Conditional Columns give you CASE WHEN. Custom Columns give you calculated fields. And error handling gives you data quality control — all without writing code.
In the next post, we cover using Dataflow Gen2 in production: pipeline integration, parameterization, incremental refresh, performance optimization, and the complete decision guide for when to use Dataflow Gen2 vs Notebooks.
Related posts: – Dataflow Gen2 Introduction – Fabric Data Factory: Pipelines – SQL Joins – GROUP BY, HAVING, CASE WHEN
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.