Slowly Changing Dimensions (SCD) in Azure: Types 0, 1, 2, 3, and 6 with Detailed Examples
Imagine you have a customer named Alice who lives in Toronto. She has been your customer since 2022. In January 2026, she moves to Mumbai. Simple question: what city does Alice live in?
If you say “Mumbai,” you are correct TODAY. But what about the orders she placed in 2023 while living in Toronto? Should those orders show Toronto or Mumbai?
THIS is the Slowly Changing Dimension (SCD) problem. And depending on how you handle it, you get completely different answers to business questions. Get it wrong, and your sales reports, customer analytics, and compliance data are all incorrect.
This post covers every SCD type with detailed table examples, SQL implementation in Azure, and the real-world scenarios where each type is the right choice.
Table of Contents
- What Is a Slowly Changing Dimension?
- The Star Schema Context
- Our Running Example: Alice Moves to Mumbai
- SCD Type 0: Fixed (Never Changes)
- SCD Type 1: Overwrite (Lose History)
- SCD Type 2: Add New Row (Full History)
- SCD Type 3: Add New Column (Limited History)
- SCD Type 6: Hybrid (1 + 2 + 3 Combined)
- Side-by-Side Comparison
- Implementing SCD Type 2 in Azure SQL Database
- Implementing SCD Type 2 in Synapse Data Flows
- Implementing SCD Type 2 with Delta Lake MERGE
- SCD in the Medallion Architecture
- Handling Multiple Changes
- Edge Cases and Gotchas
- Real-World Scenarios
- Interview Questions
- Wrapping Up
What Is a Slowly Changing Dimension?
A dimension in a data warehouse is a table that describes the WHO, WHAT, WHERE of your business: customers, products, stores, employees. These dimensions change slowly over time — a customer moves cities, a product changes categories, an employee gets promoted.
The question is: how do you handle these changes in your data warehouse?
The answer depends on your SCD type.
Real-life analogy: Think of your phone contacts. When a friend changes their phone number, do you: – Overwrite the old number (SCD Type 1) — you can call them now but lost their old number – Add a new contact entry with the new number and mark the old one as “previous” (SCD Type 2) — full history – Add a “previous phone” field to the same contact (SCD Type 3) — one level of history
Each approach serves different needs.
The Star Schema Context
SCDs live in dimensional models (star schemas):
dim_customer (SCD applies here)
|
dim_date --- fact_orders --- dim_product
|
dim_store
Fact table (fact_orders): records events — order_id, amount, date_key, customer_key. Millions of rows. Append-only.
Dimension table (dim_customer): describes entities — customer_key, name, city, segment. Thousands of rows. Changes over time.
The customer_key in the fact table points to a row in the dimension table. SCD determines what happens to that row when the customer’s data changes.
Our Running Example: Alice Moves to Mumbai
Starting state (dimension table):
| customer_key | customer_id | name | city | segment |
|---|---|---|---|---|
| 1001 | C100 | Alice | Toronto | Premium |
| 1002 | C200 | Bob | London | Standard |
| 1003 | C300 | Carol | Mumbai | Premium |
The change: On January 15, 2026, Alice moves from Toronto to Mumbai.
Let us see how each SCD type handles this.
SCD Type 0: Fixed (Never Changes)
The Rule
Once a dimension record is created, it NEVER changes. The original value is preserved forever. Any updates from the source are ignored.
After Alice Moves
| customer_key | customer_id | name | city | segment |
|---|---|---|---|---|
| 1001 | C100 | Alice | Toronto | Premium |
| 1002 | C200 | Bob | London | Standard |
| 1003 | C300 | Carol | Mumbai | Premium |
City stays Toronto. The change is ignored completely.
When to Use Type 0
- Date of birth — never changes (even if the source system “corrects” it)
- Original signup date — the date the customer first registered
- Social Security Number — once assigned, never modified
- Regulatory records — where the original value must be preserved for legal reasons
Real-life analogy: Your birth certificate. Even if your name changes legally, the birth certificate still shows your original birth name. It is a permanent historical record.
SCD Type 1: Overwrite (Lose History)
The Rule
The old value is overwritten with the new value. No history is kept. The dimension always reflects the CURRENT state.
After Alice Moves
| customer_key | customer_id | name | city | segment |
|---|---|---|---|---|
| 1001 | C100 | Alice | Mumbai | Premium |
| 1002 | C200 | Bob | London | Standard |
| 1003 | C300 | Carol | Mumbai | Premium |
Toronto is gone. Replaced with Mumbai. If you query Alice’s orders from 2023, they now show Mumbai (even though she was in Toronto when she placed them).
SQL Implementation
UPDATE dim_customer
SET city = 'Mumbai',
last_updated = GETDATE()
WHERE customer_id = 'C100';
When to Use Type 1
- Corrections — fixing a typo (city was “Tronto” and corrected to “Toronto”)
- Non-critical attributes — phone number, email, where history does not matter
- Real-time dashboards — you want the current state only
- Simple reporting — “how many customers are in each city RIGHT NOW?”
The Problem with Type 1
Fact: Alice ordered $500 on 2023-06-15 (she was in Toronto)
Dim: Alice's city is now Mumbai
Report: "Revenue by City, 2023"
Mumbai: $500 (WRONG -- she was in Toronto when she ordered)
Toronto: $0 (WRONG -- this is where the order actually happened)
History is rewritten. Past reports change retroactively. This is why Type 1 is only appropriate when history does not matter.
Real-life analogy: Type 1 is like updating your GPS address. You change your home address and the GPS forgets you ever lived anywhere else. Good for navigation, bad for proving where you lived in 2023.
SCD Type 2: Add New Row (Full History)
The Rule
When a change occurs, the old record is “closed” (end-dated) and a new record is inserted with the new values. Both versions coexist in the table. Full history is preserved.
After Alice Moves
| customer_key | customer_id | name | city | effective_from | effective_to | is_current |
|---|---|---|---|---|---|---|
| 1001 | C100 | Alice | Toronto | 2022-03-15 | 2026-01-14 | 0 |
| 1004 | C100 | Alice | Mumbai | 2026-01-15 | 9999-12-31 | 1 |
| 1002 | C200 | Bob | London | 2023-01-01 | 9999-12-31 | 1 |
| 1003 | C300 | Carol | Mumbai | 2024-06-01 | 9999-12-31 | 1 |
Key observations:
– Alice now has TWO rows (customer_key 1001 and 1004)
– Same customer_id (C100) but different customer_key (surrogate key)
– Row 1001 has effective_to = 2026-01-14 and is_current = 0 (historical)
– Row 1004 has effective_to = 9999-12-31 and is_current = 1 (current)
– 9999-12-31 is a convention meaning “still active”
Why Surrogate Keys Matter
Notice customer_key is different from customer_id:
- customer_id = business key (C100). Stays the same across versions. Identifies the PERSON.
- customer_key = surrogate key (1001, 1004). Auto-generated. Identifies the VERSION.
Fact table references customer_key, not customer_id. This is how you link an order to the CORRECT version of the customer:
fact_orders:
order_id=501, customer_key=1001, amount=500, date=2023-06-15
(Links to Alice in Toronto -- CORRECT for 2023)
order_id=720, customer_key=1004, amount=300, date=2026-02-10
(Links to Alice in Mumbai -- CORRECT for 2026)
Querying SCD Type 2
Current state (like Type 1):
SELECT * FROM dim_customer
WHERE is_current = 1;
Historical state (point-in-time):
-- Where was Alice on 2023-06-15?
SELECT * FROM dim_customer
WHERE customer_id = 'C100'
AND '2023-06-15' >= effective_from
AND '2023-06-15' < effective_to;
-- Result: Toronto (customer_key=1001)
All versions of a customer:
SELECT * FROM dim_customer
WHERE customer_id = 'C100'
ORDER BY effective_from;
-- Result: Toronto (2022-2026), Mumbai (2026-present)
When to Use Type 2
- Regulatory compliance — must track where customers lived when they transacted (GDPR, financial audits)
- Accurate historical reporting — revenue by city should reflect the city AT THE TIME of the order
- Customer journey analysis — track how customer segments change over time
- Any scenario where “what was the state at that point in time?” matters
Real-life analogy: Type 2 is like your passport history. When you renew your passport, the old one is cancelled (end-dated) and a new one is issued. Immigration can see where you traveled with EACH passport. Complete history preserved.
SCD Type 3: Add New Column (Limited History)
The Rule
Add a column for the previous value. You keep ONE level of history but not the full timeline.
After Alice Moves
| customer_key | customer_id | name | current_city | previous_city | city_changed_date |
|---|---|---|---|---|---|
| 1001 | C100 | Alice | Mumbai | Toronto | 2026-01-15 |
| 1002 | C200 | Bob | London | NULL | NULL |
| 1003 | C300 | Carol | Mumbai | NULL | NULL |
SQL Implementation
UPDATE dim_customer
SET previous_city = current_city, -- Save old value
current_city = 'Mumbai', -- Update to new value
city_changed_date = '2026-01-15'
WHERE customer_id = 'C100';
The Limitation
What if Alice moves AGAIN to Paris in 2027?
| customer_key | customer_id | current_city | previous_city |
|---|---|---|---|
| 1001 | C100 | Paris | Mumbai |
Toronto is gone. Type 3 only remembers one previous value. You lose the Toronto history.
When to Use Type 3
- You need to compare current vs previous values (e.g., “did the customer’s segment change?”)
- You only care about one level of change (current and immediately prior)
- The attribute rarely changes (maybe once in the customer’s lifetime)
Real-life analogy: Type 3 is like your phone’s “Recent Calls” showing only the last missed call. You know who called most recently, but not who called before that.
SCD Type 6: Hybrid (1 + 2 + 3 Combined)
The Rule
Type 6 combines Types 1, 2, AND 3 in one table. Each row has: – The current value (Type 1 — always up to date) – The historical value for that version (Type 2 — full history) – The previous value (Type 3 — one-level comparison)
It is called Type 6 because 1 + 2 + 3 = 6.
After Alice Moves
| customer_key | customer_id | name | historical_city | current_city | previous_city | effective_from | effective_to | is_current |
|---|---|---|---|---|---|---|---|---|
| 1001 | C100 | Alice | Toronto | Mumbai | NULL | 2022-03-15 | 2026-01-14 | 0 |
| 1004 | C100 | Alice | Mumbai | Mumbai | Toronto | 2026-01-15 | 9999-12-31 | 1 |
Notice:
– historical_city = the city for THAT specific version (does not change)
– current_city = Alice’s current city (updated across ALL rows — Type 1)
– previous_city = her previous city (Type 3)
– Multiple rows with dates = full history (Type 2)
When to Use Type 6
- Enterprise data warehouses that need maximum flexibility
- When different reports need different perspectives:
- “Revenue by city at time of order” → use
historical_city - “Revenue by CURRENT city” → use
current_city - “Customers who recently changed cities” → compare
current_cityvsprevious_city
Real-life analogy: Type 6 is like a comprehensive employee record that shows: your current title (Type 1), all past titles with dates (Type 2), and your immediately previous title (Type 3). HR can answer any question about your career history.
Side-by-Side Comparison
| Feature | Type 0 | Type 1 | Type 2 | Type 3 | Type 6 |
|---|---|---|---|---|---|
| History | None (original preserved) | None (overwritten) | Full | One level | Full + current + previous |
| Row count | Fixed | Fixed | Grows | Fixed | Grows |
| Complexity | Simplest | Simple | Medium | Simple | Complex |
| Storage | Least | Least | More | Slightly more | Most |
| Query performance | Fast | Fast | Slower (more rows) | Fast | Slower |
| Point-in-time query | No | No | Yes | Limited | Yes |
| Current state query | N/A | Yes | Yes (WHERE is_current=1) | Yes | Yes |
| Use case | Static attributes | Non-critical updates | Regulatory, analytics | Before/after comparison | Enterprise DW |
Implementing SCD Type 2 in Azure SQL Database
This is the most common SCD type. Here is a complete implementation:
Create the Dimension Table
CREATE TABLE dim_customer (
customer_key INT IDENTITY(1,1) PRIMARY KEY,
customer_id VARCHAR(20) NOT NULL, -- Business key
name VARCHAR(100),
city VARCHAR(50),
segment VARCHAR(50),
effective_from DATE NOT NULL,
effective_to DATE NOT NULL DEFAULT '9999-12-31',
is_current BIT NOT NULL DEFAULT 1,
created_date DATETIME2 DEFAULT GETDATE()
);
CREATE INDEX IX_dim_customer_bk ON dim_customer (customer_id, is_current);
CREATE INDEX IX_dim_customer_dates ON dim_customer (customer_id, effective_from, effective_to);
The MERGE Statement (Upsert)
-- Staging table has the latest source data
-- dim_customer is the target SCD Type 2 table
-- Step 1: Close existing current records where source data changed
UPDATE d
SET d.effective_to = DATEADD(DAY, -1, CAST(GETDATE() AS DATE)),
d.is_current = 0
FROM dim_customer d
INNER JOIN staging_customer s ON d.customer_id = s.customer_id
WHERE d.is_current = 1
AND (d.city != s.city OR d.segment != s.segment);
-- Step 2: Insert new version for changed records
INSERT INTO dim_customer (customer_id, name, city, segment, effective_from, effective_to, is_current)
SELECT s.customer_id, s.name, s.city, s.segment,
CAST(GETDATE() AS DATE), '9999-12-31', 1
FROM staging_customer s
INNER JOIN dim_customer d ON s.customer_id = d.customer_id
WHERE d.is_current = 0
AND d.effective_to = DATEADD(DAY, -1, CAST(GETDATE() AS DATE));
-- Step 3: Insert brand new customers
INSERT INTO dim_customer (customer_id, name, city, segment, effective_from, effective_to, is_current)
SELECT s.customer_id, s.name, s.city, s.segment,
CAST(GETDATE() AS DATE), '9999-12-31', 1
FROM staging_customer s
WHERE NOT EXISTS (
SELECT 1 FROM dim_customer d WHERE d.customer_id = s.customer_id
);
Why Three Steps?
- Close old records — the existing current row for changed customers gets an end date
- Insert new versions — a new row with the updated values and
is_current=1 - Insert new customers — first-time customers get their first row
Real-life analogy: It is like a hotel ledger. When a guest checks out of one room (close old record) and checks into another room (insert new record), both entries stay in the ledger. New guests who have never stayed before get their first entry.
Implementing SCD Type 2 in Synapse Data Flows
Visual Approach
Source: staging_customer
Source: dim_customer (WHERE is_current = 1)
|
JOIN: staging LEFT OUTER JOIN dim ON customer_id
|
Conditional Split:
New Customer: isNull(dim@customer_key) --> Insert new row
Changed: staging@city != dim@city OR staging@segment != dim@segment
--> Two paths:
1. Update old row (set is_current=0, effective_to=today)
2. Insert new row (is_current=1, effective_from=today)
Unchanged: Default --> discard (no action needed)
The Alter Row Transformation
Use Alter Row to mark rows for INSERT or UPDATE:
Insert if: isNull(dim@customer_key) -- New customer
Upsert if: staging@city != dim@city -- Changed customer (close + insert)
Implementing SCD Type 2 with Delta Lake MERGE
In Databricks or Synapse Spark, Delta Lake makes SCD Type 2 elegant:
from delta.tables import DeltaTable
dim_customer = DeltaTable.forPath(spark, "/delta/dim_customer")
# MERGE: handle new, changed, and unchanged in one statement
(dim_customer.alias("target")
.merge(
staging_df.alias("source"),
"target.customer_id = source.customer_id AND target.is_current = true"
)
# When matched AND values changed: close the old record
.whenMatchedUpdate(
condition="source.city != target.city OR source.segment != target.segment",
set={
"is_current": "false",
"effective_to": "current_date()"
}
)
# When not matched: insert new customer
.whenNotMatchedInsert(values={
"customer_id": "source.customer_id",
"name": "source.name",
"city": "source.city",
"segment": "source.segment",
"effective_from": "current_date()",
"effective_to": "lit('9999-12-31')",
"is_current": "true"
})
.execute()
)
# Second pass: insert new version rows for changed records
changed_records = staging_df.join(
dim_customer.toDF().filter("is_current = false AND effective_to = current_date()"),
"customer_id"
)
# Insert new current rows for changed customers
SCD in the Medallion Architecture
Bronze: Raw data from source (staging_customer)
|
v
Silver: Apply SCD Type 2 logic
|-- dim_customer with full history
|-- effective_from, effective_to, is_current
|
v
Gold: Business views
|-- v_current_customers (WHERE is_current = 1)
|-- v_customer_history (all versions, for audit)
Handling Multiple Changes
What if Alice moves to Mumbai in January AND her segment changes to “Enterprise” in March?
Two Separate Changes = Two New Rows
| customer_key | customer_id | city | segment | effective_from | effective_to | is_current |
|---|---|---|---|---|---|---|
| 1001 | C100 | Toronto | Premium | 2022-03-15 | 2026-01-14 | 0 |
| 1004 | C100 | Mumbai | Premium | 2026-01-15 | 2026-03-09 | 0 |
| 1005 | C100 | Mumbai | Enterprise | 2026-03-10 | 9999-12-31 | 1 |
Three versions of Alice. The fact table links to the correct version based on when each order was placed.
Edge Cases and Gotchas
1. Same-Day Changes
Alice moves to Mumbai AND changes segment on the same day. Do you create one row or two?
Best practice: One row with both changes. If they happen on the same effective date, combine them into a single version.
2. Backdated Changes
Source system sends a correction: “Alice actually moved on December 15, 2025, not January 15, 2026.”
Challenge: Orders between Dec 15 and Jan 15 were linked to the wrong dimension version.
Fix: Reprocess the affected date range. This is complex and is why most teams process SCD changes nightly (not retroactively).
3. Deleted Customers
Source system deletes a customer. Do you delete from the dimension?
Best practice: NEVER delete from dimensions. Set is_current = 0 and effective_to = today. This preserves history for past fact records.
4. Reactivated Customers
Alice was marked inactive in 2025, then reactivated in 2026.
Solution: Insert a new row with is_current = 1. The inactive period is captured in the previous version’s date range.
Real-World Scenarios
Scenario 1: Financial Services (Regulatory)
Requirement: Auditors need to know which branch a customer belonged to when each transaction occurred.
SCD Type: Type 2 (mandatory for financial compliance).
Why: If Alice was in the Toronto branch when she made a $10,000 deposit, the audit must show Toronto — even if she later transferred to Mumbai.
Scenario 2: E-Commerce Product Catalog
Requirement: Product prices change weekly. Reports show revenue by product.
SCD Type: Type 2 for price, Type 1 for product name corrections.
Why: Revenue should reflect the price AT THE TIME OF SALE. But if a product name had a typo, fix it everywhere (Type 1).
Scenario 3: HR Employee Tracking
Requirement: Track employee promotions and department transfers.
SCD Type: Type 2 for department and title. Type 0 for hire date.
Why: Headcount by department for Q1 2025 should use the department employees were in DURING Q1, not their current department.
Scenario 4: Marketing Customer Segmentation
Requirement: Marketing wants to see if customers who changed segments had higher spend.
SCD Type: Type 6 (Type 2 + Type 3 combined).
Why: Type 2 gives full history. The previous_segment column (Type 3) makes it easy to query “customers who upgraded from Standard to Premium.”
Interview Questions
Q: What is a Slowly Changing Dimension? A: A dimension table where attribute values change over time — customer address changes, product price changes, employee gets promoted. SCD refers to the strategy for handling these changes in the data warehouse.
Q: What is the difference between SCD Type 1 and Type 2? A: Type 1 overwrites the old value with the new one, losing all history. Type 2 keeps the old row (end-dated) and inserts a new row with the updated value, preserving full history. Type 1 is simpler but cannot answer “what was the state at a past point in time?”
Q: Why do SCD Type 2 tables use surrogate keys? A: Because one business entity (customer_id = C100) can have multiple rows representing different time periods. The surrogate key (customer_key) uniquely identifies each VERSION. Fact tables reference the surrogate key to link to the correct historical version.
Q: How do you query the current state in an SCD Type 2 table?
A: SELECT * FROM dim_customer WHERE is_current = 1. This filters to only the latest version of each record, giving you the same result as a Type 1 table.
Q: How do you implement SCD Type 2 in Azure? A: Three approaches: (1) SQL MERGE/UPDATE+INSERT in Azure SQL Database or Synapse Dedicated SQL Pool, (2) Data Flow with Conditional Split, Alter Row, and two sinks (update old + insert new), (3) Delta Lake MERGE in Databricks or Synapse Spark.
Q: When would you NOT use SCD Type 2? A: When history does not matter (use Type 1 for corrections). When storage and query performance are critical and the dimension has millions of rows (Type 2 grows the table). When the attribute changes very frequently (hourly changes would create enormous table growth).
Wrapping Up
SCD is one of those concepts that separates junior data engineers from senior ones. Everyone can build a pipeline that copies data. But handling dimension changes correctly — knowing when to overwrite, when to version, and when to preserve history — requires understanding the business requirements behind the data.
The most important question to ask stakeholders: “Do you need to know what the value WAS at a past point in time, or do you only care about what it is NOW?” The answer determines your SCD type.
Related posts: – Data Flows in ADF/Synapse – Schema-on-Write vs Schema-on-Read – Top 20 DE Interview Questions – SQL Window Functions – Audit Logging in Data Pipelines
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.