Slowly Changing Dimensions (SCD) in Azure: Types 0, 1, 2, 3, and 6 with Detailed Examples

Slowly Changing Dimensions (SCD) in Azure: Types 0, 1, 2, 3, and 6 with Detailed Examples

Imagine you have a customer named Alice who lives in Toronto. She has been your customer since 2022. In January 2026, she moves to Mumbai. Simple question: what city does Alice live in?

If you say “Mumbai,” you are correct TODAY. But what about the orders she placed in 2023 while living in Toronto? Should those orders show Toronto or Mumbai?

THIS is the Slowly Changing Dimension (SCD) problem. And depending on how you handle it, you get completely different answers to business questions. Get it wrong, and your sales reports, customer analytics, and compliance data are all incorrect.

This post covers every SCD type with detailed table examples, SQL implementation in Azure, and the real-world scenarios where each type is the right choice.

Table of Contents

  • What Is a Slowly Changing Dimension?
  • The Star Schema Context
  • Our Running Example: Alice Moves to Mumbai
  • SCD Type 0: Fixed (Never Changes)
  • SCD Type 1: Overwrite (Lose History)
  • SCD Type 2: Add New Row (Full History)
  • SCD Type 3: Add New Column (Limited History)
  • SCD Type 6: Hybrid (1 + 2 + 3 Combined)
  • Side-by-Side Comparison
  • Implementing SCD Type 2 in Azure SQL Database
  • Implementing SCD Type 2 in Synapse Data Flows
  • Implementing SCD Type 2 with Delta Lake MERGE
  • SCD in the Medallion Architecture
  • Handling Multiple Changes
  • Edge Cases and Gotchas
  • Real-World Scenarios
  • Interview Questions
  • Wrapping Up

What Is a Slowly Changing Dimension?

A dimension in a data warehouse is a table that describes the WHO, WHAT, WHERE of your business: customers, products, stores, employees. These dimensions change slowly over time — a customer moves cities, a product changes categories, an employee gets promoted.

The question is: how do you handle these changes in your data warehouse?

The answer depends on your SCD type.

Real-life analogy: Think of your phone contacts. When a friend changes their phone number, do you: – Overwrite the old number (SCD Type 1) — you can call them now but lost their old number – Add a new contact entry with the new number and mark the old one as “previous” (SCD Type 2) — full history – Add a “previous phone” field to the same contact (SCD Type 3) — one level of history

Each approach serves different needs.

The Star Schema Context

SCDs live in dimensional models (star schemas):

dim_customer (SCD applies here)
               |
dim_date --- fact_orders --- dim_product
               |
          dim_store

Fact table (fact_orders): records events — order_id, amount, date_key, customer_key. Millions of rows. Append-only.

Dimension table (dim_customer): describes entities — customer_key, name, city, segment. Thousands of rows. Changes over time.

The customer_key in the fact table points to a row in the dimension table. SCD determines what happens to that row when the customer’s data changes.

Our Running Example: Alice Moves to Mumbai

Starting state (dimension table):

customer_key customer_id name city segment
1001 C100 Alice Toronto Premium
1002 C200 Bob London Standard
1003 C300 Carol Mumbai Premium

The change: On January 15, 2026, Alice moves from Toronto to Mumbai.

Let us see how each SCD type handles this.

SCD Type 0: Fixed (Never Changes)

The Rule

Once a dimension record is created, it NEVER changes. The original value is preserved forever. Any updates from the source are ignored.

After Alice Moves

customer_key customer_id name city segment
1001 C100 Alice Toronto Premium
1002 C200 Bob London Standard
1003 C300 Carol Mumbai Premium

City stays Toronto. The change is ignored completely.

When to Use Type 0

  • Date of birth — never changes (even if the source system “corrects” it)
  • Original signup date — the date the customer first registered
  • Social Security Number — once assigned, never modified
  • Regulatory records — where the original value must be preserved for legal reasons

Real-life analogy: Your birth certificate. Even if your name changes legally, the birth certificate still shows your original birth name. It is a permanent historical record.

SCD Type 1: Overwrite (Lose History)

The Rule

The old value is overwritten with the new value. No history is kept. The dimension always reflects the CURRENT state.

After Alice Moves

customer_key customer_id name city segment
1001 C100 Alice Mumbai Premium
1002 C200 Bob London Standard
1003 C300 Carol Mumbai Premium

Toronto is gone. Replaced with Mumbai. If you query Alice’s orders from 2023, they now show Mumbai (even though she was in Toronto when she placed them).

SQL Implementation

UPDATE dim_customer
SET city = 'Mumbai',
    last_updated = GETDATE()
WHERE customer_id = 'C100';

When to Use Type 1

  • Corrections — fixing a typo (city was “Tronto” and corrected to “Toronto”)
  • Non-critical attributes — phone number, email, where history does not matter
  • Real-time dashboards — you want the current state only
  • Simple reporting — “how many customers are in each city RIGHT NOW?”

The Problem with Type 1

Fact: Alice ordered $500 on 2023-06-15 (she was in Toronto)
Dim:  Alice's city is now Mumbai

Report: "Revenue by City, 2023"
  Mumbai: $500 (WRONG -- she was in Toronto when she ordered)
  Toronto: $0 (WRONG -- this is where the order actually happened)

History is rewritten. Past reports change retroactively. This is why Type 1 is only appropriate when history does not matter.

Real-life analogy: Type 1 is like updating your GPS address. You change your home address and the GPS forgets you ever lived anywhere else. Good for navigation, bad for proving where you lived in 2023.

SCD Type 2: Add New Row (Full History)

The Rule

When a change occurs, the old record is “closed” (end-dated) and a new record is inserted with the new values. Both versions coexist in the table. Full history is preserved.

After Alice Moves

customer_key customer_id name city effective_from effective_to is_current
1001 C100 Alice Toronto 2022-03-15 2026-01-14 0
1004 C100 Alice Mumbai 2026-01-15 9999-12-31 1
1002 C200 Bob London 2023-01-01 9999-12-31 1
1003 C300 Carol Mumbai 2024-06-01 9999-12-31 1

Key observations: – Alice now has TWO rows (customer_key 1001 and 1004) – Same customer_id (C100) but different customer_key (surrogate key) – Row 1001 has effective_to = 2026-01-14 and is_current = 0 (historical) – Row 1004 has effective_to = 9999-12-31 and is_current = 1 (current) – 9999-12-31 is a convention meaning “still active”

Why Surrogate Keys Matter

Notice customer_key is different from customer_id:

  • customer_id = business key (C100). Stays the same across versions. Identifies the PERSON.
  • customer_key = surrogate key (1001, 1004). Auto-generated. Identifies the VERSION.

Fact table references customer_key, not customer_id. This is how you link an order to the CORRECT version of the customer:

fact_orders:
  order_id=501, customer_key=1001, amount=500, date=2023-06-15
  (Links to Alice in Toronto -- CORRECT for 2023)

  order_id=720, customer_key=1004, amount=300, date=2026-02-10
  (Links to Alice in Mumbai -- CORRECT for 2026)

Querying SCD Type 2

Current state (like Type 1):

SELECT * FROM dim_customer
WHERE is_current = 1;

Historical state (point-in-time):

-- Where was Alice on 2023-06-15?
SELECT * FROM dim_customer
WHERE customer_id = 'C100'
  AND '2023-06-15' >= effective_from
  AND '2023-06-15' < effective_to;

-- Result: Toronto (customer_key=1001)

All versions of a customer:

SELECT * FROM dim_customer
WHERE customer_id = 'C100'
ORDER BY effective_from;

-- Result: Toronto (2022-2026), Mumbai (2026-present)

When to Use Type 2

  • Regulatory compliance — must track where customers lived when they transacted (GDPR, financial audits)
  • Accurate historical reporting — revenue by city should reflect the city AT THE TIME of the order
  • Customer journey analysis — track how customer segments change over time
  • Any scenario where “what was the state at that point in time?” matters

Real-life analogy: Type 2 is like your passport history. When you renew your passport, the old one is cancelled (end-dated) and a new one is issued. Immigration can see where you traveled with EACH passport. Complete history preserved.

SCD Type 3: Add New Column (Limited History)

The Rule

Add a column for the previous value. You keep ONE level of history but not the full timeline.

After Alice Moves

customer_key customer_id name current_city previous_city city_changed_date
1001 C100 Alice Mumbai Toronto 2026-01-15
1002 C200 Bob London NULL NULL
1003 C300 Carol Mumbai NULL NULL

SQL Implementation

UPDATE dim_customer
SET previous_city = current_city,     -- Save old value
    current_city = 'Mumbai',          -- Update to new value
    city_changed_date = '2026-01-15'
WHERE customer_id = 'C100';

The Limitation

What if Alice moves AGAIN to Paris in 2027?

customer_key customer_id current_city previous_city
1001 C100 Paris Mumbai

Toronto is gone. Type 3 only remembers one previous value. You lose the Toronto history.

When to Use Type 3

  • You need to compare current vs previous values (e.g., “did the customer’s segment change?”)
  • You only care about one level of change (current and immediately prior)
  • The attribute rarely changes (maybe once in the customer’s lifetime)

Real-life analogy: Type 3 is like your phone’s “Recent Calls” showing only the last missed call. You know who called most recently, but not who called before that.

SCD Type 6: Hybrid (1 + 2 + 3 Combined)

The Rule

Type 6 combines Types 1, 2, AND 3 in one table. Each row has: – The current value (Type 1 — always up to date) – The historical value for that version (Type 2 — full history) – The previous value (Type 3 — one-level comparison)

It is called Type 6 because 1 + 2 + 3 = 6.

After Alice Moves

customer_key customer_id name historical_city current_city previous_city effective_from effective_to is_current
1001 C100 Alice Toronto Mumbai NULL 2022-03-15 2026-01-14 0
1004 C100 Alice Mumbai Mumbai Toronto 2026-01-15 9999-12-31 1

Notice:historical_city = the city for THAT specific version (does not change) – current_city = Alice’s current city (updated across ALL rows — Type 1) – previous_city = her previous city (Type 3) – Multiple rows with dates = full history (Type 2)

When to Use Type 6

  • Enterprise data warehouses that need maximum flexibility
  • When different reports need different perspectives:
  • “Revenue by city at time of order” → use historical_city
  • “Revenue by CURRENT city” → use current_city
  • “Customers who recently changed cities” → compare current_city vs previous_city

Real-life analogy: Type 6 is like a comprehensive employee record that shows: your current title (Type 1), all past titles with dates (Type 2), and your immediately previous title (Type 3). HR can answer any question about your career history.

Side-by-Side Comparison

Feature Type 0 Type 1 Type 2 Type 3 Type 6
History None (original preserved) None (overwritten) Full One level Full + current + previous
Row count Fixed Fixed Grows Fixed Grows
Complexity Simplest Simple Medium Simple Complex
Storage Least Least More Slightly more Most
Query performance Fast Fast Slower (more rows) Fast Slower
Point-in-time query No No Yes Limited Yes
Current state query N/A Yes Yes (WHERE is_current=1) Yes Yes
Use case Static attributes Non-critical updates Regulatory, analytics Before/after comparison Enterprise DW

Implementing SCD Type 2 in Azure SQL Database

This is the most common SCD type. Here is a complete implementation:

Create the Dimension Table

CREATE TABLE dim_customer (
    customer_key    INT IDENTITY(1,1) PRIMARY KEY,
    customer_id     VARCHAR(20) NOT NULL,      -- Business key
    name            VARCHAR(100),
    city            VARCHAR(50),
    segment         VARCHAR(50),
    effective_from  DATE NOT NULL,
    effective_to    DATE NOT NULL DEFAULT '9999-12-31',
    is_current      BIT NOT NULL DEFAULT 1,
    created_date    DATETIME2 DEFAULT GETDATE()
);

CREATE INDEX IX_dim_customer_bk ON dim_customer (customer_id, is_current);
CREATE INDEX IX_dim_customer_dates ON dim_customer (customer_id, effective_from, effective_to);

The MERGE Statement (Upsert)

-- Staging table has the latest source data
-- dim_customer is the target SCD Type 2 table

-- Step 1: Close existing current records where source data changed
UPDATE d
SET d.effective_to = DATEADD(DAY, -1, CAST(GETDATE() AS DATE)),
    d.is_current = 0
FROM dim_customer d
INNER JOIN staging_customer s ON d.customer_id = s.customer_id
WHERE d.is_current = 1
  AND (d.city != s.city OR d.segment != s.segment);

-- Step 2: Insert new version for changed records
INSERT INTO dim_customer (customer_id, name, city, segment, effective_from, effective_to, is_current)
SELECT s.customer_id, s.name, s.city, s.segment,
       CAST(GETDATE() AS DATE), '9999-12-31', 1
FROM staging_customer s
INNER JOIN dim_customer d ON s.customer_id = d.customer_id
WHERE d.is_current = 0
  AND d.effective_to = DATEADD(DAY, -1, CAST(GETDATE() AS DATE));

-- Step 3: Insert brand new customers
INSERT INTO dim_customer (customer_id, name, city, segment, effective_from, effective_to, is_current)
SELECT s.customer_id, s.name, s.city, s.segment,
       CAST(GETDATE() AS DATE), '9999-12-31', 1
FROM staging_customer s
WHERE NOT EXISTS (
    SELECT 1 FROM dim_customer d WHERE d.customer_id = s.customer_id
);

Why Three Steps?

  1. Close old records — the existing current row for changed customers gets an end date
  2. Insert new versions — a new row with the updated values and is_current=1
  3. Insert new customers — first-time customers get their first row

Real-life analogy: It is like a hotel ledger. When a guest checks out of one room (close old record) and checks into another room (insert new record), both entries stay in the ledger. New guests who have never stayed before get their first entry.

Implementing SCD Type 2 in Synapse Data Flows

Visual Approach

Source: staging_customer
Source: dim_customer (WHERE is_current = 1)
  |
  JOIN: staging LEFT OUTER JOIN dim ON customer_id
  |
  Conditional Split:
    New Customer:    isNull(dim@customer_key) --> Insert new row
    Changed:         staging@city != dim@city OR staging@segment != dim@segment
                     --> Two paths:
                        1. Update old row (set is_current=0, effective_to=today)
                        2. Insert new row (is_current=1, effective_from=today)
    Unchanged:       Default --> discard (no action needed)

The Alter Row Transformation

Use Alter Row to mark rows for INSERT or UPDATE:

Insert if:  isNull(dim@customer_key)                    -- New customer
Upsert if:  staging@city != dim@city                    -- Changed customer (close + insert)

Implementing SCD Type 2 with Delta Lake MERGE

In Databricks or Synapse Spark, Delta Lake makes SCD Type 2 elegant:

from delta.tables import DeltaTable

dim_customer = DeltaTable.forPath(spark, "/delta/dim_customer")

# MERGE: handle new, changed, and unchanged in one statement
(dim_customer.alias("target")
    .merge(
        staging_df.alias("source"),
        "target.customer_id = source.customer_id AND target.is_current = true"
    )
    # When matched AND values changed: close the old record
    .whenMatchedUpdate(
        condition="source.city != target.city OR source.segment != target.segment",
        set={
            "is_current": "false",
            "effective_to": "current_date()"
        }
    )
    # When not matched: insert new customer
    .whenNotMatchedInsert(values={
        "customer_id": "source.customer_id",
        "name": "source.name",
        "city": "source.city",
        "segment": "source.segment",
        "effective_from": "current_date()",
        "effective_to": "lit('9999-12-31')",
        "is_current": "true"
    })
    .execute()
)

# Second pass: insert new version rows for changed records
changed_records = staging_df.join(
    dim_customer.toDF().filter("is_current = false AND effective_to = current_date()"),
    "customer_id"
)
# Insert new current rows for changed customers

SCD in the Medallion Architecture

Bronze: Raw data from source (staging_customer)
  |
  v
Silver: Apply SCD Type 2 logic
  |-- dim_customer with full history
  |-- effective_from, effective_to, is_current
  |
  v
Gold: Business views
  |-- v_current_customers (WHERE is_current = 1)
  |-- v_customer_history (all versions, for audit)

Handling Multiple Changes

What if Alice moves to Mumbai in January AND her segment changes to “Enterprise” in March?

Two Separate Changes = Two New Rows

customer_key customer_id city segment effective_from effective_to is_current
1001 C100 Toronto Premium 2022-03-15 2026-01-14 0
1004 C100 Mumbai Premium 2026-01-15 2026-03-09 0
1005 C100 Mumbai Enterprise 2026-03-10 9999-12-31 1

Three versions of Alice. The fact table links to the correct version based on when each order was placed.

Edge Cases and Gotchas

1. Same-Day Changes

Alice moves to Mumbai AND changes segment on the same day. Do you create one row or two?

Best practice: One row with both changes. If they happen on the same effective date, combine them into a single version.

2. Backdated Changes

Source system sends a correction: “Alice actually moved on December 15, 2025, not January 15, 2026.”

Challenge: Orders between Dec 15 and Jan 15 were linked to the wrong dimension version.

Fix: Reprocess the affected date range. This is complex and is why most teams process SCD changes nightly (not retroactively).

3. Deleted Customers

Source system deletes a customer. Do you delete from the dimension?

Best practice: NEVER delete from dimensions. Set is_current = 0 and effective_to = today. This preserves history for past fact records.

4. Reactivated Customers

Alice was marked inactive in 2025, then reactivated in 2026.

Solution: Insert a new row with is_current = 1. The inactive period is captured in the previous version’s date range.

Real-World Scenarios

Scenario 1: Financial Services (Regulatory)

Requirement: Auditors need to know which branch a customer belonged to when each transaction occurred.

SCD Type: Type 2 (mandatory for financial compliance).

Why: If Alice was in the Toronto branch when she made a $10,000 deposit, the audit must show Toronto — even if she later transferred to Mumbai.

Scenario 2: E-Commerce Product Catalog

Requirement: Product prices change weekly. Reports show revenue by product.

SCD Type: Type 2 for price, Type 1 for product name corrections.

Why: Revenue should reflect the price AT THE TIME OF SALE. But if a product name had a typo, fix it everywhere (Type 1).

Scenario 3: HR Employee Tracking

Requirement: Track employee promotions and department transfers.

SCD Type: Type 2 for department and title. Type 0 for hire date.

Why: Headcount by department for Q1 2025 should use the department employees were in DURING Q1, not their current department.

Scenario 4: Marketing Customer Segmentation

Requirement: Marketing wants to see if customers who changed segments had higher spend.

SCD Type: Type 6 (Type 2 + Type 3 combined).

Why: Type 2 gives full history. The previous_segment column (Type 3) makes it easy to query “customers who upgraded from Standard to Premium.”

Interview Questions

Q: What is a Slowly Changing Dimension? A: A dimension table where attribute values change over time — customer address changes, product price changes, employee gets promoted. SCD refers to the strategy for handling these changes in the data warehouse.

Q: What is the difference between SCD Type 1 and Type 2? A: Type 1 overwrites the old value with the new one, losing all history. Type 2 keeps the old row (end-dated) and inserts a new row with the updated value, preserving full history. Type 1 is simpler but cannot answer “what was the state at a past point in time?”

Q: Why do SCD Type 2 tables use surrogate keys? A: Because one business entity (customer_id = C100) can have multiple rows representing different time periods. The surrogate key (customer_key) uniquely identifies each VERSION. Fact tables reference the surrogate key to link to the correct historical version.

Q: How do you query the current state in an SCD Type 2 table? A: SELECT * FROM dim_customer WHERE is_current = 1. This filters to only the latest version of each record, giving you the same result as a Type 1 table.

Q: How do you implement SCD Type 2 in Azure? A: Three approaches: (1) SQL MERGE/UPDATE+INSERT in Azure SQL Database or Synapse Dedicated SQL Pool, (2) Data Flow with Conditional Split, Alter Row, and two sinks (update old + insert new), (3) Delta Lake MERGE in Databricks or Synapse Spark.

Q: When would you NOT use SCD Type 2? A: When history does not matter (use Type 1 for corrections). When storage and query performance are critical and the dimension has millions of rows (Type 2 grows the table). When the attribute changes very frequently (hourly changes would create enormous table growth).

Wrapping Up

SCD is one of those concepts that separates junior data engineers from senior ones. Everyone can build a pipeline that copies data. But handling dimension changes correctly — knowing when to overwrite, when to version, and when to preserve history — requires understanding the business requirements behind the data.

The most important question to ask stakeholders: “Do you need to know what the value WAS at a past point in time, or do you only care about what it is NOW?” The answer determines your SCD type.

Related posts:Data Flows in ADF/SynapseSchema-on-Write vs Schema-on-ReadTop 20 DE Interview QuestionsSQL Window FunctionsAudit Logging in Data Pipelines


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link