SQL Window Functions Explained: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and More

Window functions are the single most important SQL concept that separates junior developers from senior data engineers. If you have ever written a subquery to get the “latest record per customer” or a self-join to compare a row with the previous row, window functions replace all of that with cleaner, faster, more readable SQL.

Despite their power, window functions are poorly understood. Most tutorials show the syntax but never explain WHY the function works the way it does. This post fixes that.

I will cover every major window function with real examples, visual explanations, and the exact interview questions you will face. By the end, you will understand not just the syntax but the logic behind each function.

What Are Window Functions?
The OVER() Clause: The Foundation of Everything
PARTITION BY vs GROUP BY
ORDER BY Inside OVER()
ROW_NUMBER(): Assign Sequential Numbers
RANK(): Handle Ties with Gaps
DENSE_RANK(): Handle Ties Without Gaps
ROW_NUMBER vs RANK vs DENSE_RANK: Side-by-Side
NTILE(): Divide Rows into Buckets
LAG(): Access the Previous Row
LEAD(): Access the Next Row
SUM() OVER(): Running Totals
AVG() OVER(): Moving Averages
COUNT() OVER(): Running Counts
FIRST_VALUE() and LAST_VALUE()
The ROWS and RANGE Frame Clause
Real-World Use Cases
Performance Considerations
Common Mistakes
Interview Questions
Wrapping Up

What Are Window Functions?

A window function performs a calculation across a set of rows related to the current row without collapsing those rows into a single output. Unlike GROUP BY which reduces 100 rows into 10 groups, window functions keep all 100 rows and add a calculated column.

GROUP BY (aggregation): Reduces rows

SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;

-- Result: 1 row per department
-- department | avg_salary
-- Sales      | 65000
-- Engineering| 85000

Window function: Keeps all rows, adds a column

SELECT name, department, salary,
       AVG(salary) OVER(PARTITION BY department) as dept_avg
FROM employees;

-- Result: Every employee row preserved, with department average added
-- name    | department  | salary | dept_avg
-- Alice   | Sales       | 60000  | 65000
-- Bob     | Sales       | 70000  | 65000
-- Carol   | Engineering | 90000  | 85000
-- Dave    | Engineering | 80000  | 85000

Every row keeps its identity. The window function adds context without losing detail.

The OVER() Clause: The Foundation of Everything

Every window function uses the OVER() clause. This is what makes it a window function instead of a regular aggregate.

function_name() OVER (
    [PARTITION BY column1, column2]
    [ORDER BY column3]
    [ROWS/RANGE frame_clause]
)

The three parts of OVER():

Part	What It Does	Required?
`PARTITION BY`	Divides rows into groups (like GROUP BY but keeps rows)	Optional
`ORDER BY`	Sorts rows within each partition	Required for ranking and LAG/LEAD
`ROWS/RANGE`	Defines the window frame (which rows to include in calculation)	Optional (has defaults)

Empty OVER(): SUM(salary) OVER() calculates the sum across ALL rows.

With PARTITION BY: SUM(salary) OVER(PARTITION BY department) calculates the sum per department.

With ORDER BY: SUM(salary) OVER(ORDER BY hire_date) calculates a running total ordered by hire date.

PARTITION BY vs GROUP BY

This is the most important distinction to understand:

-- GROUP BY: Collapses rows into groups (loses individual rows)
SELECT department, COUNT(*) as emp_count
FROM employees
GROUP BY department;
-- Returns 3 rows (one per department)

-- PARTITION BY: Keeps all rows, adds group calculation
SELECT name, department, salary,
       COUNT(*) OVER(PARTITION BY department) as dept_count
FROM employees;
-- Returns ALL employee rows, each with their department count

When to use GROUP BY: When you want summary data (totals, averages, counts per group).

When to use PARTITION BY: When you want summary data added to each detail row.

ORDER BY Inside OVER()

ORDER BY in the OVER() clause does two things:

Sorts rows within each partition for ranking functions
Defines the running frame for aggregate functions (changes SUM from total to running total)

-- Without ORDER BY: total sum of entire partition
SELECT name, salary,
       SUM(salary) OVER() as total_salary
FROM employees;
-- Every row shows the same total: 300000

-- With ORDER BY: running sum
SELECT name, salary,
       SUM(salary) OVER(ORDER BY hire_date) as running_total
FROM employees;
-- Each row shows cumulative sum up to that point:
-- Alice | 60000 | 60000
-- Bob   | 70000 | 130000
-- Carol | 90000 | 220000
-- Dave  | 80000 | 300000

ROW_NUMBER(): Assign Sequential Numbers

What It Does

Assigns a unique sequential number to each row within a partition, starting from 1. No ties — even if values are identical, each row gets a different number.

Syntax

ROW_NUMBER() OVER(
    [PARTITION BY column]
    ORDER BY column
)

Example: Number Employees by Salary

SELECT name, department, salary,
       ROW_NUMBER() OVER(ORDER BY salary DESC) as row_num
FROM employees;

-- name    | department  | salary | row_num
-- Carol   | Engineering | 90000  | 1
-- Dave    | Engineering | 80000  | 2
-- Bob     | Sales       | 70000  | 3
-- Alice   | Sales       | 60000  | 4

Example: Number Within Each Department

SELECT name, department, salary,
       ROW_NUMBER() OVER(
           PARTITION BY department
           ORDER BY salary DESC
       ) as dept_rank
FROM employees;

-- name    | department  | salary | dept_rank
-- Carol   | Engineering | 90000  | 1
-- Dave    | Engineering | 80000  | 2
-- Bob     | Sales       | 70000  | 1
-- Alice   | Sales       | 60000  | 2

The #1 Use Case: Get the Latest/Top Record per Group

This is the most common interview question involving window functions:

“Get the most recent order for each customer”

WITH ranked AS (
    SELECT customer_id, order_id, order_date, amount,
           ROW_NUMBER() OVER(
               PARTITION BY customer_id
               ORDER BY order_date DESC
           ) as rn
    FROM orders
)
SELECT customer_id, order_id, order_date, amount
FROM ranked
WHERE rn = 1;

This pattern works for: latest record per customer, highest salary per department, most recent login per user, top scorer per team.

RANK(): Handle Ties with Gaps

What It Does

Like ROW_NUMBER but handles ties. When two rows have the same value, they get the same rank. The next rank skips the tied positions.

Example

SELECT name, salary,
       RANK() OVER(ORDER BY salary DESC) as rank
FROM employees;

-- name    | salary | rank
-- Carol   | 90000  | 1
-- Dave    | 80000  | 2
-- Bob     | 70000  | 3
-- Eve     | 70000  | 3    <-- tied with Bob
-- Alice   | 60000  | 5    <-- skips 4 (because two people tied at 3)

Notice: Rank goes 1, 2, 3, 3, 5 — it skips 4 because two people shared rank 3.

DENSE_RANK(): Handle Ties Without Gaps

What It Does

Like RANK but no gaps. Tied rows get the same rank, and the next rank is the next consecutive number.

Example

SELECT name, salary,
       DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;

-- name    | salary | dense_rank
-- Carol   | 90000  | 1
-- Dave    | 80000  | 2
-- Bob     | 70000  | 3
-- Eve     | 70000  | 3    <-- tied with Bob
-- Alice   | 60000  | 4    <-- no gap, continues from 3

Notice: Dense rank goes 1, 2, 3, 3, 4 — no gap after the tie.

ROW_NUMBER vs RANK vs DENSE_RANK: Side-by-Side

SELECT name, salary,
       ROW_NUMBER() OVER(ORDER BY salary DESC) as row_num,
       RANK()       OVER(ORDER BY salary DESC) as rank,
       DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;

-- name    | salary | row_num | rank | dense_rank
-- Carol   | 90000  | 1       | 1    | 1
-- Dave    | 80000  | 2       | 2    | 2
-- Bob     | 70000  | 3       | 3    | 3
-- Eve     | 70000  | 4       | 3    | 3
-- Alice   | 60000  | 5       | 5    | 4

Function	Ties?	Gaps?	Use When
ROW_NUMBER	No ties (arbitrary)	No gaps	Need unique row numbers, deduplication
RANK	Yes (same rank)	Yes (skips)	Competition ranking (1st, 2nd, tied 3rd, 5th)
DENSE_RANK	Yes (same rank)	No (consecutive)	Need consecutive ranks with ties

NTILE(): Divide Rows into Buckets

What It Does

Divides rows into a specified number of approximately equal groups (buckets) and assigns a bucket number to each row.

Example: Divide Employees into 4 Salary Quartiles

SELECT name, salary,
       NTILE(4) OVER(ORDER BY salary) as quartile
FROM employees;

-- name    | salary | quartile
-- Alice   | 50000  | 1    (bottom 25%)
-- Bob     | 60000  | 1
-- Carol   | 70000  | 2
-- Dave    | 80000  | 2
-- Eve     | 85000  | 3
-- Frank   | 90000  | 3
-- Grace   | 95000  | 4    (top 25%)
-- Henry   | 100000 | 4

Use Cases

Quartile analysis: Divide customers into quartiles by revenue
Percentile buckets: Split students into top 10%, next 20%, etc.
Load balancing: Distribute records evenly across processing batches

LAG(): Access the Previous Row

What It Does

Returns the value of a column from a previous row in the partition, based on the ORDER BY. Lets you compare the current row with the row before it.

Syntax

LAG(column, offset, default) OVER(
    [PARTITION BY column]
    ORDER BY column
)

column — which column to retrieve from the previous row
offset — how many rows back (default: 1)
default — value to return if there is no previous row (default: NULL)

Example: Compare Each Month Sales with Previous Month

SELECT month, revenue,
       LAG(revenue, 1, 0) OVER(ORDER BY month) as prev_month_revenue,
       revenue - LAG(revenue, 1, 0) OVER(ORDER BY month) as month_over_month_change
FROM monthly_sales;

-- month   | revenue | prev_month_revenue | month_over_month_change
-- 2026-01 | 50000   | 0                  | 50000    (no previous, default 0)
-- 2026-02 | 55000   | 50000              | 5000     (+5000 growth)
-- 2026-03 | 48000   | 55000              | -7000    (-7000 decline)
-- 2026-04 | 62000   | 48000              | 14000    (+14000 growth)

Example: Calculate Days Between Orders per Customer

SELECT customer_id, order_date,
       LAG(order_date) OVER(
           PARTITION BY customer_id
           ORDER BY order_date
       ) as previous_order_date,
       DATEDIFF(day,
           LAG(order_date) OVER(PARTITION BY customer_id ORDER BY order_date),
           order_date
       ) as days_since_last_order
FROM orders;

LEAD(): Access the Next Row

What It Does

The opposite of LAG. Returns the value from the next row in the partition.

Syntax

LEAD(column, offset, default) OVER(
    [PARTITION BY column]
    ORDER BY column
)

Example: Show Next Month Revenue

SELECT month, revenue,
       LEAD(revenue) OVER(ORDER BY month) as next_month_revenue
FROM monthly_sales;

-- month   | revenue | next_month_revenue
-- 2026-01 | 50000   | 55000
-- 2026-02 | 55000   | 48000
-- 2026-03 | 48000   | 62000
-- 2026-04 | 62000   | NULL    (no next row)

LAG vs LEAD

LAG  = look BACKWARD (previous row)
LEAD = look FORWARD (next row)

Both are essential for time-series analysis, calculating differences between consecutive rows, and detecting trends.

SUM() OVER(): Running Totals

Without ORDER BY: Partition Total

SELECT name, department, salary,
       SUM(salary) OVER(PARTITION BY department) as dept_total
FROM employees;

-- Every employee in the same department sees the same total

With ORDER BY: Running Total

SELECT name, salary, hire_date,
       SUM(salary) OVER(ORDER BY hire_date) as running_total
FROM employees;

-- name    | salary | hire_date  | running_total
-- Alice   | 60000  | 2024-01-15 | 60000
-- Bob     | 70000  | 2024-03-20 | 130000
-- Carol   | 90000  | 2024-06-10 | 220000
-- Dave    | 80000  | 2025-01-05 | 300000

Running Total per Department

SELECT name, department, salary, hire_date,
       SUM(salary) OVER(
           PARTITION BY department
           ORDER BY hire_date
       ) as dept_running_total
FROM employees;

AVG() OVER(): Moving Averages

3-Month Moving Average

SELECT month, revenue,
       AVG(revenue) OVER(
           ORDER BY month
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) as moving_avg_3month
FROM monthly_sales;

-- month   | revenue | moving_avg_3month
-- 2026-01 | 50000   | 50000       (only 1 month available)
-- 2026-02 | 55000   | 52500       (avg of Jan, Feb)
-- 2026-03 | 48000   | 51000       (avg of Jan, Feb, Mar)
-- 2026-04 | 62000   | 55000       (avg of Feb, Mar, Apr)

The ROWS BETWEEN 2 PRECEDING AND CURRENT ROW means: include the current row and the 2 rows before it.

COUNT() OVER(): Running Counts

SELECT name, department,
       COUNT(*) OVER(PARTITION BY department) as dept_size,
       COUNT(*) OVER(ORDER BY hire_date) as cumulative_hires
FROM employees;

FIRST_VALUE() and LAST_VALUE()

FIRST_VALUE(): Get the First Row Value in the Window

SELECT name, department, salary,
       FIRST_VALUE(name) OVER(
           PARTITION BY department
           ORDER BY salary DESC
       ) as highest_paid_in_dept
FROM employees;

-- Every employee sees who earns the most in their department

LAST_VALUE(): Get the Last Row Value

SELECT name, department, salary,
       LAST_VALUE(name) OVER(
           PARTITION BY department
           ORDER BY salary DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) as lowest_paid_in_dept
FROM employees;

Important: LAST_VALUE requires the frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Without it, the default frame only goes up to the current row, so LAST_VALUE just returns the current row (not useful).

The ROWS and RANGE Frame Clause

The frame clause defines which rows are included in the window calculation:

ROWS BETWEEN start AND end

Frame	Meaning
`UNBOUNDED PRECEDING`	First row of the partition
`N PRECEDING`	N rows before current
`CURRENT ROW`	The current row
`N FOLLOWING`	N rows after current
`UNBOUNDED FOLLOWING`	Last row of the partition

Common Frame Patterns

-- Running total (default when ORDER BY is present)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- Entire partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

-- 3-row moving average
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

-- Last 3 rows including current
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW

Default Behavior

No ORDER BY: Frame is the entire partition
With ORDER BY: Frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (running calculation)

This is why SUM() OVER(ORDER BY date) gives a running total — the default frame includes all rows from the start up to the current row.

Real-World Use Cases

1. Deduplication (Keep Latest Record)

WITH ranked AS (
    SELECT *, ROW_NUMBER() OVER(
        PARTITION BY email
        ORDER BY updated_at DESC
    ) as rn
    FROM users
)
DELETE FROM ranked WHERE rn > 1;

2. Year-over-Year Growth

SELECT year, revenue,
       LAG(revenue) OVER(ORDER BY year) as prev_year,
       ROUND(
           (revenue - LAG(revenue) OVER(ORDER BY year)) * 100.0
           / LAG(revenue) OVER(ORDER BY year), 2
       ) as yoy_growth_pct
FROM annual_revenue;

3. Percentile Ranking

SELECT name, salary,
       PERCENT_RANK() OVER(ORDER BY salary) as percentile
FROM employees;
-- Shows each employee's salary percentile (0 to 1)

4. Gap Detection (Find Missing Dates)

SELECT order_date,
       LEAD(order_date) OVER(ORDER BY order_date) as next_order_date,
       DATEDIFF(day, order_date,
           LEAD(order_date) OVER(ORDER BY order_date)
       ) as gap_days
FROM orders
HAVING gap_days > 1;

5. Running Total for Financial Reporting

SELECT transaction_date, amount,
       SUM(amount) OVER(ORDER BY transaction_date) as balance
FROM transactions
WHERE account_id = 12345;

6. Top N per Group

-- Top 3 products per category by revenue
WITH ranked AS (
    SELECT category, product_name, revenue,
           ROW_NUMBER() OVER(
               PARTITION BY category
               ORDER BY revenue DESC
           ) as rn
    FROM products
)
SELECT * FROM ranked WHERE rn <= 3;

Performance Considerations

Window functions execute AFTER WHERE, GROUP BY, and HAVING. They operate on the result set, not the raw table. Add filters early to reduce the data size.
PARTITION BY benefits from indexes. If you partition by department, an index on department helps.
ORDER BY in OVER() can be expensive. Each partition needs to be sorted. On large datasets, this matters.
Multiple window functions with the same OVER() clause share the sort. SQL Server and PostgreSQL optimize this — define identical OVER() clauses to take advantage of it.

-- GOOD: Same OVER() clause reused (one sort operation)
SELECT name, salary,
       ROW_NUMBER() OVER(ORDER BY salary DESC) as rn,
       RANK()       OVER(ORDER BY salary DESC) as rnk,
       DENSE_RANK() OVER(ORDER BY salary DESC) as drnk
FROM employees;

Use CTEs with window functions instead of subqueries. CTEs are more readable and the optimizer often handles them the same way.

Common Mistakes

1. Using Window Functions in WHERE

-- WRONG: Cannot use window function in WHERE
SELECT * FROM employees
WHERE ROW_NUMBER() OVER(ORDER BY salary DESC) <= 5;

-- RIGHT: Use a CTE or subquery
WITH ranked AS (
    SELECT *, ROW_NUMBER() OVER(ORDER BY salary DESC) as rn
    FROM employees
)
SELECT * FROM ranked WHERE rn <= 5;

2. Forgetting ORDER BY for Ranking Functions

-- WRONG: ROW_NUMBER without ORDER BY (results are unpredictable)
ROW_NUMBER() OVER(PARTITION BY department)

-- RIGHT: Always specify ORDER BY
ROW_NUMBER() OVER(PARTITION BY department ORDER BY salary DESC)

3. LAST_VALUE Without Frame Clause

-- WRONG: Returns current row value (not the actual last value)
LAST_VALUE(name) OVER(PARTITION BY dept ORDER BY salary)

-- RIGHT: Include full frame
LAST_VALUE(name) OVER(
    PARTITION BY dept ORDER BY salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

4. Confusing PARTITION BY with GROUP BY

PARTITION BY keeps all rows. GROUP BY collapses rows. They are not interchangeable.

Interview Questions

Q: What is the difference between ROW_NUMBER, RANK, and DENSE_RANK? A: ROW_NUMBER assigns unique sequential numbers (no ties). RANK assigns the same number for ties but skips subsequent numbers (1,2,3,3,5). DENSE_RANK assigns the same number for ties without skipping (1,2,3,3,4).

Q: How do you get the latest record per customer? A: Use ROW_NUMBER with PARTITION BY customer_id ORDER BY date DESC, wrap in a CTE, filter WHERE rn = 1.

Q: What is the difference between LAG and LEAD? A: LAG accesses the previous row value (looks backward). LEAD accesses the next row value (looks forward). Both require ORDER BY in the OVER clause.

Q: How do you calculate a running total in SQL? A: SUM(amount) OVER(ORDER BY date). The ORDER BY creates a default frame from the first row to the current row, producing a cumulative sum.

Q: Can you use a window function in a WHERE clause? A: No. Window functions execute after WHERE. To filter by a window function result, wrap the query in a CTE or subquery and filter in the outer query.

Q: What is PARTITION BY and how is it different from GROUP BY? A: PARTITION BY divides rows into groups for window function calculations but keeps all individual rows. GROUP BY collapses rows into groups and returns one row per group.

Q: How do you calculate month-over-month change? A: Use LAG to get the previous month value, then subtract: revenue – LAG(revenue) OVER(ORDER BY month).

Q: What is a moving average and how do you calculate it in SQL? A: A moving average smooths data by averaging a fixed number of recent values. Use AVG with a frame clause: AVG(value) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) for a 3-period moving average.

Wrapping Up

Window functions are the most powerful feature in modern SQL. They let you:

Rank rows without losing detail (ROW_NUMBER, RANK, DENSE_RANK)
Compare with previous/next rows without self-joins (LAG, LEAD)
Aggregate without collapsing rows (SUM, AVG, COUNT with OVER)
Analyze trends with running totals and moving averages

Master these functions and you can solve 90% of SQL interview questions and write cleaner, faster queries in production.

If this guide helped you understand window functions, share it with someone preparing for SQL interviews. Questions? Drop a comment below.

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

SQL Window Functions Explained: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and More

Table of Contents

What Are Window Functions?

The OVER() Clause: The Foundation of Everything

PARTITION BY vs GROUP BY

ORDER BY Inside OVER()

ROW_NUMBER(): Assign Sequential Numbers

What It Does

Syntax

Example: Number Employees by Salary

Example: Number Within Each Department

The #1 Use Case: Get the Latest/Top Record per Group

RANK(): Handle Ties with Gaps

What It Does

Example

DENSE_RANK(): Handle Ties Without Gaps

What It Does

Example

ROW_NUMBER vs RANK vs DENSE_RANK: Side-by-Side

NTILE(): Divide Rows into Buckets

What It Does

Example: Divide Employees into 4 Salary Quartiles

Use Cases

LAG(): Access the Previous Row

What It Does

Syntax

Example: Compare Each Month Sales with Previous Month

Example: Calculate Days Between Orders per Customer

LEAD(): Access the Next Row

What It Does

Syntax

Example: Show Next Month Revenue

LAG vs LEAD

SUM() OVER(): Running Totals

Without ORDER BY: Partition Total

With ORDER BY: Running Total

Running Total per Department

AVG() OVER(): Moving Averages

3-Month Moving Average

COUNT() OVER(): Running Counts

FIRST_VALUE() and LAST_VALUE()

FIRST_VALUE(): Get the First Row Value in the Window

LAST_VALUE(): Get the Last Row Value

The ROWS and RANGE Frame Clause

Common Frame Patterns

Default Behavior

Real-World Use Cases

1. Deduplication (Keep Latest Record)

2. Year-over-Year Growth

3. Percentile Ranking

4. Gap Detection (Find Missing Dates)

5. Running Total for Financial Reporting

6. Top N per Group

Performance Considerations

Common Mistakes

1. Using Window Functions in WHERE

2. Forgetting ORDER BY for Ranking Functions

3. LAST_VALUE Without Frame Clause

4. Confusing PARTITION BY with GROUP BY

Interview Questions

Wrapping Up

Related Posts

Leave a Comment Cancel Reply