SQL Window Functions Explained: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and More
Window functions are the single most important SQL concept that separates junior developers from senior data engineers. If you have ever written a subquery to get the “latest record per customer” or a self-join to compare a row with the previous row, window functions replace all of that with cleaner, faster, more readable SQL.
Despite their power, window functions are poorly understood. Most tutorials show the syntax but never explain WHY the function works the way it does. This post fixes that.
I will cover every major window function with real examples, visual explanations, and the exact interview questions you will face. By the end, you will understand not just the syntax but the logic behind each function.
Table of Contents
- What Are Window Functions?
- The OVER() Clause: The Foundation of Everything
- PARTITION BY vs GROUP BY
- ORDER BY Inside OVER()
- ROW_NUMBER(): Assign Sequential Numbers
- RANK(): Handle Ties with Gaps
- DENSE_RANK(): Handle Ties Without Gaps
- ROW_NUMBER vs RANK vs DENSE_RANK: Side-by-Side
- NTILE(): Divide Rows into Buckets
- LAG(): Access the Previous Row
- LEAD(): Access the Next Row
- SUM() OVER(): Running Totals
- AVG() OVER(): Moving Averages
- COUNT() OVER(): Running Counts
- FIRST_VALUE() and LAST_VALUE()
- The ROWS and RANGE Frame Clause
- Real-World Use Cases
- Performance Considerations
- Common Mistakes
- Interview Questions
- Wrapping Up
What Are Window Functions?
A window function performs a calculation across a set of rows related to the current row without collapsing those rows into a single output. Unlike GROUP BY which reduces 100 rows into 10 groups, window functions keep all 100 rows and add a calculated column.
GROUP BY (aggregation): Reduces rows
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
-- Result: 1 row per department
-- department | avg_salary
-- Sales | 65000
-- Engineering| 85000
Window function: Keeps all rows, adds a column
SELECT name, department, salary,
AVG(salary) OVER(PARTITION BY department) as dept_avg
FROM employees;
-- Result: Every employee row preserved, with department average added
-- name | department | salary | dept_avg
-- Alice | Sales | 60000 | 65000
-- Bob | Sales | 70000 | 65000
-- Carol | Engineering | 90000 | 85000
-- Dave | Engineering | 80000 | 85000
Every row keeps its identity. The window function adds context without losing detail.
The OVER() Clause: The Foundation of Everything
Every window function uses the OVER() clause. This is what makes it a window function instead of a regular aggregate.
function_name() OVER (
[PARTITION BY column1, column2]
[ORDER BY column3]
[ROWS/RANGE frame_clause]
)
The three parts of OVER():
| Part | What It Does | Required? |
|---|---|---|
PARTITION BY |
Divides rows into groups (like GROUP BY but keeps rows) | Optional |
ORDER BY |
Sorts rows within each partition | Required for ranking and LAG/LEAD |
ROWS/RANGE |
Defines the window frame (which rows to include in calculation) | Optional (has defaults) |
Empty OVER(): SUM(salary) OVER() calculates the sum across ALL rows.
With PARTITION BY: SUM(salary) OVER(PARTITION BY department) calculates the sum per department.
With ORDER BY: SUM(salary) OVER(ORDER BY hire_date) calculates a running total ordered by hire date.
PARTITION BY vs GROUP BY
This is the most important distinction to understand:
-- GROUP BY: Collapses rows into groups (loses individual rows)
SELECT department, COUNT(*) as emp_count
FROM employees
GROUP BY department;
-- Returns 3 rows (one per department)
-- PARTITION BY: Keeps all rows, adds group calculation
SELECT name, department, salary,
COUNT(*) OVER(PARTITION BY department) as dept_count
FROM employees;
-- Returns ALL employee rows, each with their department count
When to use GROUP BY: When you want summary data (totals, averages, counts per group).
When to use PARTITION BY: When you want summary data added to each detail row.
ORDER BY Inside OVER()
ORDER BY in the OVER() clause does two things:
- Sorts rows within each partition for ranking functions
- Defines the running frame for aggregate functions (changes SUM from total to running total)
-- Without ORDER BY: total sum of entire partition
SELECT name, salary,
SUM(salary) OVER() as total_salary
FROM employees;
-- Every row shows the same total: 300000
-- With ORDER BY: running sum
SELECT name, salary,
SUM(salary) OVER(ORDER BY hire_date) as running_total
FROM employees;
-- Each row shows cumulative sum up to that point:
-- Alice | 60000 | 60000
-- Bob | 70000 | 130000
-- Carol | 90000 | 220000
-- Dave | 80000 | 300000
ROW_NUMBER(): Assign Sequential Numbers
What It Does
Assigns a unique sequential number to each row within a partition, starting from 1. No ties — even if values are identical, each row gets a different number.
Syntax
ROW_NUMBER() OVER(
[PARTITION BY column]
ORDER BY column
)
Example: Number Employees by Salary
SELECT name, department, salary,
ROW_NUMBER() OVER(ORDER BY salary DESC) as row_num
FROM employees;
-- name | department | salary | row_num
-- Carol | Engineering | 90000 | 1
-- Dave | Engineering | 80000 | 2
-- Bob | Sales | 70000 | 3
-- Alice | Sales | 60000 | 4
Example: Number Within Each Department
SELECT name, department, salary,
ROW_NUMBER() OVER(
PARTITION BY department
ORDER BY salary DESC
) as dept_rank
FROM employees;
-- name | department | salary | dept_rank
-- Carol | Engineering | 90000 | 1
-- Dave | Engineering | 80000 | 2
-- Bob | Sales | 70000 | 1
-- Alice | Sales | 60000 | 2
The #1 Use Case: Get the Latest/Top Record per Group
This is the most common interview question involving window functions:
“Get the most recent order for each customer”
WITH ranked AS (
SELECT customer_id, order_id, order_date, amount,
ROW_NUMBER() OVER(
PARTITION BY customer_id
ORDER BY order_date DESC
) as rn
FROM orders
)
SELECT customer_id, order_id, order_date, amount
FROM ranked
WHERE rn = 1;
This pattern works for: latest record per customer, highest salary per department, most recent login per user, top scorer per team.
RANK(): Handle Ties with Gaps
What It Does
Like ROW_NUMBER but handles ties. When two rows have the same value, they get the same rank. The next rank skips the tied positions.
Example
SELECT name, salary,
RANK() OVER(ORDER BY salary DESC) as rank
FROM employees;
-- name | salary | rank
-- Carol | 90000 | 1
-- Dave | 80000 | 2
-- Bob | 70000 | 3
-- Eve | 70000 | 3 <-- tied with Bob
-- Alice | 60000 | 5 <-- skips 4 (because two people tied at 3)
Notice: Rank goes 1, 2, 3, 3, 5 — it skips 4 because two people shared rank 3.
DENSE_RANK(): Handle Ties Without Gaps
What It Does
Like RANK but no gaps. Tied rows get the same rank, and the next rank is the next consecutive number.
Example
SELECT name, salary,
DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;
-- name | salary | dense_rank
-- Carol | 90000 | 1
-- Dave | 80000 | 2
-- Bob | 70000 | 3
-- Eve | 70000 | 3 <-- tied with Bob
-- Alice | 60000 | 4 <-- no gap, continues from 3
Notice: Dense rank goes 1, 2, 3, 3, 4 — no gap after the tie.
ROW_NUMBER vs RANK vs DENSE_RANK: Side-by-Side
SELECT name, salary,
ROW_NUMBER() OVER(ORDER BY salary DESC) as row_num,
RANK() OVER(ORDER BY salary DESC) as rank,
DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;
-- name | salary | row_num | rank | dense_rank
-- Carol | 90000 | 1 | 1 | 1
-- Dave | 80000 | 2 | 2 | 2
-- Bob | 70000 | 3 | 3 | 3
-- Eve | 70000 | 4 | 3 | 3
-- Alice | 60000 | 5 | 5 | 4
| Function | Ties? | Gaps? | Use When |
|---|---|---|---|
| ROW_NUMBER | No ties (arbitrary) | No gaps | Need unique row numbers, deduplication |
| RANK | Yes (same rank) | Yes (skips) | Competition ranking (1st, 2nd, tied 3rd, 5th) |
| DENSE_RANK | Yes (same rank) | No (consecutive) | Need consecutive ranks with ties |
NTILE(): Divide Rows into Buckets
What It Does
Divides rows into a specified number of approximately equal groups (buckets) and assigns a bucket number to each row.
Example: Divide Employees into 4 Salary Quartiles
SELECT name, salary,
NTILE(4) OVER(ORDER BY salary) as quartile
FROM employees;
-- name | salary | quartile
-- Alice | 50000 | 1 (bottom 25%)
-- Bob | 60000 | 1
-- Carol | 70000 | 2
-- Dave | 80000 | 2
-- Eve | 85000 | 3
-- Frank | 90000 | 3
-- Grace | 95000 | 4 (top 25%)
-- Henry | 100000 | 4
Use Cases
- Quartile analysis: Divide customers into quartiles by revenue
- Percentile buckets: Split students into top 10%, next 20%, etc.
- Load balancing: Distribute records evenly across processing batches
LAG(): Access the Previous Row
What It Does
Returns the value of a column from a previous row in the partition, based on the ORDER BY. Lets you compare the current row with the row before it.
Syntax
LAG(column, offset, default) OVER(
[PARTITION BY column]
ORDER BY column
)
column— which column to retrieve from the previous rowoffset— how many rows back (default: 1)default— value to return if there is no previous row (default: NULL)
Example: Compare Each Month Sales with Previous Month
SELECT month, revenue,
LAG(revenue, 1, 0) OVER(ORDER BY month) as prev_month_revenue,
revenue - LAG(revenue, 1, 0) OVER(ORDER BY month) as month_over_month_change
FROM monthly_sales;
-- month | revenue | prev_month_revenue | month_over_month_change
-- 2026-01 | 50000 | 0 | 50000 (no previous, default 0)
-- 2026-02 | 55000 | 50000 | 5000 (+5000 growth)
-- 2026-03 | 48000 | 55000 | -7000 (-7000 decline)
-- 2026-04 | 62000 | 48000 | 14000 (+14000 growth)
Example: Calculate Days Between Orders per Customer
SELECT customer_id, order_date,
LAG(order_date) OVER(
PARTITION BY customer_id
ORDER BY order_date
) as previous_order_date,
DATEDIFF(day,
LAG(order_date) OVER(PARTITION BY customer_id ORDER BY order_date),
order_date
) as days_since_last_order
FROM orders;
LEAD(): Access the Next Row
What It Does
The opposite of LAG. Returns the value from the next row in the partition.
Syntax
LEAD(column, offset, default) OVER(
[PARTITION BY column]
ORDER BY column
)
Example: Show Next Month Revenue
SELECT month, revenue,
LEAD(revenue) OVER(ORDER BY month) as next_month_revenue
FROM monthly_sales;
-- month | revenue | next_month_revenue
-- 2026-01 | 50000 | 55000
-- 2026-02 | 55000 | 48000
-- 2026-03 | 48000 | 62000
-- 2026-04 | 62000 | NULL (no next row)
LAG vs LEAD
LAG = look BACKWARD (previous row)
LEAD = look FORWARD (next row)
Both are essential for time-series analysis, calculating differences between consecutive rows, and detecting trends.
SUM() OVER(): Running Totals
Without ORDER BY: Partition Total
SELECT name, department, salary,
SUM(salary) OVER(PARTITION BY department) as dept_total
FROM employees;
-- Every employee in the same department sees the same total
With ORDER BY: Running Total
SELECT name, salary, hire_date,
SUM(salary) OVER(ORDER BY hire_date) as running_total
FROM employees;
-- name | salary | hire_date | running_total
-- Alice | 60000 | 2024-01-15 | 60000
-- Bob | 70000 | 2024-03-20 | 130000
-- Carol | 90000 | 2024-06-10 | 220000
-- Dave | 80000 | 2025-01-05 | 300000
Running Total per Department
SELECT name, department, salary, hire_date,
SUM(salary) OVER(
PARTITION BY department
ORDER BY hire_date
) as dept_running_total
FROM employees;
AVG() OVER(): Moving Averages
3-Month Moving Average
SELECT month, revenue,
AVG(revenue) OVER(
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3month
FROM monthly_sales;
-- month | revenue | moving_avg_3month
-- 2026-01 | 50000 | 50000 (only 1 month available)
-- 2026-02 | 55000 | 52500 (avg of Jan, Feb)
-- 2026-03 | 48000 | 51000 (avg of Jan, Feb, Mar)
-- 2026-04 | 62000 | 55000 (avg of Feb, Mar, Apr)
The ROWS BETWEEN 2 PRECEDING AND CURRENT ROW means: include the current row and the 2 rows before it.
COUNT() OVER(): Running Counts
SELECT name, department,
COUNT(*) OVER(PARTITION BY department) as dept_size,
COUNT(*) OVER(ORDER BY hire_date) as cumulative_hires
FROM employees;
FIRST_VALUE() and LAST_VALUE()
FIRST_VALUE(): Get the First Row Value in the Window
SELECT name, department, salary,
FIRST_VALUE(name) OVER(
PARTITION BY department
ORDER BY salary DESC
) as highest_paid_in_dept
FROM employees;
-- Every employee sees who earns the most in their department
LAST_VALUE(): Get the Last Row Value
SELECT name, department, salary,
LAST_VALUE(name) OVER(
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lowest_paid_in_dept
FROM employees;
Important: LAST_VALUE requires the frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Without it, the default frame only goes up to the current row, so LAST_VALUE just returns the current row (not useful).
The ROWS and RANGE Frame Clause
The frame clause defines which rows are included in the window calculation:
ROWS BETWEEN start AND end
| Frame | Meaning |
|---|---|
UNBOUNDED PRECEDING |
First row of the partition |
N PRECEDING |
N rows before current |
CURRENT ROW |
The current row |
N FOLLOWING |
N rows after current |
UNBOUNDED FOLLOWING |
Last row of the partition |
Common Frame Patterns
-- Running total (default when ORDER BY is present)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Entire partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
-- 3-row moving average
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
-- Last 3 rows including current
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
Default Behavior
- No ORDER BY: Frame is the entire partition
- With ORDER BY: Frame is
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(running calculation)
This is why SUM() OVER(ORDER BY date) gives a running total — the default frame includes all rows from the start up to the current row.
Real-World Use Cases
1. Deduplication (Keep Latest Record)
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER(
PARTITION BY email
ORDER BY updated_at DESC
) as rn
FROM users
)
DELETE FROM ranked WHERE rn > 1;
2. Year-over-Year Growth
SELECT year, revenue,
LAG(revenue) OVER(ORDER BY year) as prev_year,
ROUND(
(revenue - LAG(revenue) OVER(ORDER BY year)) * 100.0
/ LAG(revenue) OVER(ORDER BY year), 2
) as yoy_growth_pct
FROM annual_revenue;
3. Percentile Ranking
SELECT name, salary,
PERCENT_RANK() OVER(ORDER BY salary) as percentile
FROM employees;
-- Shows each employee's salary percentile (0 to 1)
4. Gap Detection (Find Missing Dates)
SELECT order_date,
LEAD(order_date) OVER(ORDER BY order_date) as next_order_date,
DATEDIFF(day, order_date,
LEAD(order_date) OVER(ORDER BY order_date)
) as gap_days
FROM orders
HAVING gap_days > 1;
5. Running Total for Financial Reporting
SELECT transaction_date, amount,
SUM(amount) OVER(ORDER BY transaction_date) as balance
FROM transactions
WHERE account_id = 12345;
6. Top N per Group
-- Top 3 products per category by revenue
WITH ranked AS (
SELECT category, product_name, revenue,
ROW_NUMBER() OVER(
PARTITION BY category
ORDER BY revenue DESC
) as rn
FROM products
)
SELECT * FROM ranked WHERE rn <= 3;
Performance Considerations
-
Window functions execute AFTER WHERE, GROUP BY, and HAVING. They operate on the result set, not the raw table. Add filters early to reduce the data size.
-
PARTITION BY benefits from indexes. If you partition by
department, an index ondepartmenthelps. -
ORDER BY in OVER() can be expensive. Each partition needs to be sorted. On large datasets, this matters.
-
Multiple window functions with the same OVER() clause share the sort. SQL Server and PostgreSQL optimize this — define identical OVER() clauses to take advantage of it.
-- GOOD: Same OVER() clause reused (one sort operation)
SELECT name, salary,
ROW_NUMBER() OVER(ORDER BY salary DESC) as rn,
RANK() OVER(ORDER BY salary DESC) as rnk,
DENSE_RANK() OVER(ORDER BY salary DESC) as drnk
FROM employees;
- Use CTEs with window functions instead of subqueries. CTEs are more readable and the optimizer often handles them the same way.
Common Mistakes
1. Using Window Functions in WHERE
-- WRONG: Cannot use window function in WHERE
SELECT * FROM employees
WHERE ROW_NUMBER() OVER(ORDER BY salary DESC) <= 5;
-- RIGHT: Use a CTE or subquery
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY salary DESC) as rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 5;
2. Forgetting ORDER BY for Ranking Functions
-- WRONG: ROW_NUMBER without ORDER BY (results are unpredictable)
ROW_NUMBER() OVER(PARTITION BY department)
-- RIGHT: Always specify ORDER BY
ROW_NUMBER() OVER(PARTITION BY department ORDER BY salary DESC)
3. LAST_VALUE Without Frame Clause
-- WRONG: Returns current row value (not the actual last value)
LAST_VALUE(name) OVER(PARTITION BY dept ORDER BY salary)
-- RIGHT: Include full frame
LAST_VALUE(name) OVER(
PARTITION BY dept ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
4. Confusing PARTITION BY with GROUP BY
PARTITION BY keeps all rows. GROUP BY collapses rows. They are not interchangeable.
Interview Questions
Q: What is the difference between ROW_NUMBER, RANK, and DENSE_RANK? A: ROW_NUMBER assigns unique sequential numbers (no ties). RANK assigns the same number for ties but skips subsequent numbers (1,2,3,3,5). DENSE_RANK assigns the same number for ties without skipping (1,2,3,3,4).
Q: How do you get the latest record per customer? A: Use ROW_NUMBER with PARTITION BY customer_id ORDER BY date DESC, wrap in a CTE, filter WHERE rn = 1.
Q: What is the difference between LAG and LEAD? A: LAG accesses the previous row value (looks backward). LEAD accesses the next row value (looks forward). Both require ORDER BY in the OVER clause.
Q: How do you calculate a running total in SQL? A: SUM(amount) OVER(ORDER BY date). The ORDER BY creates a default frame from the first row to the current row, producing a cumulative sum.
Q: Can you use a window function in a WHERE clause? A: No. Window functions execute after WHERE. To filter by a window function result, wrap the query in a CTE or subquery and filter in the outer query.
Q: What is PARTITION BY and how is it different from GROUP BY? A: PARTITION BY divides rows into groups for window function calculations but keeps all individual rows. GROUP BY collapses rows into groups and returns one row per group.
Q: How do you calculate month-over-month change? A: Use LAG to get the previous month value, then subtract: revenue – LAG(revenue) OVER(ORDER BY month).
Q: What is a moving average and how do you calculate it in SQL? A: A moving average smooths data by averaging a fixed number of recent values. Use AVG with a frame clause: AVG(value) OVER(ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) for a 3-period moving average.
Wrapping Up
Window functions are the most powerful feature in modern SQL. They let you:
- Rank rows without losing detail (ROW_NUMBER, RANK, DENSE_RANK)
- Compare with previous/next rows without self-joins (LAG, LEAD)
- Aggregate without collapsing rows (SUM, AVG, COUNT with OVER)
- Analyze trends with running totals and moving averages
Master these functions and you can solve 90% of SQL interview questions and write cleaner, faster queries in production.
Related posts: – Top 15 ADF Interview Questions – Parquet vs CSV vs JSON – Schema-on-Write vs Schema-on-Read
If this guide helped you understand window functions, share it with someone preparing for SQL interviews. Questions? Drop a comment below.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.