SQL GROUP BY, Aggregations, HAVING, CASE WHEN, and Null Handling: The Complete Guide with Real-Life Analogies

SQL GROUP BY, Aggregations, HAVING, CASE WHEN, and Null Handling: The Complete Guide with Real-Life Analogies

In the previous post, we learned how to filter individual rows with WHERE, BETWEEN, IN, LIKE, and EXISTS. But filtering is only half the story. The other half is AGGREGATION — summarizing thousands of rows into meaningful numbers.

“How many employees are in each department?” “What is the average salary?” “Which department spends the most on payroll?” These questions do not need individual rows — they need GROUP BY and aggregate functions.

This post covers the aggregation toolkit: GROUP BY to organize, aggregate functions to summarize, HAVING to filter the summaries, CASE WHEN to add conditional logic, and COALESCE to handle NULLs. Every query that produces a report, a dashboard metric, or a KPI uses these tools.

Think of GROUP BY like sorting mail at a post office. A mountain of letters (rows) arrives. You sort them into mailboxes (groups) by zip code (GROUP BY column). Then you count letters per mailbox (COUNT), weigh the heaviest one (MAX), or total the postage (SUM). You cannot ask “which specific letter is in each mailbox” without opening it — that is why SELECT columns must be in GROUP BY or inside an aggregate.

Table of Contents

  • GROUP BY — Organizing Rows into Groups
  • The GROUP BY Rule: Every Column in SELECT
  • Multiple Column GROUP BY
  • Aggregate Functions: COUNT, SUM, AVG, MIN, MAX
  • COUNT(*) vs COUNT(column) vs COUNT(DISTINCT)
  • SUM and AVG with NULL Handling
  • MIN and MAX on Different Data Types
  • HAVING — Filtering Groups After Aggregation
  • WHERE vs HAVING: The Critical Difference
  • Combining WHERE and HAVING
  • CASE WHEN — Conditional Logic
  • CASE WHEN in SELECT
  • CASE WHEN in WHERE
  • CASE WHEN in ORDER BY
  • CASE WHEN in Aggregations (Pivot Pattern)
  • Nested CASE WHEN
  • COALESCE — First Non-NULL Value
  • NULLIF — Convert Value to NULL
  • ISNULL / IFNULL — Replace NULL
  • Division by Zero Protection
  • IIF — Shorthand CASE (SQL Server)
  • Aliases — Naming Your Columns
  • Column Aliases
  • Table Aliases
  • Alias Scope and Execution Order
  • Advanced Aggregation Patterns
  • Grouping with Expressions
  • Conditional Aggregation (Crosstab)
  • Running Aggregation Across ALL Rows
  • STRING_AGG / GROUP_CONCAT
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

GROUP BY — Organizing Rows into Groups

GROUP BY collapses multiple rows into ONE row per group:

-- Without GROUP BY: 10 individual rows
SELECT department, salary FROM employees;
-- 10 rows

-- With GROUP BY: 3 summary rows (one per department)
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
department emp_count avg_salary
Analytics 3 87000
Engineering 3 97333
Sales 4 78250

10 rows collapsed into 3. Each row represents a GROUP, not an individual.

Real-life analogy: GROUP BY is like a teacher organizing a class photo. Instead of 30 individual photos (rows), the teacher says “group by house” (GROUP BY house). Now there are 4 group photos (Gryffindor, Slytherin, etc.). You can count students per house (COUNT), but you cannot see individual faces without zooming in (you need the column in GROUP BY or an aggregate).

The GROUP BY Rule: Every Column in SELECT

Any column in SELECT that is NOT inside an aggregate function MUST be in GROUP BY. This is the most common GROUP BY error:

-- ❌ FAILS: first_name is not aggregated and not in GROUP BY
SELECT department, first_name, COUNT(*)
FROM employees
GROUP BY department;
-- ERROR: first_name is invalid because it is not in GROUP BY

-- WHY: Engineering has 3 employees (Naveen, Shrey, Vrushab)
-- GROUP BY collapses them into ONE row. Which first_name should SQL pick?
-- It cannot decide — hence the error.

-- ✅ FIX Option 1: Add first_name to GROUP BY (creates more groups)
SELECT department, first_name, COUNT(*)
FROM employees
GROUP BY department, first_name;
-- Now each combination (Engineering+Naveen, Engineering+Shrey) is its own group

-- ✅ FIX Option 2: Aggregate first_name
SELECT department, COUNT(*) AS emp_count, MAX(first_name) AS sample_name
FROM employees
GROUP BY department;
-- MAX picks one name alphabetically — not ideal but valid

-- ✅ FIX Option 3: Use STRING_AGG to list all names
SELECT department, COUNT(*) AS emp_count, STRING_AGG(first_name, ', ') AS names
FROM employees
GROUP BY department;
-- Engineering | 3 | Naveen, Shrey, Vrushab

Multiple Column GROUP BY

-- Group by department AND country
SELECT department, country, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department, country
ORDER BY department, country;
department country emp_count avg_salary
Analytics Canada 3 87000
Engineering Canada 3 97333
Sales India 4 78250

Each unique combination of department + country becomes a group.

Real-life analogy: Single GROUP BY is sorting mail by country. Two-column GROUP BY is sorting by country AND city. Toronto/Canada, Mumbai/India, Delhi/India are different mailboxes even though Mumbai and Delhi are both India.

Aggregate Functions: COUNT, SUM, AVG, MIN, MAX

COUNT — Counting Rows

-- COUNT(*): counts ALL rows (including NULLs)
SELECT COUNT(*) AS total_employees FROM employees;
-- 10

-- COUNT(column): counts NON-NULL values only
SELECT COUNT(email) AS has_email FROM employees;
-- 9 (Kavya's NULL email is not counted)

-- COUNT(DISTINCT column): counts unique non-NULL values
SELECT COUNT(DISTINCT department) AS dept_count FROM employees;
-- 3

SELECT COUNT(DISTINCT city) AS city_count FROM employees;
-- 7 (NULL is not counted by COUNT)

The difference matters:

SELECT
    COUNT(*) AS total_rows,            -- 10 (counts everything)
    COUNT(email) AS with_email,        -- 9 (skips NULLs)
    COUNT(DISTINCT city) AS unique_cities  -- 7 (unique, no NULLs)
FROM employees;

SUM — Total

-- Total payroll
SELECT SUM(salary) AS total_payroll FROM employees;
-- 866000

-- Total per department
SELECT department, SUM(salary) AS dept_payroll
FROM employees GROUP BY department;

-- SUM with NULL: NULLs are ignored (not treated as 0)
-- If salary has NULL values, SUM skips them

AVG — Average

-- Company average salary
SELECT AVG(salary) AS avg_salary FROM employees;
-- 86600

-- Average per department
SELECT department, ROUND(AVG(salary), 0) AS avg_salary
FROM employees GROUP BY department;

-- ⚠️ AVG ignores NULLs
-- Table with salaries: 100, 200, NULL
-- AVG = (100 + 200) / 2 = 150 (NOT 100 — NULL is skipped, not zero)

Real-life analogy: If you ask “what is the average test score?” and 3 students scored 80, 90, NULL (absent), the average is (80+90)/2 = 85, not (80+90+0)/3 = 56.7. SQL treats absent (NULL) as “not counted,” not as zero.

MIN and MAX

-- Lowest and highest salary
SELECT MIN(salary) AS lowest, MAX(salary) AS highest FROM employees;
-- 75000, 105000

-- Earliest and latest hire
SELECT MIN(hire_date) AS first_hire, MAX(hire_date) AS latest_hire FROM employees;
-- 2022-01-15, 2024-05-01

-- MIN/MAX on strings (alphabetical)
SELECT MIN(first_name) AS first_alpha, MAX(first_name) AS last_alpha FROM employees;
-- Anita, Vrushab

Combined Aggregations

SELECT
    department,
    COUNT(*) AS headcount,
    SUM(salary) AS total_payroll,
    ROUND(AVG(salary), 0) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    MAX(salary) - MIN(salary) AS salary_spread
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
department headcount total_payroll avg_salary min_salary max_salary salary_spread
Engineering 3 292000 97333 92000 105000 13000
Analytics 3 261000 87000 85000 91000 6000
Sales 4 313000 78250 75000 82000 7000

HAVING — Filtering Groups After Aggregation

HAVING filters GROUPS (after GROUP BY). WHERE filters ROWS (before GROUP BY).

-- Departments with more than 3 employees
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 3;
-- Only Sales (4 employees). Engineering(3) and Analytics(3) are excluded.

-- Departments with average salary above 85K
SELECT department, ROUND(AVG(salary), 0) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 85000;
-- Engineering(97333) and Analytics(87000). Sales(78250) excluded.

-- Departments with total payroll between 250K and 300K
SELECT department, SUM(salary) AS total_payroll
FROM employees
GROUP BY department
HAVING SUM(salary) BETWEEN 250000 AND 300000;
-- Engineering(292000) and Analytics(261000)

WHERE vs HAVING: The Critical Difference

-- WHERE: "Show me employees earning over 80K, THEN count per department"
SELECT department, COUNT(*) AS high_earners
FROM employees
WHERE salary > 80000            -- Filter ROWS first (removes 4 employees)
GROUP BY department;             -- Group the remaining 6

-- HAVING: "Count ALL employees per department, THEN show departments with avg > 80K"
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department              -- Group all 10 employees
HAVING AVG(salary) > 80000;     -- Filter GROUPS after aggregation

Combined WHERE + HAVING:

-- "Among employees hired after 2023, which departments have more than 2 such employees?"
SELECT department, COUNT(*) AS recent_hires
FROM employees
WHERE hire_date >= '2023-01-01'      -- Step 1: Keep only recent hires (8 employees)
GROUP BY department                   -- Step 2: Group them (Eng:2, Ana:3, Sales:3?)
HAVING COUNT(*) > 2;                  -- Step 3: Keep groups with > 2

Real-life analogy: WHERE is like filtering students BEFORE dividing into groups: “Only students who passed the exam, THEN divide by major.” HAVING is filtering AFTER dividing: “Divide by major, THEN only show majors with average GPA above 3.5.” WHERE removes individuals. HAVING removes entire groups.

CASE WHEN — Conditional Logic

CASE WHEN is SQL’s if/else. It evaluates conditions top to bottom and returns the value for the first matching condition.

CASE WHEN in SELECT

-- Salary bands
SELECT first_name, salary,
    CASE
        WHEN salary >= 100000 THEN 'Executive'
        WHEN salary >= 90000 THEN 'Senior'
        WHEN salary >= 80000 THEN 'Mid-Level'
        ELSE 'Junior'
    END AS salary_band
FROM employees
ORDER BY salary DESC;
first_name salary salary_band
Naveen 105000 Executive
Shrey 95000 Senior
Vrushab 92000 Senior
Vishnu 91000 Senior
Ravi 85000 Mid-Level
Priya 85000 Mid-Level
Deepak 82000 Mid-Level
Anita 78000 Junior
Kavya 78000 Junior
Manoj 75000 Junior

Real-life analogy: CASE WHEN is a traffic signal with multiple lights. The car (row) approaches and checks each signal in order. Red (salary >= 100K) → stop at Executive. Yellow (>= 90K) → Senior. Green (>= 80K) → Mid-Level. Default (ELSE) → Junior. First matching light wins.

CASE WHEN in WHERE

-- Filter differently based on department
SELECT * FROM employees
WHERE CASE
    WHEN department = 'Engineering' THEN salary > 90000
    WHEN department = 'Sales' THEN salary > 75000
    ELSE salary > 80000
END = 1;
-- Engineers: only > 90K. Sales: only > 75K. Others: only > 80K.

CASE WHEN in ORDER BY

-- Custom sort: Engineering first, then Analytics, then Sales
SELECT first_name, department FROM employees
ORDER BY
    CASE department
        WHEN 'Engineering' THEN 1
        WHEN 'Analytics' THEN 2
        WHEN 'Sales' THEN 3
        ELSE 4
    END;

CASE WHEN in Aggregations (Pivot Pattern)

This is one of the most powerful patterns — turning rows into columns:

-- Count employees per city within each department
SELECT department,
    SUM(CASE WHEN city = 'Toronto' THEN 1 ELSE 0 END) AS toronto,
    SUM(CASE WHEN city = 'Mumbai' THEN 1 ELSE 0 END) AS mumbai,
    SUM(CASE WHEN city = 'Delhi' THEN 1 ELSE 0 END) AS delhi,
    SUM(CASE WHEN city NOT IN ('Toronto', 'Mumbai', 'Delhi') OR city IS NULL THEN 1 ELSE 0 END) AS other
FROM employees
GROUP BY department;
department toronto mumbai delhi other
Engineering 2 0 0 1
Analytics 1 0 0 2
Sales 0 1 1 2

This is called conditional aggregation — a poor man’s PIVOT. It turns row values into column headers using CASE WHEN inside SUM.

Nested CASE WHEN

SELECT first_name, department, salary,
    CASE
        WHEN department = 'Engineering' THEN
            CASE
                WHEN salary >= 100000 THEN 'Lead Engineer'
                WHEN salary >= 90000 THEN 'Senior Engineer'
                ELSE 'Engineer'
            END
        WHEN department = 'Sales' THEN
            CASE
                WHEN salary >= 80000 THEN 'Senior Sales'
                ELSE 'Sales Associate'
            END
        ELSE 'Staff'
    END AS role_title
FROM employees;

COALESCE — First Non-NULL Value

COALESCE returns the first non-NULL value from a list of arguments:

-- Replace NULL email with phone, or 'No Contact'
SELECT first_name,
    COALESCE(email, 'no-email@placeholder.com') AS contact_email
FROM employees;
-- Kavya's NULL email → 'no-email@placeholder.com'

-- Chain multiple fallbacks
SELECT first_name,
    COALESCE(email, city, 'Unknown') AS best_contact_info
FROM employees;
-- Try email first. If NULL, try city. If both NULL, use 'Unknown'.

-- COALESCE in calculations
SELECT first_name,
    salary + COALESCE(bonus, 0) AS total_compensation
FROM employees;
-- If bonus is NULL, treat it as 0 (not NULL, which would make the entire sum NULL)

Real-life analogy: COALESCE is like a phone tree during an emergency. Call the manager first. If they do not answer (NULL), call the supervisor. If they do not answer, call the department head. First person who answers is the result.

NULLIF — Convert Value to NULL

NULLIF returns NULL if two values are equal:

-- Convert 0 bonus to NULL (useful for averages)
SELECT first_name, NULLIF(bonus, 0) AS real_bonus FROM employees;
-- If bonus = 0, returns NULL instead (so AVG ignores it)

-- Division by zero protection (most important use)
SELECT name, revenue / NULLIF(cost, 0) AS margin_ratio FROM products;
-- If cost = 0, NULLIF returns NULL, and revenue / NULL = NULL (no error!)
-- Without NULLIF: revenue / 0 = DIVISION BY ZERO ERROR

Division by Zero Protection

-- ❌ CRASHES if hours_worked = 0
SELECT name, salary / hours_worked AS hourly_rate FROM employees;

-- ✅ SAFE: Returns NULL instead of crashing
SELECT name, salary / NULLIF(hours_worked, 0) AS hourly_rate FROM employees;

-- ✅ ALSO SAFE: Return 0 instead of NULL
SELECT name,
    CASE WHEN hours_worked = 0 THEN 0
         ELSE salary / hours_worked
    END AS hourly_rate
FROM employees;

IIF — Shorthand CASE (SQL Server)

-- SQL Server only: IIF(condition, true_value, false_value)
SELECT first_name, salary,
    IIF(salary >= 90000, 'Senior', 'Junior') AS level
FROM employees;

-- Equivalent CASE:
SELECT first_name, salary,
    CASE WHEN salary >= 90000 THEN 'Senior' ELSE 'Junior' END AS level
FROM employees;

IIF is shorter but only works in SQL Server. CASE WHEN works everywhere.

Aliases — Naming Your Columns

Column Aliases

-- With AS
SELECT first_name AS "Employee Name", salary AS "Monthly Salary" FROM employees;

-- Without AS (SQL Server allows this)
SELECT first_name "Employee Name", salary "Monthly Salary" FROM employees;

-- Alias for expressions
SELECT first_name, salary * 12 AS annual_salary FROM employees;

Table Aliases

-- Long way
SELECT employees.first_name, departments.department_name
FROM employees JOIN departments ON employees.dept_id = departments.dept_id;

-- Short way with aliases
SELECT e.first_name, d.department_name
FROM employees e JOIN departments d ON e.dept_id = d.dept_id;

Alias Scope and Execution Order

-- ❌ Cannot use alias in WHERE (WHERE runs before SELECT)
SELECT salary * 12 AS annual FROM employees WHERE annual > 100000;

-- ✅ Can use alias in ORDER BY (ORDER BY runs after SELECT)
SELECT salary * 12 AS annual FROM employees ORDER BY annual DESC;

-- ✅ Can use alias in HAVING (in some databases — not all)
-- SQL Server: HAVING annual > 100000 MAY work
-- Best practice: repeat the expression
SELECT department, AVG(salary * 12) AS avg_annual
FROM employees GROUP BY department
HAVING AVG(salary * 12) > 1000000;

Advanced Aggregation Patterns

Grouping with Expressions

-- Group by hire year
SELECT YEAR(hire_date) AS hire_year, COUNT(*) AS hires
FROM employees
GROUP BY YEAR(hire_date)
ORDER BY hire_year;
hire_year hires
2022 1
2023 5
2024 4

Conditional Aggregation (Crosstab)

-- Department summary with conditional metrics
SELECT department,
    COUNT(*) AS total,
    SUM(CASE WHEN salary >= 90000 THEN 1 ELSE 0 END) AS senior_count,
    SUM(CASE WHEN salary < 90000 THEN 1 ELSE 0 END) AS junior_count,
    ROUND(AVG(CASE WHEN country = 'Canada' THEN salary END), 0) AS canada_avg,
    ROUND(AVG(CASE WHEN country = 'India' THEN salary END), 0) AS india_avg
FROM employees
GROUP BY department;

STRING_AGG — Concatenate Values in a Group

-- SQL Server 2017+ / PostgreSQL
SELECT department, STRING_AGG(first_name, ', ') AS team_members
FROM employees
GROUP BY department;
department team_members
Engineering Naveen, Shrey, Vrushab
Analytics Vishnu, Ravi, Priya
Sales Anita, Deepak, Kavya, Manoj

-- With ordering inside the aggregation
SELECT department, STRING_AGG(first_name, ', ') WITHIN GROUP (ORDER BY salary DESC) AS team_by_salary
FROM employees
GROUP BY department;

Common Mistakes

  1. Selecting non-aggregated columns without GROUP BY — every column in SELECT must be in GROUP BY or inside an aggregate. No exceptions.

  2. Using WHERE instead of HAVING for aggregatesWHERE COUNT(*) > 5 fails because WHERE runs before GROUP BY. Use HAVING.

  3. Forgetting AVG ignores NULLs — AVG of (100, 200, NULL) = 150, not 100. If you want NULLs treated as zero, use AVG(COALESCE(column, 0)).

  4. Division by zero without NULLIFsalary / hours crashes if hours = 0. Always use salary / NULLIF(hours, 0).

  5. Using COUNT(column) when you mean COUNT(*) — COUNT(column) skips NULLs. If you want total rows, use COUNT(*).

  6. CASE WHEN without ELSE — missing ELSE returns NULL for unmatched rows. Always include ELSE for safety.

Interview Questions

Q: What is the difference between WHERE and HAVING? A: WHERE filters individual rows BEFORE GROUP BY. HAVING filters groups AFTER aggregation. WHERE cannot use aggregate functions (COUNT, SUM, AVG). HAVING can only filter on aggregated values or GROUP BY columns.

Q: What is the difference between COUNT(*) and COUNT(column)? A: COUNT(*) counts all rows including NULLs. COUNT(column) counts only non-NULL values in that column. COUNT(DISTINCT column) counts unique non-NULL values.

Q: How does COALESCE work? A: COALESCE returns the first non-NULL value from a list of arguments. COALESCE(email, phone, 'N/A') returns email if not NULL, otherwise phone, otherwise ‘N/A’. It is the standard way to provide default values for NULLs.

Q: How do you handle division by zero in SQL? A: Use NULLIF: revenue / NULLIF(cost, 0). If cost is 0, NULLIF converts it to NULL, and dividing by NULL returns NULL instead of crashing. Wrap in COALESCE if you want 0 instead of NULL.

Q: How do you create a crosstab (pivot) without PIVOT syntax? A: Use CASE WHEN inside SUM: SUM(CASE WHEN city = 'Toronto' THEN 1 ELSE 0 END) AS toronto. Each CASE creates a column. This is called conditional aggregation and works in all SQL databases.

Wrapping Up

GROUP BY and aggregate functions are the tools that turn raw data into business insights. COUNT tells you “how many.” SUM tells you “how much.” AVG tells you “what is typical.” MIN/MAX tell you “what are the extremes.” HAVING tells you “which groups matter.” And CASE WHEN adds intelligence to every query.

Master these, and you can answer any “how many,” “how much,” or “what is the average” question a business stakeholder throws at you.

Previous post: SQL Execution Order, SELECT, WHERE, and Filtering Clauses

Next post: Subqueries, Correlated Subqueries, and Joins vs Subqueries Performance

Related posts:SQL JoinsSQL Window FunctionsCTEs and Subqueries


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link