SQL Subqueries, Correlated Subqueries, EXISTS, and Joins vs Subqueries: When to Use Which and Why Performance Matters
You have mastered filtering with WHERE and aggregating with GROUP BY. Now comes the power tool that separates intermediate SQL from advanced SQL: subqueries.
A subquery is a query inside another query. It lets you ask questions that require two steps: “Find the average salary, THEN find employees earning above it.” You cannot do this in a single flat query because you need the average BEFORE you can filter — and that requires a nested query.
But subqueries come in three flavors, each with different performance characteristics. Use the wrong one on a large table and your 5-second query becomes a 5-minute query. This post teaches you all three types, when each is appropriate, and when to use a JOIN or a Window Function instead.
Think of subqueries like asking a question that requires research. “Which employees earn more than average?” You cannot answer this directly — you first need to RESEARCH the average (inner query), THEN answer the original question (outer query). A regular subquery does the research ONCE. A correlated subquery redoes the research for EVERY employee. EXISTS just checks “does the research find anything?” and stops immediately.
Table of Contents
- What Is a Subquery?
- Three Types of Subqueries (By Location)
- Subquery in WHERE — The Most Common Type
- Subquery in FROM — Derived Tables
- Subquery in SELECT — Scalar Subqueries
- Correlated Subqueries — The Per-Row Query
- How Correlated Subqueries Execute (Step by Step)
- Correlated vs Non-Correlated Comparison
- EXISTS — The Efficient Existence Check
- NOT EXISTS — The NULL-Safe Alternative to NOT IN
- Nested Subqueries — Queries Within Queries Within Queries
- Joins vs Subqueries — Same Result, Different Approach
- The Same Question: Five Different Ways
- Performance Comparison
- When to Use Joins
- When to Use Subqueries (IN)
- When to Use EXISTS
- When to Use Window Functions Instead
- Subqueries with INSERT, UPDATE, DELETE
- INSERT with Subquery
- UPDATE with Subquery
- DELETE with Subquery
- Real-World Patterns
- Pattern 1: “Above Average” Queries
- Pattern 2: “Top N Per Group” with Subquery
- Pattern 3: “Exists in Another Table”
- Pattern 4: “Maximum Per Group” Without Window Functions
- Pattern 5: “Running Comparison” with Correlated Subquery
- The Decision Tree: Which Approach to Use
- Common Mistakes
- Interview Questions
- Wrapping Up
What Is a Subquery?
A subquery is a SELECT statement nested inside another SQL statement. The inner query runs first and feeds its result to the outer query:
-- Outer query: Find employees earning above average
-- Inner query: Calculate the average salary
SELECT first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- This is the subquery (inner query)
The inner query returns 86600 (average salary). The outer query then becomes:
SELECT first_name, salary FROM employees WHERE salary > 86600;
Real-life analogy: A subquery is like a question that requires a calculation first. “Who scored above the class average?” First, calculate the average (inner query). Then, compare each student against it (outer query). You cannot skip the calculation step.
Three Types of Subqueries (By Location)
| Type | Location | Returns | Example |
|---|---|---|---|
| WHERE subquery | Inside WHERE clause | A value or list for filtering | WHERE salary > (SELECT AVG(salary)...) |
| FROM subquery | Inside FROM clause (derived table) | A temporary table to query from | FROM (SELECT dept, AVG(salary)...) AS t |
| SELECT subquery | Inside SELECT clause (scalar) | One value per row | SELECT name, (SELECT MAX(salary)...) AS max |
Subquery in WHERE — The Most Common Type
Single-Value Subquery (Returns One Number)
-- Employees earning above company average
SELECT first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Inner query returns: 86600
-- Outer query: WHERE salary > 86600
-- Result: Naveen(105K), Shrey(95K), Vrushab(92K), Vishnu(91K)
List Subquery (Returns Multiple Values)
-- Employees in departments that are based in Canada
SELECT first_name, department
FROM employees
WHERE department IN (
SELECT DISTINCT department FROM employees WHERE country = 'Canada'
);
-- Inner query returns: ('Engineering', 'Analytics')
-- Outer query: WHERE department IN ('Engineering', 'Analytics')
Multi-Row Comparison (ALL, ANY)
-- Employees earning more than ALL Analytics employees (more than the highest)
SELECT first_name, salary FROM employees
WHERE salary > ALL (SELECT salary FROM employees WHERE department = 'Analytics');
-- Must be > 91000 (max Analytics salary)
-- Result: Naveen(105K), Shrey(95K), Vrushab(92K)
-- Employees earning more than ANY Sales employee (more than the lowest)
SELECT first_name, salary FROM employees
WHERE salary > ANY (SELECT salary FROM employees WHERE department = 'Sales');
-- Must be > 75000 (min Sales salary)
-- Almost everyone qualifies
Real-life analogy: ALL is like “taller than the tallest person in the room” — you need to beat everyone. ANY is like “taller than at least one person in the room” — you just need to beat the shortest.
Subquery in FROM — Derived Tables
The subquery creates a temporary table that the outer query reads from:
-- Find departments where average salary exceeds 85K
SELECT dept_stats.department, dept_stats.avg_salary, dept_stats.headcount
FROM (
SELECT department,
ROUND(AVG(salary), 0) AS avg_salary,
COUNT(*) AS headcount
FROM employees
GROUP BY department
) AS dept_stats
WHERE dept_stats.avg_salary > 85000;
The inner query produces a 3-row table (dept_stats). The outer query filters it.
When to use: When you need to filter or join on aggregated results, and the query is too complex for a single GROUP BY + HAVING.
Real-life analogy: A derived table is like creating a summary report (inner query), then highlighting specific rows on the report (outer query). You first create the report, then analyze it.
Subquery in SELECT — Scalar Subqueries
A scalar subquery in SELECT returns ONE value for each row:
-- Each employee's salary compared to company average
SELECT
first_name,
salary,
(SELECT ROUND(AVG(salary), 0) FROM employees) AS company_avg,
salary - (SELECT ROUND(AVG(salary), 0) FROM employees) AS diff_from_avg
FROM employees
ORDER BY diff_from_avg DESC;
| first_name | salary | company_avg | diff_from_avg |
|---|---|---|---|
| Naveen | 105000 | 86600 | 18400 |
| Shrey | 95000 | 86600 | 8400 |
| Vrushab | 92000 | 86600 | 5400 |
| Vishnu | 91000 | 86600 | 4400 |
| Ravi | 85000 | 86600 | -1600 |
| Priya | 85000 | 86600 | -1600 |
| Deepak | 82000 | 86600 | -4600 |
| Anita | 78000 | 86600 | -8600 |
| Kavya | 78000 | 86600 | -8600 |
| Manoj | 75000 | 86600 | -11600 |
Warning: This scalar subquery runs ONCE (same value for every row). But if you make it correlated (referencing the outer row), it runs PER ROW — much slower.
Correlated Subqueries — The Per-Row Query
A correlated subquery references a column from the OUTER query. It runs ONCE FOR EACH ROW of the outer query.
-- Employees earning more than their department's average
SELECT e.first_name, e.department, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e.department -- ← References outer query's department
);
How Correlated Subqueries Execute (Step by Step)
Row 1: Naveen, Engineering, 105000
→ Inner query: AVG(salary) WHERE department = 'Engineering' → 97333
→ 105000 > 97333? YES → Include Naveen
Row 2: Shrey, Engineering, 95000
→ Inner query: AVG(salary) WHERE department = 'Engineering' → 97333
→ 95000 > 97333? NO → Exclude Shrey
Row 3: Vrushab, Engineering, 92000
→ Inner query: AVG(salary) WHERE department = 'Engineering' → 97333
→ 92000 > 97333? NO → Exclude Vrushab
Row 4: Vishnu, Analytics, 91000
→ Inner query: AVG(salary) WHERE department = 'Analytics' → 87000
→ 91000 > 87000? YES → Include Vishnu
Row 5: Ravi, Analytics, 85000
→ Inner query: AVG(salary) WHERE department = 'Analytics' → 87000
→ 85000 > 87000? NO → Exclude Ravi
...continues for all 10 rows (10 subquery executions)
Result: Naveen (above Engineering avg), Vishnu (above Analytics avg), Deepak (above Sales avg).
The performance cost: The subquery executes 10 times (once per row). On a 10-million row table, it executes 10 MILLION times. This is why correlated subqueries are slow on large tables.
Correlated vs Non-Correlated Comparison
| Feature | Non-Correlated | Correlated |
|---|---|---|
| References outer query? | No | Yes |
| Executes how many times? | Once | Once per outer row |
| Performance | Fast (single execution) | Slow (N executions for N rows) |
| Can run independently? | Yes (paste it, run it) | No (depends on outer row) |
| Use when | Comparison against a fixed value | Per-row comparison (each row vs its group) |
-- NON-CORRELATED: "Above company average" (subquery runs ONCE)
SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
-- CORRELATED: "Above department average" (subquery runs PER ROW)
SELECT * FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.department = e.department);
Real-life analogy: Non-correlated subquery is like checking the weather forecast ONCE for your city before going out. Correlated subquery is like checking the weather forecast SEPARATELY for each stop on a road trip — one check per city, much more effort.
EXISTS — The Efficient Existence Check
EXISTS does not return data. It returns TRUE or FALSE — “does at least one matching row exist?”
-- Employees who manage at least one person
SELECT e.first_name, e.department
FROM employees e
WHERE EXISTS (
SELECT 1
FROM employees sub
WHERE sub.manager_id = e.emp_id
);
-- For each employee, check: "Does anyone report to this person?"
-- EXISTS stops at the FIRST match — does not count all subordinates
-- Result: Naveen, Vishnu, Anita (they have subordinates)
Why EXISTS Is Fast
EXISTS stops scanning the inner table the MOMENT it finds one match:
Checking Naveen (emp_id=1001):
→ Look for manager_id = 1001
→ Found Shrey (manager_id=1001) → STOP → TRUE
→ Does NOT check Vrushab (unnecessary — already found one match)
Checking Ravi (emp_id=1005):
→ Look for manager_id = 1005
→ Scan entire table → no match → FALSE
IN would build a complete list of ALL manager_ids first, then check each employee against the list. EXISTS skips that list-building step.
NOT EXISTS — The NULL-Safe Alternative to NOT IN
-- Employees who do NOT manage anyone
SELECT e.first_name
FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM employees sub WHERE sub.manager_id = e.emp_id
);
-- Result: Shrey, Vrushab, Ravi, Priya, Deepak, Kavya, Manoj
-- ❌ NOT IN version (DANGEROUS if NULLs exist):
SELECT first_name FROM employees
WHERE emp_id NOT IN (SELECT manager_id FROM employees);
-- Returns NOTHING because manager_id has NULLs!
-- ✅ NOT EXISTS version (NULL-safe):
SELECT e.first_name FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM employees sub WHERE sub.manager_id = e.emp_id
);
-- Returns correct results even with NULLs
Rule: Always use NOT EXISTS instead of NOT IN. It is NULL-safe and often faster.
Joins vs Subqueries — Same Result, Different Approach
The Same Question: Five Different Ways
Question: “Find employees in departments with a budget over 500K”
-- Assume a departments table exists:
-- dept_id, department_name, budget
-- Method 1: JOIN
SELECT e.first_name, e.department, d.budget
FROM employees e
JOIN departments d ON e.department = d.department_name
WHERE d.budget > 500000;
-- Method 2: IN with subquery
SELECT first_name, department
FROM employees
WHERE department IN (
SELECT department_name FROM departments WHERE budget > 500000
);
-- Method 3: EXISTS with correlated subquery
SELECT e.first_name, e.department
FROM employees e
WHERE EXISTS (
SELECT 1 FROM departments d
WHERE d.department_name = e.department AND d.budget > 500000
);
-- Method 4: Derived table (FROM subquery)
SELECT e.first_name, e.department, d.budget
FROM employees e
JOIN (SELECT department_name, budget FROM departments WHERE budget > 500000) d
ON e.department = d.department_name;
-- Method 5: CTE
WITH high_budget AS (
SELECT department_name, budget FROM departments WHERE budget > 500000
)
SELECT e.first_name, e.department, hb.budget
FROM employees e
JOIN high_budget hb ON e.department = hb.department_name;
All five return the same employees. But performance and readability differ.
Performance Comparison
| Method | When It Is Fastest | When It Is Slowest |
|---|---|---|
| JOIN | When you need columns from both tables. Always well-optimized. | Rare — joins are highly optimized |
| IN | Small subquery result (< 1000 values) | Large subquery result or NULLs in NOT IN |
| EXISTS | Large inner table + checking existence | Small inner table (overhead not worth it) |
| Correlated subquery | Small outer table | Large outer table (runs per row!) |
| Window Function | Per-group calculations | Not applicable for existence checks |
The Decision Tree
Do you need columns from BOTH tables?
YES → Use JOIN
NO ↓
Are you checking existence (yes/no question)?
YES → Use EXISTS (or NOT EXISTS for absence)
NO ↓
Is the subquery result small (< 1000 rows)?
YES → Use IN
NO → Use EXISTS
Is it a per-row calculation (each row vs its own group)?
YES → Use Window Function (best) or Correlated Subquery (slower)
Subqueries with INSERT, UPDATE, DELETE
INSERT with Subquery
-- Copy high earners into a separate table
INSERT INTO high_earners (emp_id, first_name, salary)
SELECT emp_id, first_name, salary
FROM employees
WHERE salary > 95000;
-- Insert summary data
INSERT INTO department_summary (department, avg_salary, headcount)
SELECT department, AVG(salary), COUNT(*)
FROM employees
GROUP BY department;
UPDATE with Subquery
-- Set bonus to 10% of salary for employees earning above department average
UPDATE employees
SET bonus = salary * 0.10
WHERE salary > (
SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department = employees.department
);
-- Update based on another table
UPDATE employees
SET department = 'Data Engineering'
WHERE dept_id IN (SELECT dept_id FROM departments WHERE department_name = 'Engineering');
DELETE with Subquery
-- Delete employees in departments being shut down
DELETE FROM employees
WHERE department IN (
SELECT department_name FROM departments WHERE status = 'CLOSED'
);
-- Delete using NOT EXISTS (orphan cleanup)
DELETE FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM departments d WHERE d.department_name = e.department
);
Real-World Patterns
Pattern 1: “Above Average” (Most Common Interview Question)
-- Employees earning above their department's average
-- Best approach: Window Function (single pass)
SELECT first_name, department, salary, dept_avg
FROM (
SELECT first_name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees
) sub
WHERE salary > dept_avg;
Pattern 2: “Top N Per Group”
-- Top 2 earners per department
-- Best approach: Window Function
SELECT first_name, department, salary
FROM (
SELECT first_name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn <= 2;
Pattern 3: “Exists in Another Table”
-- Customers who have placed an order (existence check)
-- Best approach: EXISTS
SELECT c.customer_name FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
Pattern 4: “Maximum Per Group” Without Window Functions
-- Highest paid employee in each department (old-school approach)
SELECT e.first_name, e.department, e.salary
FROM employees e
WHERE e.salary = (
SELECT MAX(e2.salary) FROM employees e2 WHERE e2.department = e.department
);
-- Correlated subquery — works but slower than Window Function on large tables
Pattern 5: “Running Comparison”
-- Each employee vs the next highest paid in their department
-- Best approach: Window Function with LEAD
SELECT first_name, department, salary,
LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary DESC) AS next_lower_salary,
salary - LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary DESC) AS gap
FROM employees;
Common Mistakes
-
Correlated subquery on large tables — runs per row, O(N*M) complexity. Replace with JOIN or Window Function.
-
NOT IN with possible NULLs — returns empty results. Always use NOT EXISTS.
-
Scalar subquery returning multiple rows —
WHERE salary > (SELECT salary FROM employees WHERE department = 'Sales')fails because the subquery returns 4 rows. Use MAX, MIN, AVG, or IN. -
Using subquery when JOIN is simpler — if you need columns from both tables, JOIN is cleaner and the optimizer handles it better.
-
Nested subqueries 3+ levels deep — unreadable and hard to debug. Refactor to CTEs.
-
Using correlated subquery when Window Function works — Window Functions compute in a single pass. Correlated subqueries recompute per row. Always prefer Window Functions for per-group calculations.
Interview Questions
Q: What is the difference between a correlated and non-correlated subquery? A: A non-correlated subquery is independent — it runs once and its result is reused for every outer row. A correlated subquery references the outer query and runs once per outer row. Non-correlated is faster (single execution). Correlated is flexible but slow on large tables (N executions for N rows).
Q: When should you use EXISTS instead of IN? A: Use EXISTS when the inner table is large (EXISTS stops at the first match, IN builds the full list). Always use NOT EXISTS instead of NOT IN to avoid the NULL trap. Use IN only for small, fixed lists.
Q: When should you use a JOIN instead of a subquery? A: When you need columns from both tables in the result. When both tables are large (JOINs are heavily optimized by query planners). When readability matters — JOINs make the relationship explicit.
Q: What is a scalar subquery?
A: A subquery in the SELECT clause that returns exactly one value. Used to add a calculated column: SELECT name, (SELECT AVG(salary) FROM employees) AS avg. If it returns multiple rows, SQL throws an error.
Q: When should you use a Window Function instead of a subquery? A: For per-group calculations (above average, top N per group, running totals, row comparison). Window Functions compute in a single pass over the data. Correlated subqueries compute per row. Window Functions are almost always faster and more readable.
Wrapping Up
Subqueries are the bridge between simple and complex SQL. They let you ask multi-step questions, check existence, filter on aggregated values, and compare each row against its group. But they come with performance trade-offs.
The hierarchy of preference: 1. JOIN — when you need data from both tables 2. Window Function — when you need per-group calculations 3. EXISTS — when you need existence checks on large tables 4. Non-correlated subquery (IN) — when the inner result is small 5. Correlated subquery — last resort, when nothing else works
Master these five tools and you can answer any SQL question thrown at you.
Previous post: GROUP BY, Aggregations, HAVING, CASE WHEN
Related posts: – SQL Joins – SQL Window Functions – CTEs and Subqueries – PySpark Joins (same concepts in Spark)
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.