SQL Joins Explained: INNER, LEFT, RIGHT, FULL, CROSS, and SELF Joins with Real Examples
If you can write a SELECT with a WHERE clause, you know basic SQL. But the moment you need data from two or more tables, you need joins. And joins are where most people get confused.
Which join do I use? What happens to rows that don’t match? Why am I getting duplicate rows? Why does my LEFT JOIN return the same results as my INNER JOIN?
This post answers all of those questions. I will cover every type of SQL join with visual diagrams, real examples using employee and department data, and the exact scenarios where each join is the right choice.
Table of Contents
- What Is a Join?
- The Sample Data
- INNER JOIN: Only Matching Rows
- LEFT JOIN (LEFT OUTER JOIN): All Left, Matching Right
- RIGHT JOIN (RIGHT OUTER JOIN): All Right, Matching Left
- FULL OUTER JOIN: Everything from Both
- CROSS JOIN: Every Combination
- SELF JOIN: A Table Joined to Itself
- All Joins Side-by-Side Comparison
- JOIN with Multiple Conditions
- JOIN with Three or More Tables
- Common Table Expressions (CTEs) with Joins
- The Difference Between WHERE and ON in Joins
- Common Mistakes
- Performance Tips
- Real-World Use Cases for Data Engineers
- Interview Questions
- Wrapping Up
What Is a Join?
A join combines rows from two or more tables based on a related column between them. The related column is typically a primary key in one table and a foreign key in another.
SELECT employees.name, departments.department_name
FROM employees
JOIN departments ON employees.department_id = departments.id;
The ON clause specifies the relationship: match rows where employees.department_id equals departments.id.
The Sample Data
I will use these two tables throughout the post:
employees table:
| id | name | department_id | salary |
|---|---|---|---|
| 1 | Alice | 10 | 70000 |
| 2 | Bob | 20 | 60000 |
| 3 | Carol | 10 | 85000 |
| 4 | Dave | 30 | 75000 |
| 5 | Eve | NULL | 55000 |
Eve has department_id = NULL — she has not been assigned to a department yet.
departments table:
| id | department_name | location |
|---|---|---|
| 10 | Engineering | Toronto |
| 20 | Sales | Mumbai |
| 30 | Marketing | London |
| 40 | HR | Toronto |
HR (id=40) exists but no employees are assigned to it.
CREATE TABLE departments (
id INT PRIMARY KEY,
department_name VARCHAR(50),
location VARCHAR(50)
);
INSERT INTO departments VALUES (10, 'Engineering', 'Toronto');
INSERT INTO departments VALUES (20, 'Sales', 'Mumbai');
INSERT INTO departments VALUES (30, 'Marketing', 'London');
INSERT INTO departments VALUES (40, 'HR', 'Toronto');
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
department_id INT,
salary INT
);
INSERT INTO employees VALUES (1, 'Alice', 10, 70000);
INSERT INTO employees VALUES (2, 'Bob', 20, 60000);
INSERT INTO employees VALUES (3, 'Carol', 10, 85000);
INSERT INTO employees VALUES (4, 'Dave', 30, 75000);
INSERT INTO employees VALUES (5, 'Eve', NULL, 55000);
INNER JOIN: Only Matching Rows
What It Does
Returns only rows where the join condition is satisfied in both tables. If a row in the left table has no match in the right table, it is excluded. If a row in the right table has no match in the left table, it is also excluded.
Syntax
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
Result
| name | department_name |
|---|---|
| Alice | Engineering |
| Bob | Sales |
| Carol | Engineering |
| Dave | Marketing |
Who is missing?
– Eve is missing because her department_id is NULL (no match in departments)
– HR department is missing because no employees have department_id = 40
When to Use INNER JOIN
- You only want records that have valid relationships in both tables
- You are building reports where incomplete data should be excluded
- This is the default and most commonly used join
Visual Representation
employees departments
+---------+ +---------+
| Alice |---+--->| Eng |
| Bob |---+--->| Sales |
| Carol |---+--->| Eng |
| Dave |---+--->| Mktg |
| Eve | X | HR | <-- no match for Eve or HR
+---------+ +---------+
Result: Only the connected rows (4 rows)
LEFT JOIN (LEFT OUTER JOIN): All Left, Matching Right
What It Does
Returns all rows from the left table and the matching rows from the right table. If a left table row has no match in the right table, the right side columns are filled with NULL.
Syntax
SELECT e.name, e.department_id, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
Result
| name | department_id | department_name |
|---|---|---|
| Alice | 10 | Engineering |
| Bob | 20 | Sales |
| Carol | 10 | Engineering |
| Dave | 30 | Marketing |
| Eve | NULL | NULL |
Eve is included with NULL for department_name because LEFT JOIN keeps all rows from the left table.
HR is still missing because it is in the right table, and LEFT JOIN does not guarantee right table rows.
When to Use LEFT JOIN
- You want all records from the primary table regardless of whether they have a match
- Finding unmatched records: “Which employees are not assigned to a department?”
- Preserving the complete list while enriching with optional related data
Finding Unmatched Records
-- Employees without a department
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.id IS NULL;
-- Result: Eve
This is one of the most common SQL patterns: LEFT JOIN + WHERE right_table.column IS NULL to find orphan records.
RIGHT JOIN (RIGHT OUTER JOIN): All Right, Matching Left
What It Does
The mirror of LEFT JOIN. Returns all rows from the right table and matching rows from the left table. Unmatched right rows have NULL for left columns.
Syntax
SELECT e.name, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.id;
Result
| name | department_name |
|---|---|
| Alice | Engineering |
| Carol | Engineering |
| Bob | Sales |
| Dave | Marketing |
| NULL | HR |
HR is included with NULL for name because RIGHT JOIN keeps all rows from the right table.
Eve is missing because she is in the left table and RIGHT JOIN does not guarantee left table rows.
When to Use RIGHT JOIN
In practice, RIGHT JOIN is rarely used. You can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order:
-- These two queries return the same result:
SELECT e.name, d.department_name
FROM employees e RIGHT JOIN departments d ON e.department_id = d.id;
SELECT e.name, d.department_name
FROM departments d LEFT JOIN employees e ON e.department_id = d.id;
Most developers always use LEFT JOIN for consistency and readability.
FULL OUTER JOIN: Everything from Both
What It Does
Returns all rows from both tables. Matching rows are combined. Unmatched rows from either side have NULL for the missing columns.
Syntax
SELECT e.name, d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;
Result
| name | department_name |
|---|---|
| Alice | Engineering |
| Bob | Sales |
| Carol | Engineering |
| Dave | Marketing |
| Eve | NULL |
| NULL | HR |
Both Eve and HR are included. Every row from both tables appears, with NULL where there is no match.
When to Use FULL OUTER JOIN
- Data reconciliation: comparing two datasets to find mismatches
- Finding records that exist in one system but not the other
- Merging two data sources where either side might have gaps
Finding Mismatches
-- Records that exist in only one table (not both)
SELECT e.name, d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id
WHERE e.id IS NULL OR d.id IS NULL;
-- Result:
-- Eve | NULL (employee without department)
-- NULL | HR (department without employees)
Note: MySQL does not support FULL OUTER JOIN. Use UNION of LEFT JOIN and RIGHT JOIN instead:
SELECT e.name, d.department_name
FROM employees e LEFT JOIN departments d ON e.department_id = d.id
UNION
SELECT e.name, d.department_name
FROM employees e RIGHT JOIN departments d ON e.department_id = d.id;
CROSS JOIN: Every Combination
What It Does
Returns the Cartesian product — every row from the left table combined with every row from the right table. No ON clause needed.
Syntax
SELECT e.name, d.department_name
FROM employees e
CROSS JOIN departments d;
Result
5 employees x 4 departments = 20 rows
| name | department_name |
|---|---|
| Alice | Engineering |
| Alice | Sales |
| Alice | Marketing |
| Alice | HR |
| Bob | Engineering |
| Bob | Sales |
| … | … |
When to Use CROSS JOIN
- Generating all possible combinations (e.g., all product-color pairs)
- Creating a date dimension table (cross join years x months x days)
- Test data generation
- Pairing every employee with every training course
Warning: CROSS JOIN on large tables produces enormous results. 10,000 rows x 10,000 rows = 100 million rows.
SELF JOIN: A Table Joined to Itself
What It Does
Joins a table to itself using different aliases. Useful when rows in the same table have a relationship (like employee-manager).
Example: Employee-Manager Hierarchy
-- Add a manager_id column to employees
ALTER TABLE employees ADD manager_id INT;
UPDATE employees SET manager_id = 3 WHERE id IN (1, 2); -- Carol manages Alice, Bob
UPDATE employees SET manager_id = NULL WHERE id = 3; -- Carol has no manager
SELECT e.name as employee,
m.name as manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Result
| employee | manager |
|---|---|
| Alice | Carol |
| Bob | Carol |
| Carol | NULL |
| Dave | NULL |
| Eve | NULL |
The same employees table is used twice — once as e (employee) and once as m (manager).
When to Use SELF JOIN
- Organizational hierarchies (employee-manager)
- Comparing rows within the same table
- Finding duplicates based on specific columns
Finding Duplicates
-- Find employees with the same salary
SELECT a.name, b.name, a.salary
FROM employees a
JOIN employees b ON a.salary = b.salary AND a.id < b.id;
The a.id < b.id prevents matching a row with itself and prevents duplicate pairs (Alice-Bob and Bob-Alice).
All Joins Side-by-Side Comparison
| Join Type | Left Table Rows | Right Table Rows | Unmatched Left | Unmatched Right |
|---|---|---|---|---|
| INNER | Only matched | Only matched | Excluded | Excluded |
| LEFT | All | Only matched | Included (NULLs) | Excluded |
| RIGHT | Only matched | All | Excluded | Included (NULLs) |
| FULL | All | All | Included (NULLs) | Included (NULLs) |
| CROSS | All | All | N/A (every combo) | N/A (every combo) |
| SELF | Depends on join type used | Same table | Depends | Depends |
JOIN with Multiple Conditions
You can join on multiple columns:
SELECT *
FROM orders o
JOIN order_details od
ON o.order_id = od.order_id
AND o.region = od.region;
Or add extra conditions in the ON clause:
-- Only join active departments
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d
ON e.department_id = d.id
AND d.location = 'Toronto';
Important distinction: For LEFT JOIN, conditions in the ON clause filter the right table BEFORE joining (non-matching left rows still appear with NULL). Conditions in the WHERE clause filter AFTER joining (can eliminate left rows entirely).
JOIN with Three or More Tables
SELECT e.name, d.department_name, p.project_name
FROM employees e
JOIN departments d ON e.department_id = d.id
JOIN projects p ON e.id = p.employee_id;
Order matters for readability but not for results (the optimizer reorders anyway). Start with the “main” table and join outward.
For optional relationships, use LEFT JOIN:
SELECT e.name, d.department_name, p.project_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
LEFT JOIN projects p ON e.id = p.employee_id;
This returns all employees, even those without a department or project.
Common Table Expressions (CTEs) with Joins
CTEs make complex joins readable:
WITH active_employees AS (
SELECT id, name, department_id, salary
FROM employees
WHERE salary > 50000
),
large_departments AS (
SELECT d.id, d.department_name, COUNT(*) as emp_count
FROM departments d
JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.department_name
HAVING COUNT(*) >= 2
)
SELECT ae.name, ld.department_name, ae.salary
FROM active_employees ae
JOIN large_departments ld ON ae.department_id = ld.id;
The Difference Between WHERE and ON in Joins
This is subtle but important, especially for LEFT JOIN:
ON clause (filters BEFORE join)
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d
ON e.department_id = d.id
AND d.location = 'Toronto';
-- Result: ALL employees shown
-- Alice | Engineering (Toronto - matches)
-- Bob | NULL (Sales is in Mumbai - no match, but Bob still appears)
-- Carol | Engineering (Toronto - matches)
-- Dave | NULL (Marketing is in London - no match, but Dave still appears)
-- Eve | NULL (no department)
WHERE clause (filters AFTER join)
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.location = 'Toronto';
-- Result: Only Toronto employees
-- Alice | Engineering
-- Carol | Engineering
-- (Bob, Dave, Eve are filtered out entirely)
The rule: For LEFT JOIN, put optional filters in ON (preserves all left rows). Put mandatory filters in WHERE (eliminates non-matching rows).
For INNER JOIN, ON and WHERE produce the same result.
Common Mistakes
1. Forgetting the ON Clause
-- WRONG: This becomes a CROSS JOIN (millions of rows)
SELECT * FROM employees, departments;
-- RIGHT: Always specify the join condition
SELECT * FROM employees e JOIN departments d ON e.department_id = d.id;
2. Getting Duplicates from One-to-Many Joins
If one department has 3 employees and you join with another table that has 2 rows per department, you get 6 rows (3 x 2). This is correct behavior but often unexpected.
Fix: Use DISTINCT, GROUP BY, or window functions to deduplicate.
3. LEFT JOIN Behaving Like INNER JOIN
-- This LEFT JOIN is secretly an INNER JOIN because of the WHERE clause
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.department_name = 'Engineering';
-- Eve (NULL department) is filtered out by WHERE
Fix: Move the condition to the ON clause if you want to preserve all left rows.
4. Joining on Wrong Columns
Always verify which column is the primary key and which is the foreign key. Joining on the wrong pair gives incorrect results without any error.
Performance Tips
-
Index your join columns. The columns in the ON clause should have indexes. Without indexes, the database scans entire tables for every join.
-
Filter early. Add WHERE conditions that reduce row counts before the join:
-- GOOD: Filter first, then join
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary > 70000;
-- The optimizer may push the WHERE filter before the join
-
Avoid SELECT star. Only select the columns you need. Less data = faster queries.
-
Be careful with CROSS JOIN. Always check the row counts before running. 10K x 10K = 100M rows.
-
Use EXPLAIN/EXPLAIN ANALYZE to see the query plan and identify slow joins.
Real-World Use Cases for Data Engineers
1. Data Reconciliation Between Source and Target
-- Find records in source that are missing from target
SELECT s.id, s.name
FROM source_table s
LEFT JOIN target_table t ON s.id = t.id
WHERE t.id IS NULL;
2. Building a Star Schema Fact Table
-- Join fact table with all dimension tables
SELECT f.order_id, f.amount,
dc.customer_name,
dp.product_name,
dd.date_full
FROM fact_orders f
JOIN dim_customer dc ON f.customer_key = dc.customer_key
JOIN dim_product dp ON f.product_key = dp.product_key
JOIN dim_date dd ON f.date_key = dd.date_key;
3. SCD Type 2 Lookup (Slowly Changing Dimension)
-- Join to the correct version of a dimension based on date range
SELECT f.order_id, f.amount, dc.customer_name, dc.address
FROM fact_orders f
JOIN dim_customer dc
ON f.customer_key = dc.customer_key
AND f.order_date >= dc.effective_from
AND f.order_date < dc.effective_to;
4. Finding Data Quality Issues
-- Employees referencing non-existent departments
SELECT e.name, e.department_id
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE e.department_id IS NOT NULL
AND d.id IS NULL;
5. Metadata-Driven Pipeline Validation
-- Compare metadata config with actual tables
SELECT m.TableName, m.SchemaName,
CASE WHEN t.TABLE_NAME IS NOT NULL THEN 'EXISTS' ELSE 'MISSING' END as status
FROM metadata m
LEFT JOIN INFORMATION_SCHEMA.TABLES t
ON m.TableName = t.TABLE_NAME
AND m.SchemaName = t.TABLE_SCHEMA;
Interview Questions
Q: What is the difference between INNER JOIN and LEFT JOIN? A: INNER JOIN returns only rows with matches in both tables. LEFT JOIN returns all rows from the left table and matching rows from the right. Unmatched left rows have NULL for right table columns.
Q: How do you find records in Table A that do not exist in Table B? A: LEFT JOIN Table B ON the key, then WHERE B.key IS NULL. This returns rows from A with no match in B.
Q: What is a CROSS JOIN and when would you use it? A: CROSS JOIN produces the Cartesian product — every combination of rows from both tables. Use it for generating all possible pairs, creating date dimensions, or test data. Avoid on large tables.
Q: Can you join more than two tables? A: Yes. Chain JOIN clauses. Each join adds one more table with its own ON condition. The optimizer handles the execution order.
Q: What is the difference between putting a condition in ON vs WHERE for a LEFT JOIN? A: ON filters the right table before joining (left rows still appear with NULL). WHERE filters after joining (can eliminate left rows entirely). For INNER JOIN they produce the same result.
Q: What is a self join? A: A table joined to itself using different aliases. Common use cases include employee-manager hierarchies, finding duplicates, and comparing rows within the same table.
Q: Why is my LEFT JOIN returning the same results as INNER JOIN? A: Likely because your WHERE clause filters out the NULL rows that LEFT JOIN preserves. Move the condition from WHERE to the ON clause.
Q: How do you optimize a slow join? A: Index the join columns, filter rows before joining (reduce input size), select only needed columns, use EXPLAIN to analyze the query plan, and avoid unnecessary CROSS JOINs.
Wrapping Up
Joins are the backbone of relational databases. Master these six types and you can combine any tables, find mismatches, build reporting models, and solve 80% of SQL interview questions.
The most important takeaway: INNER JOIN excludes unmatched rows. LEFT JOIN preserves them. Everything else is a variation on this concept.
Related posts: – SQL Window Functions – Parquet vs CSV vs JSON – Top 15 ADF Interview Questions
If this guide helped you understand joins, share it with someone preparing for SQL interviews. Questions? Drop a comment below.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.