SQL Joins Explained: INNER, LEFT, RIGHT, FULL, CROSS, and SELF Joins with Real Examples

If you can write a SELECT with a WHERE clause, you know basic SQL. But the moment you need data from two or more tables, you need joins. And joins are where most people get confused.

Which join do I use? What happens to rows that don’t match? Why am I getting duplicate rows? Why does my LEFT JOIN return the same results as my INNER JOIN?

This post answers all of those questions. I will cover every type of SQL join with visual diagrams, real examples using employee and department data, and the exact scenarios where each join is the right choice.

What Is a Join?
The Sample Data
INNER JOIN: Only Matching Rows
LEFT JOIN (LEFT OUTER JOIN): All Left, Matching Right
RIGHT JOIN (RIGHT OUTER JOIN): All Right, Matching Left
FULL OUTER JOIN: Everything from Both
CROSS JOIN: Every Combination
SELF JOIN: A Table Joined to Itself
All Joins Side-by-Side Comparison
JOIN with Multiple Conditions
JOIN with Three or More Tables
Common Table Expressions (CTEs) with Joins
The Difference Between WHERE and ON in Joins
Common Mistakes
Performance Tips
Real-World Use Cases for Data Engineers
Interview Questions
Wrapping Up

What Is a Join?

A join combines rows from two or more tables based on a related column between them. The related column is typically a primary key in one table and a foreign key in another.

SELECT employees.name, departments.department_name
FROM employees
JOIN departments ON employees.department_id = departments.id;

The ON clause specifies the relationship: match rows where employees.department_id equals departments.id.

The Sample Data

I will use these two tables throughout the post:

employees table:

id	name	department_id	salary
1	Alice	10	70000
2	Bob	20	60000
3	Carol	10	85000
4	Dave	30	75000
5	Eve	NULL	55000

Eve has department_id = NULL — she has not been assigned to a department yet.

departments table:

id	department_name	location
10	Engineering	Toronto
20	Sales	Mumbai
30	Marketing	London
40	HR	Toronto

HR (id=40) exists but no employees are assigned to it.

CREATE TABLE departments (
    id INT PRIMARY KEY,
    department_name VARCHAR(50),
    location VARCHAR(50)
);

INSERT INTO departments VALUES (10, 'Engineering', 'Toronto');
INSERT INTO departments VALUES (20, 'Sales', 'Mumbai');
INSERT INTO departments VALUES (30, 'Marketing', 'London');
INSERT INTO departments VALUES (40, 'HR', 'Toronto');

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    department_id INT,
    salary INT
);

INSERT INTO employees VALUES (1, 'Alice', 10, 70000);
INSERT INTO employees VALUES (2, 'Bob', 20, 60000);
INSERT INTO employees VALUES (3, 'Carol', 10, 85000);
INSERT INTO employees VALUES (4, 'Dave', 30, 75000);
INSERT INTO employees VALUES (5, 'Eve', NULL, 55000);

INNER JOIN: Only Matching Rows

What It Does

Returns only rows where the join condition is satisfied in both tables. If a row in the left table has no match in the right table, it is excluded. If a row in the right table has no match in the left table, it is also excluded.

Syntax

SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

Result

name	department_name
Alice	Engineering
Bob	Sales
Carol	Engineering
Dave	Marketing

Who is missing? – Eve is missing because her department_id is NULL (no match in departments) – HR department is missing because no employees have department_id = 40

When to Use INNER JOIN

You only want records that have valid relationships in both tables
You are building reports where incomplete data should be excluded
This is the default and most commonly used join

Visual Representation

employees          departments
+---------+        +---------+
| Alice   |---+--->| Eng     |
| Bob     |---+--->| Sales   |
| Carol   |---+--->| Eng     |
| Dave    |---+--->| Mktg    |
| Eve     |   X   | HR      |  <-- no match for Eve or HR
+---------+        +---------+

Result: Only the connected rows (4 rows)

LEFT JOIN (LEFT OUTER JOIN): All Left, Matching Right

What It Does

Returns all rows from the left table and the matching rows from the right table. If a left table row has no match in the right table, the right side columns are filled with NULL.

Syntax

SELECT e.name, e.department_id, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;

Result

name	department_id	department_name
Alice	10	Engineering
Bob	20	Sales
Carol	10	Engineering
Dave	30	Marketing
Eve	NULL	NULL

Eve is included with NULL for department_name because LEFT JOIN keeps all rows from the left table.

HR is still missing because it is in the right table, and LEFT JOIN does not guarantee right table rows.

When to Use LEFT JOIN

You want all records from the primary table regardless of whether they have a match
Finding unmatched records: “Which employees are not assigned to a department?”
Preserving the complete list while enriching with optional related data

Finding Unmatched Records

-- Employees without a department
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.id IS NULL;

-- Result: Eve

This is one of the most common SQL patterns: LEFT JOIN + WHERE right_table.column IS NULL to find orphan records.

RIGHT JOIN (RIGHT OUTER JOIN): All Right, Matching Left

What It Does

The mirror of LEFT JOIN. Returns all rows from the right table and matching rows from the left table. Unmatched right rows have NULL for left columns.

Syntax

SELECT e.name, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.id;

Result

name	department_name
Alice	Engineering
Carol	Engineering
Bob	Sales
Dave	Marketing
NULL	HR

HR is included with NULL for name because RIGHT JOIN keeps all rows from the right table.

Eve is missing because she is in the left table and RIGHT JOIN does not guarantee left table rows.

When to Use RIGHT JOIN

In practice, RIGHT JOIN is rarely used. You can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order:

-- These two queries return the same result:
SELECT e.name, d.department_name
FROM employees e RIGHT JOIN departments d ON e.department_id = d.id;

SELECT e.name, d.department_name
FROM departments d LEFT JOIN employees e ON e.department_id = d.id;

Most developers always use LEFT JOIN for consistency and readability.

FULL OUTER JOIN: Everything from Both

What It Does

Returns all rows from both tables. Matching rows are combined. Unmatched rows from either side have NULL for the missing columns.

Syntax

SELECT e.name, d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;

Result

name	department_name
Alice	Engineering
Bob	Sales
Carol	Engineering
Dave	Marketing
Eve	NULL
NULL	HR

Both Eve and HR are included. Every row from both tables appears, with NULL where there is no match.

When to Use FULL OUTER JOIN

Data reconciliation: comparing two datasets to find mismatches
Finding records that exist in one system but not the other
Merging two data sources where either side might have gaps

Finding Mismatches

-- Records that exist in only one table (not both)
SELECT e.name, d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id
WHERE e.id IS NULL OR d.id IS NULL;

-- Result:
-- Eve   | NULL    (employee without department)
-- NULL  | HR      (department without employees)

Note: MySQL does not support FULL OUTER JOIN. Use UNION of LEFT JOIN and RIGHT JOIN instead:

SELECT e.name, d.department_name
FROM employees e LEFT JOIN departments d ON e.department_id = d.id
UNION
SELECT e.name, d.department_name
FROM employees e RIGHT JOIN departments d ON e.department_id = d.id;

CROSS JOIN: Every Combination

What It Does

Returns the Cartesian product — every row from the left table combined with every row from the right table. No ON clause needed.

Syntax

SELECT e.name, d.department_name
FROM employees e
CROSS JOIN departments d;

Result

5 employees x 4 departments = 20 rows

name	department_name
Alice	Engineering
Alice	Sales
Alice	Marketing
Alice	HR
Bob	Engineering
Bob	Sales
…	…

When to Use CROSS JOIN

Generating all possible combinations (e.g., all product-color pairs)
Creating a date dimension table (cross join years x months x days)
Test data generation
Pairing every employee with every training course

Warning: CROSS JOIN on large tables produces enormous results. 10,000 rows x 10,000 rows = 100 million rows.

SELF JOIN: A Table Joined to Itself

What It Does

Joins a table to itself using different aliases. Useful when rows in the same table have a relationship (like employee-manager).

Example: Employee-Manager Hierarchy

-- Add a manager_id column to employees
ALTER TABLE employees ADD manager_id INT;
UPDATE employees SET manager_id = 3 WHERE id IN (1, 2);  -- Carol manages Alice, Bob
UPDATE employees SET manager_id = NULL WHERE id = 3;       -- Carol has no manager

SELECT e.name as employee,
       m.name as manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Result

employee	manager
Alice	Carol
Bob	Carol
Carol	NULL
Dave	NULL
Eve	NULL

The same employees table is used twice — once as e (employee) and once as m (manager).

When to Use SELF JOIN

Organizational hierarchies (employee-manager)
Comparing rows within the same table
Finding duplicates based on specific columns

Finding Duplicates

-- Find employees with the same salary
SELECT a.name, b.name, a.salary
FROM employees a
JOIN employees b ON a.salary = b.salary AND a.id < b.id;

The a.id < b.id prevents matching a row with itself and prevents duplicate pairs (Alice-Bob and Bob-Alice).

All Joins Side-by-Side Comparison

Join Type	Left Table Rows	Right Table Rows	Unmatched Left	Unmatched Right
INNER	Only matched	Only matched	Excluded	Excluded
LEFT	All	Only matched	Included (NULLs)	Excluded
RIGHT	Only matched	All	Excluded	Included (NULLs)
FULL	All	All	Included (NULLs)	Included (NULLs)
CROSS	All	All	N/A (every combo)	N/A (every combo)
SELF	Depends on join type used	Same table	Depends	Depends

JOIN with Multiple Conditions

You can join on multiple columns:

SELECT *
FROM orders o
JOIN order_details od
    ON o.order_id = od.order_id
    AND o.region = od.region;

Or add extra conditions in the ON clause:

-- Only join active departments
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d
    ON e.department_id = d.id
    AND d.location = 'Toronto';

Important distinction: For LEFT JOIN, conditions in the ON clause filter the right table BEFORE joining (non-matching left rows still appear with NULL). Conditions in the WHERE clause filter AFTER joining (can eliminate left rows entirely).

JOIN with Three or More Tables

SELECT e.name, d.department_name, p.project_name
FROM employees e
JOIN departments d ON e.department_id = d.id
JOIN projects p ON e.id = p.employee_id;

Order matters for readability but not for results (the optimizer reorders anyway). Start with the “main” table and join outward.

For optional relationships, use LEFT JOIN:

SELECT e.name, d.department_name, p.project_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
LEFT JOIN projects p ON e.id = p.employee_id;

This returns all employees, even those without a department or project.

Common Table Expressions (CTEs) with Joins

CTEs make complex joins readable:

WITH active_employees AS (
    SELECT id, name, department_id, salary
    FROM employees
    WHERE salary > 50000
),
large_departments AS (
    SELECT d.id, d.department_name, COUNT(*) as emp_count
    FROM departments d
    JOIN employees e ON d.id = e.department_id
    GROUP BY d.id, d.department_name
    HAVING COUNT(*) >= 2
)
SELECT ae.name, ld.department_name, ae.salary
FROM active_employees ae
JOIN large_departments ld ON ae.department_id = ld.id;

The Difference Between WHERE and ON in Joins

This is subtle but important, especially for LEFT JOIN:

ON clause (filters BEFORE join)

SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d
    ON e.department_id = d.id
    AND d.location = 'Toronto';

-- Result: ALL employees shown
-- Alice | Engineering (Toronto - matches)
-- Bob   | NULL        (Sales is in Mumbai - no match, but Bob still appears)
-- Carol | Engineering (Toronto - matches)
-- Dave  | NULL        (Marketing is in London - no match, but Dave still appears)
-- Eve   | NULL        (no department)

WHERE clause (filters AFTER join)

SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.location = 'Toronto';

-- Result: Only Toronto employees
-- Alice | Engineering
-- Carol | Engineering
-- (Bob, Dave, Eve are filtered out entirely)

The rule: For LEFT JOIN, put optional filters in ON (preserves all left rows). Put mandatory filters in WHERE (eliminates non-matching rows).

For INNER JOIN, ON and WHERE produce the same result.

Common Mistakes

1. Forgetting the ON Clause

-- WRONG: This becomes a CROSS JOIN (millions of rows)
SELECT * FROM employees, departments;

-- RIGHT: Always specify the join condition
SELECT * FROM employees e JOIN departments d ON e.department_id = d.id;

2. Getting Duplicates from One-to-Many Joins

If one department has 3 employees and you join with another table that has 2 rows per department, you get 6 rows (3 x 2). This is correct behavior but often unexpected.

Fix: Use DISTINCT, GROUP BY, or window functions to deduplicate.

3. LEFT JOIN Behaving Like INNER JOIN

-- This LEFT JOIN is secretly an INNER JOIN because of the WHERE clause
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.department_name = 'Engineering';
-- Eve (NULL department) is filtered out by WHERE

Fix: Move the condition to the ON clause if you want to preserve all left rows.

4. Joining on Wrong Columns

Always verify which column is the primary key and which is the foreign key. Joining on the wrong pair gives incorrect results without any error.

Performance Tips

Index your join columns. The columns in the ON clause should have indexes. Without indexes, the database scans entire tables for every join.
Filter early. Add WHERE conditions that reduce row counts before the join:

-- GOOD: Filter first, then join
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary > 70000;

-- The optimizer may push the WHERE filter before the join

Avoid SELECT star. Only select the columns you need. Less data = faster queries.
Be careful with CROSS JOIN. Always check the row counts before running. 10K x 10K = 100M rows.
Use EXPLAIN/EXPLAIN ANALYZE to see the query plan and identify slow joins.

Real-World Use Cases for Data Engineers

1. Data Reconciliation Between Source and Target

-- Find records in source that are missing from target
SELECT s.id, s.name
FROM source_table s
LEFT JOIN target_table t ON s.id = t.id
WHERE t.id IS NULL;

2. Building a Star Schema Fact Table

-- Join fact table with all dimension tables
SELECT f.order_id, f.amount,
       dc.customer_name,
       dp.product_name,
       dd.date_full
FROM fact_orders f
JOIN dim_customer dc ON f.customer_key = dc.customer_key
JOIN dim_product dp ON f.product_key = dp.product_key
JOIN dim_date dd ON f.date_key = dd.date_key;

3. SCD Type 2 Lookup (Slowly Changing Dimension)

-- Join to the correct version of a dimension based on date range
SELECT f.order_id, f.amount, dc.customer_name, dc.address
FROM fact_orders f
JOIN dim_customer dc
    ON f.customer_key = dc.customer_key
    AND f.order_date >= dc.effective_from
    AND f.order_date < dc.effective_to;

4. Finding Data Quality Issues

-- Employees referencing non-existent departments
SELECT e.name, e.department_id
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE e.department_id IS NOT NULL
  AND d.id IS NULL;

5. Metadata-Driven Pipeline Validation

-- Compare metadata config with actual tables
SELECT m.TableName, m.SchemaName,
       CASE WHEN t.TABLE_NAME IS NOT NULL THEN 'EXISTS' ELSE 'MISSING' END as status
FROM metadata m
LEFT JOIN INFORMATION_SCHEMA.TABLES t
    ON m.TableName = t.TABLE_NAME
    AND m.SchemaName = t.TABLE_SCHEMA;

Interview Questions

Q: What is the difference between INNER JOIN and LEFT JOIN? A: INNER JOIN returns only rows with matches in both tables. LEFT JOIN returns all rows from the left table and matching rows from the right. Unmatched left rows have NULL for right table columns.

Q: How do you find records in Table A that do not exist in Table B? A: LEFT JOIN Table B ON the key, then WHERE B.key IS NULL. This returns rows from A with no match in B.

Q: What is a CROSS JOIN and when would you use it? A: CROSS JOIN produces the Cartesian product — every combination of rows from both tables. Use it for generating all possible pairs, creating date dimensions, or test data. Avoid on large tables.

Q: Can you join more than two tables? A: Yes. Chain JOIN clauses. Each join adds one more table with its own ON condition. The optimizer handles the execution order.

Q: What is the difference between putting a condition in ON vs WHERE for a LEFT JOIN? A: ON filters the right table before joining (left rows still appear with NULL). WHERE filters after joining (can eliminate left rows entirely). For INNER JOIN they produce the same result.

Q: What is a self join? A: A table joined to itself using different aliases. Common use cases include employee-manager hierarchies, finding duplicates, and comparing rows within the same table.

Q: Why is my LEFT JOIN returning the same results as INNER JOIN? A: Likely because your WHERE clause filters out the NULL rows that LEFT JOIN preserves. Move the condition from WHERE to the ON clause.

Q: How do you optimize a slow join? A: Index the join columns, filter rows before joining (reduce input size), select only needed columns, use EXPLAIN to analyze the query plan, and avoid unnecessary CROSS JOINs.

Wrapping Up

Joins are the backbone of relational databases. Master these six types and you can combine any tables, find mismatches, build reporting models, and solve 80% of SQL interview questions.

The most important takeaway: INNER JOIN excludes unmatched rows. LEFT JOIN preserves them. Everything else is a variation on this concept.

Related posts: – SQL Window Functions – Parquet vs CSV vs JSON – Top 15 ADF Interview Questions

If this guide helped you understand joins, share it with someone preparing for SQL interviews. Questions? Drop a comment below.

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

SQL Joins Explained: INNER, LEFT, RIGHT, FULL, CROSS, and SELF Joins with Real Examples

Table of Contents

What Is a Join?

The Sample Data

INNER JOIN: Only Matching Rows

What It Does

Syntax

Result

When to Use INNER JOIN

Visual Representation

LEFT JOIN (LEFT OUTER JOIN): All Left, Matching Right

What It Does

Syntax

Result

When to Use LEFT JOIN

Finding Unmatched Records

RIGHT JOIN (RIGHT OUTER JOIN): All Right, Matching Left

What It Does

Syntax

Result

When to Use RIGHT JOIN

FULL OUTER JOIN: Everything from Both

What It Does

Syntax

Result

When to Use FULL OUTER JOIN

Finding Mismatches

CROSS JOIN: Every Combination

What It Does

Syntax

Result

When to Use CROSS JOIN

SELF JOIN: A Table Joined to Itself

What It Does

Example: Employee-Manager Hierarchy

Result

When to Use SELF JOIN

Finding Duplicates

All Joins Side-by-Side Comparison

JOIN with Multiple Conditions

JOIN with Three or More Tables

Common Table Expressions (CTEs) with Joins

The Difference Between WHERE and ON in Joins

ON clause (filters BEFORE join)

WHERE clause (filters AFTER join)

Common Mistakes

1. Forgetting the ON Clause

2. Getting Duplicates from One-to-Many Joins

3. LEFT JOIN Behaving Like INNER JOIN

4. Joining on Wrong Columns

Performance Tips

Real-World Use Cases for Data Engineers

1. Data Reconciliation Between Source and Target

2. Building a Star Schema Fact Table

3. SCD Type 2 Lookup (Slowly Changing Dimension)

4. Finding Data Quality Issues

5. Metadata-Driven Pipeline Validation

Interview Questions

Wrapping Up

Related Posts

Leave a Comment Cancel Reply