SQL Indexes and Execution Plans: How Databases Find Data, Why Queries Are Slow, and How to Fix Them

You write a query. It returns correct results. But it takes 45 seconds instead of 0.5 seconds. Your manager asks: “Why is this dashboard so slow?” The answer is almost always: missing or wrong indexes.

An index is a shortcut that helps the database find rows WITHOUT scanning the entire table. Without an index, finding one row in a 10-million row table means reading ALL 10 million rows. With the right index, it means reading 3-4 rows. That is the difference between 45 seconds and 0.5 seconds.

Think of an index like a book’s index at the back. To find “Delta Lake” in a 500-page book, you can either read all 500 pages (table scan) or check the index at the back: “Delta Lake — page 247” (index seek). Both find the answer. One takes 5 hours. The other takes 5 seconds.

What Is an Index?
How a Table Scan Works (Without Index)
How an Index Seek Works (With Index)
Clustered vs Non-Clustered Indexes
Creating Indexes
Composite Indexes (Multi-Column)
Index Column Order Matters
Covering Indexes (Include Columns)
Unique Indexes
Filtered Indexes
When Indexes Help and When They Hurt
Reading Execution Plans
The Key Operators in Execution Plans
Table Scan vs Index Scan vs Index Seek
Common Slow Query Patterns and Fixes
Index Maintenance (Fragmentation and Rebuild)
The Index Design Checklist
Common Mistakes
Interview Questions
Wrapping Up

What Is an Index?

An index is a sorted data structure (typically a B-tree) that maps column values to row locations. When you query a column that has an index, the database looks up the value in the sorted structure instead of scanning every row.

Without Index (Table Scan):
  "Find employee with emp_id = 5000"
  → Read row 1... no. Row 2... no. Row 3... no. ... Row 5000... YES!
  → Reads: 5,000 rows to find 1 result

With Index on emp_id (Index Seek):
  "Find employee with emp_id = 5000"
  → B-tree lookup: root → branch → leaf → row location
  → Reads: 3 nodes to find 1 result

How a Table Scan Works (Without Index)

Table: employees (1,000,000 rows, no index on city)

Query: SELECT * FROM employees WHERE city = 'Toronto'

Database action:
  Row 1: city = 'Mumbai' → skip
  Row 2: city = 'Toronto' → MATCH, add to results
  Row 3: city = 'Delhi' → skip
  Row 4: city = 'Toronto' → MATCH
  ... reads ALL 1,000,000 rows ...

  Result: 50,000 Toronto rows found
  Rows read: 1,000,000 (entire table)
  Time: 45 seconds

Real-life analogy: Finding everyone named “Naveen” in a phone book that is sorted by phone number (not name). You must read every single entry because the book is not sorted by name.

How an Index Seek Works (With Index)

Table: employees (1,000,000 rows, INDEX on city)

Query: SELECT * FROM employees WHERE city = 'Toronto'

Database action:
  B-tree index on city:
    Root: A-L → left, M-Z → right
    Branch: S-U → middle
    Leaf: "Toronto" → points to rows 2, 4, 17, 89, ... (50,000 pointers)

  → Jump directly to Toronto entries

  Result: 50,000 Toronto rows found
  Rows read: 50,000 (only matching rows)
  Time: 0.5 seconds

Clustered vs Non-Clustered Indexes

Clustered Index (The Table IS the Index)

A clustered index sorts the ACTUAL TABLE DATA by the index column. The table itself becomes a sorted structure.

-- By default, the PRIMARY KEY creates a clustered index
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,        -- Clustered index (data sorted by emp_id)
    first_name VARCHAR(50),
    salary DECIMAL(10,2)
);

Rules: – One per table (the data can only be sorted one way) – Usually the PRIMARY KEY – The actual data rows are stored in this sorted order – Queries on the clustered key are the fastest possible

Real-life analogy: A dictionary is a clustered index on words. The actual pages (data) are physically sorted alphabetically. You can quickly find “Python” because the book itself is sorted. But finding all words with 6 letters requires reading every page — the book is not sorted by word length.

Non-Clustered Index (A Separate Lookup Structure)

A non-clustered index is a separate structure that points BACK to the table rows.

-- Create a non-clustered index on city
CREATE NONCLUSTERED INDEX IX_employees_city
ON employees (city);

Non-Clustered Index on city:
  ┌─────────────────────────────────┐
  │ Calgary    → row pointer 5      │
  │ Delhi      → row pointer 8      │
  │ Mumbai     → row pointers 1,7   │
  │ Ottawa     → row pointer 4      │
  │ Toronto    → row pointers 2,3,6 │
  └─────────────────────────────────┘
          ↓ (lookup)
  Actual table (sorted by emp_id, the clustered index):
  Row 1: emp_id=1001, city=Mumbai
  Row 2: emp_id=1002, city=Toronto
  Row 3: emp_id=1003, city=Toronto
  ...

Rules: – Up to 999 per table (but more is not better) – A separate B-tree structure pointing to row locations – Queries must “look up” the actual row after finding the pointer

Real-life analogy: A book’s back-of-book index is a non-clustered index. It tells you “Delta Lake — page 247.” You then flip to page 247 to read the content. The index is separate from the content.

Creating Indexes

-- Non-clustered index (most common)
CREATE INDEX IX_employees_city ON employees (city);

-- Explicitly non-clustered
CREATE NONCLUSTERED INDEX IX_employees_department ON employees (department);

-- Clustered index (if no PK or you want it on a different column)
CREATE CLUSTERED INDEX IX_employees_hire_date ON employees (hire_date);
-- WARNING: This re-sorts the entire table. Only one clustered index allowed.

-- Drop an index
DROP INDEX IX_employees_city ON employees;

Composite Indexes (Multi-Column)

-- Index on department AND city (in that order)
CREATE INDEX IX_emp_dept_city ON employees (department, city);

-- This index helps these queries:
SELECT * FROM employees WHERE department = 'Engineering';                    -- ✅ Uses index
SELECT * FROM employees WHERE department = 'Engineering' AND city = 'Toronto'; -- ✅ Uses index
SELECT * FROM employees WHERE city = 'Toronto';                               -- ❌ Cannot use index!

Index Column Order Matters

The leftmost prefix rule: a composite index can only be used if the query filters on columns from LEFT to RIGHT.

Index: (department, city, salary)

Queries that USE this index:
  WHERE department = 'Eng'                              ✅ (leftmost column)
  WHERE department = 'Eng' AND city = 'Toronto'         ✅ (first two columns)
  WHERE department = 'Eng' AND city = 'Toronto' AND salary > 90000  ✅ (all three)

Queries that CANNOT use this index:
  WHERE city = 'Toronto'                                ❌ (skipped department)
  WHERE salary > 90000                                  ❌ (skipped department and city)
  WHERE city = 'Toronto' AND salary > 90000             ❌ (skipped department)

Real-life analogy: A phone book sorted by (Last Name, First Name, City). You can quickly find all “Smiths” (last name). You can find “John Smith” (last + first). But you CANNOT quickly find all “Johns” (first name only) — the book is not sorted that way.

Covering Indexes (Include Columns)

A covering index includes ALL columns the query needs, eliminating the need to look up the actual table row:

-- Query: frequently run
SELECT first_name, salary FROM employees WHERE department = 'Engineering';

-- Regular index: finds matching rows, then looks up first_name and salary from the table
CREATE INDEX IX_emp_dept ON employees (department);

-- Covering index: includes first_name and salary IN the index itself — no table lookup needed
CREATE INDEX IX_emp_dept_covering ON employees (department)
INCLUDE (first_name, salary);

INCLUDE columns are stored in the leaf level of the index but NOT sorted. They eliminate the table lookup (key lookup) step.

Unique Indexes

A unique index enforces that no two rows can have the same value in the indexed column(s) — it combines index performance with data integrity:

-- Unique index: no two employees can share an email
CREATE UNIQUE INDEX IX_emp_email ON employees (email);

-- Attempting a duplicate:
INSERT INTO employees (emp_id, first_name, email) VALUES (999, 'Test', 'naveen@email.com');
-- ERROR: Cannot insert duplicate key. The duplicate key value is (naveen@email.com).

-- Unique composite index: same person can't be in the same department twice
CREATE UNIQUE INDEX IX_emp_dept_person ON employees (dept_id, emp_id);

-- Unique indexes allow ONE NULL (in most databases)
-- Two NULLs in a unique column → depends on the database:
--   SQL Server: second NULL violates uniqueness ❌
--   PostgreSQL: multiple NULLs allowed ✅ (NULLs are not equal)

Key difference from UNIQUE constraint: Functionally identical — a UNIQUE constraint creates a unique index behind the scenes. But creating the index explicitly gives you control over index options (INCLUDE, filtered, fill factor). Use UNIQUE constraint for simple cases, explicit UNIQUE INDEX when you need covering or filtered options.

Real-life analogy: A unique index is like a license plate registry. Every plate must be unique — if you try to register “ABC 1234” and it already exists, the system rejects it. The registry also lets you look up any plate instantly (index seek).

Filtered Indexes

A filtered index covers only a SUBSET of rows — smaller, faster, and cheaper than a full index:

-- Index only active employees (ignore the 90% that are inactive/archived)
CREATE INDEX IX_emp_active_city ON employees (city)
WHERE is_active = 1;

-- Index only recent hires
CREATE INDEX IX_emp_recent ON employees (hire_date, department)
WHERE hire_date >= '2024-01-01';

-- Index only rows with non-NULL email (skip the NULLs)
CREATE UNIQUE INDEX IX_emp_email_nonnull ON employees (email)
WHERE email IS NOT NULL;
-- This allows multiple NULLs while enforcing uniqueness on actual values

-- Filtered index for a dashboard that only shows high earners
CREATE INDEX IX_emp_high_salary ON employees (department, salary)
WHERE salary >= 100000;

When to use filtered indexes:

Queries that always include the same filter: WHERE is_active = 1, WHERE status = 'Open'
Tables where most rows are irrelevant (archived data, soft-deleted records)
Enforcing uniqueness only on non-NULL values
Reducing index size and maintenance cost on large tables

The catch: Your query’s WHERE clause must match or be a subset of the filtered index’s WHERE clause, or the optimizer will ignore it.

Real-life analogy: A filtered index is like an express checkout lane at a grocery store — “10 items or fewer.” It only serves a subset of customers, but those customers get through much faster. The full lane (regular index) serves everyone but is slower.

When Indexes Help and When They Hurt

Indexes HELP (faster reads)	Indexes HURT (slower writes)
SELECT with WHERE on indexed column	Every INSERT must update all indexes on the table
JOIN on indexed columns	Every UPDATE on an indexed column rebuilds that index entry
ORDER BY on indexed column (avoids sort)	Every DELETE must remove entries from all indexes
GROUP BY on indexed column	More indexes = more storage space on disk
DISTINCT on indexed column	Index maintenance (rebuild/reorganize) takes time

The trade-off:
  0 indexes: Writes are fast. Reads are slow (full scans everywhere).
  5 indexes: Reads are fast. Writes are slightly slower (acceptable).
  20 indexes: Reads are fast. Writes are VERY slow (every insert updates 20 indexes).

Sweet spot: 5-7 indexes per table for most OLTP workloads.
Data warehouses (read-heavy, few writes): more indexes are acceptable.
Staging tables (write-heavy, few reads): minimal or no indexes.

Scenarios where indexes do NOT help:

Small tables (< 1,000 rows) — a full scan is already fast; the index overhead is not worth it
Low-cardinality columns — an index on gender (M/F) reads ~50% of the table either way
Queries returning most of the table — if you read 80% of rows, a scan is faster than 800,000 individual seeks
Heavily updated columns — a column that changes every second means the index is rebuilt every second
Write-heavy staging/temp tables — bulk inserts into indexed tables are significantly slower

Real-life analogy: Indexes are like bookmarks in a textbook. 5 bookmarks for frequently referenced chapters — very helpful. 200 bookmarks for every paragraph — the bookmarks themselves become clutter, falling out every time you open the book, and it takes time to maintain them. More is not always better.

Reading Execution Plans

An execution plan shows HOW the database executes your query:

-- Show the execution plan (SQL Server)
SET SHOWPLAN_ALL ON;
SELECT * FROM employees WHERE city = 'Toronto';
SET SHOWPLAN_ALL OFF;

-- Or in SSMS: Ctrl+L (estimated plan) or Ctrl+M (actual plan)
-- Or: SET STATISTICS IO ON; (shows read counts)

The Key Operators in Execution Plans

Operator	What It Means	Good or Bad?
Table Scan	Read EVERY row in the table	❌ Bad on large tables
Clustered Index Scan	Read every row via the clustered index	❌ Bad (still reads everything)
Index Scan	Read every entry in a non-clustered index	⚠️ Medium (smaller than table scan)
Index Seek	Jump directly to matching entries in the index	✅ Good (reads only matching rows)
Key Lookup	After index seek, look up remaining columns from the table	⚠️ Acceptable in small quantities
Nested Loops	For each outer row, scan inner rows	✅ Good for small datasets
Hash Match	Build hash table from one side, probe with other	✅ Good for large joins
Sort	Sort rows (expensive if not indexed)	⚠️ Expensive if unexpected

Table Scan vs Index Scan vs Index Seek

Table Scan:
  "Read every page of the book"
  When: No useful index exists
  Fix: Create an index on the filtered column

Index Scan:
  "Read every entry in the book's index"
  When: Index exists but query returns too many rows (>20% of table)
  Fix: Add more selective filters, or accept it for wide-range queries

Index Seek:
  "Look up one entry in the book's index"
  When: Index exists and query is selective (returns <20% of rows)
  Result: Fastest possible — this is the goal

Common Slow Query Patterns and Fixes

Pattern 1: Missing Index

-- SLOW: No index on city → Table Scan
SELECT * FROM employees WHERE city = 'Toronto';

-- FIX: Create index
CREATE INDEX IX_emp_city ON employees (city);
-- Now: Index Seek → fast

Pattern 2: Function on Indexed Column

-- SLOW: Function wrapping the column disables the index
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;
-- Database cannot use index on hire_date because YEAR() transforms it

-- FIX: Rewrite without function on the column
SELECT * FROM employees WHERE hire_date >= '2023-01-01' AND hire_date < '2024-01-01';
-- Now: Index Seek on hire_date

Pattern 3: Leading Wildcard in LIKE

-- SLOW: Leading wildcard forces full scan
SELECT * FROM employees WHERE last_name LIKE '%son';

-- FIX: Trailing wildcard uses index
SELECT * FROM employees WHERE last_name LIKE 'John%';

Pattern 4: Implicit Conversion

-- SLOW: emp_id is INT but compared to string
SELECT * FROM employees WHERE emp_id = '1001';
-- Database converts every emp_id to string for comparison → scan

-- FIX: Match types
SELECT * FROM employees WHERE emp_id = 1001;

Pattern 5: SELECT * When You Need 2 Columns

-- SLOW: Must look up ALL columns from the table after index seek
SELECT * FROM employees WHERE department = 'Engineering';

-- FASTER: Covering index can satisfy the query entirely
SELECT first_name, salary FROM employees WHERE department = 'Engineering';
-- With covering index: CREATE INDEX ... ON (department) INCLUDE (first_name, salary)

Index Maintenance (Fragmentation and Rebuild)

Over time, indexes become fragmented (like a hard drive) as data is inserted and deleted:

-- Check fragmentation
SELECT
    i.name AS index_name,
    s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('employees'), NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id;

-- Reorganize (light maintenance, < 30% fragmentation)
ALTER INDEX IX_emp_city ON employees REORGANIZE;

-- Rebuild (heavy maintenance, > 30% fragmentation)
ALTER INDEX IX_emp_city ON employees REBUILD;

-- Rebuild all indexes on a table
ALTER INDEX ALL ON employees REBUILD;

Schedule: Rebuild indexes weekly or after large data loads.

The Index Design Checklist

✅ Index columns that appear in WHERE clauses
✅ Index columns that appear in JOIN conditions
✅ Index columns that appear in ORDER BY
✅ Put most selective column FIRST in composite indexes
✅ Use INCLUDE for columns needed in SELECT but not filtered
✅ One clustered index per table (usually the primary key)
✅ Keep total indexes under 5-7 per table
✅ Rebuild fragmented indexes weekly

Common Mistakes

Too many indexes — every index slows down INSERT, UPDATE, DELETE (the index must be updated too). 5-7 indexes per table is typical. 20+ is a problem.
Wrong column order in composite index — (city, department) does NOT help WHERE department = 'Eng'. Put the most filtered column first.
Functions on indexed columns — WHERE YEAR(hire_date) = 2023 cannot use an index on hire_date. Rewrite as a range.
Indexing low-cardinality columns — an index on gender (2 values) is useless. The database will scan half the table either way. Index high-cardinality columns (email, emp_id, city).
Never checking execution plans — you cannot optimize what you cannot see. Always check the plan for slow queries.
Not rebuilding fragmented indexes — fragmented indexes perform like no index at all. Rebuild weekly.

Interview Questions

Q: What is the difference between a clustered and non-clustered index? A: A clustered index sorts the actual table data — one per table, usually the primary key. A non-clustered index is a separate sorted structure with pointers back to the table — up to 999 per table. Clustered index seeks are fastest because data is already sorted. Non-clustered seeks require an additional lookup to fetch full row data.

Q: When should you create an index? A: On columns frequently used in WHERE clauses, JOIN conditions, and ORDER BY. The column should have high cardinality (many distinct values). Do not index columns rarely filtered or columns with very few distinct values. Balance read speed (indexes help) with write speed (indexes slow down inserts/updates).

Q: What is a covering index? A: An index that includes ALL columns needed by a query, so the database never needs to look up the actual table row. Created with INCLUDE clause. Eliminates the key lookup operator in execution plans, making queries significantly faster for specific query patterns.

Q: Why would a query not use an existing index? A: Common reasons: function applied to the indexed column (YEAR(date)), implicit type conversion, leading wildcard in LIKE, query returns too many rows (optimizer chooses scan), or the index columns are in the wrong order for the query’s WHERE clause.

Wrapping Up

Indexes are the single biggest performance lever in SQL. A missing index can make a query 100x slower. A well-designed index can make it 100x faster. Learn to read execution plans, identify table scans, and create targeted indexes — these skills matter more in production than any clever query trick.

← Previous: DDL, DML, and Constraints

SQL (9/15)

Next: Views, Temp Tables & Variables →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

← DDL, DML, and Constraints Views, Temp Tables & Variables →

SQL Indexes and Execution Plans: How Databases Find Data, Why Queries Are Slow, and How to Fix Them

SQL Indexes and Execution Plans: How Databases Find Data, Why Queries Are Slow, and How to Fix Them

Table of Contents

What Is an Index?

How a Table Scan Works (Without Index)

How an Index Seek Works (With Index)

Clustered vs Non-Clustered Indexes

Clustered Index (The Table IS the Index)

Non-Clustered Index (A Separate Lookup Structure)

Creating Indexes

Composite Indexes (Multi-Column)

Index Column Order Matters

Covering Indexes (Include Columns)

Unique Indexes

Filtered Indexes

When Indexes Help and When They Hurt

Reading Execution Plans

The Key Operators in Execution Plans

Table Scan vs Index Scan vs Index Seek

Common Slow Query Patterns and Fixes

Pattern 1: Missing Index

Pattern 2: Function on Indexed Column

Pattern 3: Leading Wildcard in LIKE

Pattern 4: Implicit Conversion

Pattern 5: SELECT * When You Need 2 Columns

Index Maintenance (Fragmentation and Rebuild)

The Index Design Checklist

Common Mistakes

Interview Questions

Wrapping Up

Leave a Comment Cancel Reply

SQL Indexes and Execution Plans: How Databases Find Data, Why Queries Are Slow, and How to Fix Them

Table of Contents

What Is an Index?

How a Table Scan Works (Without Index)

How an Index Seek Works (With Index)

Clustered vs Non-Clustered Indexes

Clustered Index (The Table IS the Index)

Non-Clustered Index (A Separate Lookup Structure)

Creating Indexes

Composite Indexes (Multi-Column)

Index Column Order Matters

Covering Indexes (Include Columns)

Unique Indexes

Filtered Indexes

When Indexes Help and When They Hurt

Reading Execution Plans

The Key Operators in Execution Plans

Table Scan vs Index Scan vs Index Seek

Common Slow Query Patterns and Fixes

Pattern 1: Missing Index

Pattern 2: Function on Indexed Column

Pattern 3: Leading Wildcard in LIKE

Pattern 4: Implicit Conversion

Pattern 5: SELECT * When You Need 2 Columns

Index Maintenance (Fragmentation and Rebuild)

The Index Design Checklist

Common Mistakes

Interview Questions

Wrapping Up

Related Posts

Leave a Comment Cancel Reply