SQL Indexes and Execution Plans: How Databases Find Data, Why Queries Are Slow, and How to Fix Them

SQL Indexes and Execution Plans: How Databases Find Data, Why Queries Are Slow, and How to Fix Them

You write a query. It returns correct results. But it takes 45 seconds instead of 0.5 seconds. Your manager asks: “Why is this dashboard so slow?” The answer is almost always: missing or wrong indexes.

An index is a shortcut that helps the database find rows WITHOUT scanning the entire table. Without an index, finding one row in a 10-million row table means reading ALL 10 million rows. With the right index, it means reading 3-4 rows. That is the difference between 45 seconds and 0.5 seconds.

Think of an index like a book’s index at the back. To find “Delta Lake” in a 500-page book, you can either read all 500 pages (table scan) or check the index at the back: “Delta Lake — page 247” (index seek). Both find the answer. One takes 5 hours. The other takes 5 seconds.

Table of Contents

  • What Is an Index?
  • How a Table Scan Works (Without Index)
  • How an Index Seek Works (With Index)
  • Clustered vs Non-Clustered Indexes
  • Creating Indexes
  • Composite Indexes (Multi-Column)
  • Index Column Order Matters
  • Covering Indexes (Include Columns)
  • Unique Indexes
  • Filtered Indexes
  • When Indexes Help and When They Hurt
  • Reading Execution Plans
  • The Key Operators in Execution Plans
  • Table Scan vs Index Scan vs Index Seek
  • Common Slow Query Patterns and Fixes
  • Index Maintenance (Fragmentation and Rebuild)
  • The Index Design Checklist
  • Common Mistakes
  • Interview Questions
  • Wrapping Up

What Is an Index?

An index is a sorted data structure (typically a B-tree) that maps column values to row locations. When you query a column that has an index, the database looks up the value in the sorted structure instead of scanning every row.

Without Index (Table Scan):
  "Find employee with emp_id = 5000"
  → Read row 1... no. Row 2... no. Row 3... no. ... Row 5000... YES!
  → Reads: 5,000 rows to find 1 result

With Index on emp_id (Index Seek):
  "Find employee with emp_id = 5000"
  → B-tree lookup: root → branch → leaf → row location
  → Reads: 3 nodes to find 1 result

How a Table Scan Works (Without Index)

Table: employees (1,000,000 rows, no index on city)

Query: SELECT * FROM employees WHERE city = 'Toronto'

Database action:
  Row 1: city = 'Mumbai' → skip
  Row 2: city = 'Toronto' → MATCH, add to results
  Row 3: city = 'Delhi' → skip
  Row 4: city = 'Toronto' → MATCH
  ... reads ALL 1,000,000 rows ...

  Result: 50,000 Toronto rows found
  Rows read: 1,000,000 (entire table)
  Time: 45 seconds

Real-life analogy: Finding everyone named “Naveen” in a phone book that is sorted by phone number (not name). You must read every single entry because the book is not sorted by name.

How an Index Seek Works (With Index)

Table: employees (1,000,000 rows, INDEX on city)

Query: SELECT * FROM employees WHERE city = 'Toronto'

Database action:
  B-tree index on city:
    Root: A-L → left, M-Z → right
    Branch: S-U → middle
    Leaf: "Toronto" → points to rows 2, 4, 17, 89, ... (50,000 pointers)

  → Jump directly to Toronto entries

  Result: 50,000 Toronto rows found
  Rows read: 50,000 (only matching rows)
  Time: 0.5 seconds

Clustered vs Non-Clustered Indexes

Clustered Index (The Table IS the Index)

A clustered index sorts the ACTUAL TABLE DATA by the index column. The table itself becomes a sorted structure.

-- By default, the PRIMARY KEY creates a clustered index
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,        -- Clustered index (data sorted by emp_id)
    first_name VARCHAR(50),
    salary DECIMAL(10,2)
);

Rules:One per table (the data can only be sorted one way) – Usually the PRIMARY KEY – The actual data rows are stored in this sorted order – Queries on the clustered key are the fastest possible

Real-life analogy: A dictionary is a clustered index on words. The actual pages (data) are physically sorted alphabetically. You can quickly find “Python” because the book itself is sorted. But finding all words with 6 letters requires reading every page — the book is not sorted by word length.

Non-Clustered Index (A Separate Lookup Structure)

A non-clustered index is a separate structure that points BACK to the table rows.

-- Create a non-clustered index on city
CREATE NONCLUSTERED INDEX IX_employees_city
ON employees (city);
Non-Clustered Index on city:
  ┌─────────────────────────────────┐
  │ Calgary    → row pointer 5      │
  │ Delhi      → row pointer 8      │
  │ Mumbai     → row pointers 1,7   │
  │ Ottawa     → row pointer 4      │
  │ Toronto    → row pointers 2,3,6 │
  └─────────────────────────────────┘
          ↓ (lookup)
  Actual table (sorted by emp_id, the clustered index):
  Row 1: emp_id=1001, city=Mumbai
  Row 2: emp_id=1002, city=Toronto
  Row 3: emp_id=1003, city=Toronto
  ...

Rules:Up to 999 per table (but more is not better) – A separate B-tree structure pointing to row locations – Queries must “look up” the actual row after finding the pointer

Real-life analogy: A book’s back-of-book index is a non-clustered index. It tells you “Delta Lake — page 247.” You then flip to page 247 to read the content. The index is separate from the content.

Creating Indexes

-- Non-clustered index (most common)
CREATE INDEX IX_employees_city ON employees (city);

-- Explicitly non-clustered
CREATE NONCLUSTERED INDEX IX_employees_department ON employees (department);

-- Clustered index (if no PK or you want it on a different column)
CREATE CLUSTERED INDEX IX_employees_hire_date ON employees (hire_date);
-- WARNING: This re-sorts the entire table. Only one clustered index allowed.

-- Drop an index
DROP INDEX IX_employees_city ON employees;

Composite Indexes (Multi-Column)

-- Index on department AND city (in that order)
CREATE INDEX IX_emp_dept_city ON employees (department, city);

-- This index helps these queries:
SELECT * FROM employees WHERE department = 'Engineering';                    -- ✅ Uses index
SELECT * FROM employees WHERE department = 'Engineering' AND city = 'Toronto'; -- ✅ Uses index
SELECT * FROM employees WHERE city = 'Toronto';                               -- ❌ Cannot use index!

Index Column Order Matters

The leftmost prefix rule: a composite index can only be used if the query filters on columns from LEFT to RIGHT.

Index: (department, city, salary)

Queries that USE this index:
  WHERE department = 'Eng'                              ✅ (leftmost column)
  WHERE department = 'Eng' AND city = 'Toronto'         ✅ (first two columns)
  WHERE department = 'Eng' AND city = 'Toronto' AND salary > 90000  ✅ (all three)

Queries that CANNOT use this index:
  WHERE city = 'Toronto'                                ❌ (skipped department)
  WHERE salary > 90000                                  ❌ (skipped department and city)
  WHERE city = 'Toronto' AND salary > 90000             ❌ (skipped department)

Real-life analogy: A phone book sorted by (Last Name, First Name, City). You can quickly find all “Smiths” (last name). You can find “John Smith” (last + first). But you CANNOT quickly find all “Johns” (first name only) — the book is not sorted that way.

Covering Indexes (Include Columns)

A covering index includes ALL columns the query needs, eliminating the need to look up the actual table row:

-- Query: frequently run
SELECT first_name, salary FROM employees WHERE department = 'Engineering';

-- Regular index: finds matching rows, then looks up first_name and salary from the table
CREATE INDEX IX_emp_dept ON employees (department);

-- Covering index: includes first_name and salary IN the index itself — no table lookup needed
CREATE INDEX IX_emp_dept_covering ON employees (department)
INCLUDE (first_name, salary);

INCLUDE columns are stored in the leaf level of the index but NOT sorted. They eliminate the table lookup (key lookup) step.

Reading Execution Plans

An execution plan shows HOW the database executes your query:

-- Show the execution plan (SQL Server)
SET SHOWPLAN_ALL ON;
SELECT * FROM employees WHERE city = 'Toronto';
SET SHOWPLAN_ALL OFF;

-- Or in SSMS: Ctrl+L (estimated plan) or Ctrl+M (actual plan)
-- Or: SET STATISTICS IO ON; (shows read counts)

The Key Operators in Execution Plans

Operator What It Means Good or Bad?
Table Scan Read EVERY row in the table ❌ Bad on large tables
Clustered Index Scan Read every row via the clustered index ❌ Bad (still reads everything)
Index Scan Read every entry in a non-clustered index ⚠️ Medium (smaller than table scan)
Index Seek Jump directly to matching entries in the index ✅ Good (reads only matching rows)
Key Lookup After index seek, look up remaining columns from the table ⚠️ Acceptable in small quantities
Nested Loops For each outer row, scan inner rows ✅ Good for small datasets
Hash Match Build hash table from one side, probe with other ✅ Good for large joins
Sort Sort rows (expensive if not indexed) ⚠️ Expensive if unexpected

Table Scan vs Index Scan vs Index Seek

Table Scan:
  "Read every page of the book"
  When: No useful index exists
  Fix: Create an index on the filtered column

Index Scan:
  "Read every entry in the book's index"
  When: Index exists but query returns too many rows (>20% of table)
  Fix: Add more selective filters, or accept it for wide-range queries

Index Seek:
  "Look up one entry in the book's index"
  When: Index exists and query is selective (returns <20% of rows)
  Result: Fastest possible — this is the goal

Common Slow Query Patterns and Fixes

Pattern 1: Missing Index

-- SLOW: No index on city → Table Scan
SELECT * FROM employees WHERE city = 'Toronto';

-- FIX: Create index
CREATE INDEX IX_emp_city ON employees (city);
-- Now: Index Seek → fast

Pattern 2: Function on Indexed Column

-- SLOW: Function wrapping the column disables the index
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;
-- Database cannot use index on hire_date because YEAR() transforms it

-- FIX: Rewrite without function on the column
SELECT * FROM employees WHERE hire_date >= '2023-01-01' AND hire_date < '2024-01-01';
-- Now: Index Seek on hire_date

Pattern 3: Leading Wildcard in LIKE

-- SLOW: Leading wildcard forces full scan
SELECT * FROM employees WHERE last_name LIKE '%son';

-- FIX: Trailing wildcard uses index
SELECT * FROM employees WHERE last_name LIKE 'John%';

Pattern 4: Implicit Conversion

-- SLOW: emp_id is INT but compared to string
SELECT * FROM employees WHERE emp_id = '1001';
-- Database converts every emp_id to string for comparison → scan

-- FIX: Match types
SELECT * FROM employees WHERE emp_id = 1001;

Pattern 5: SELECT * When You Need 2 Columns

-- SLOW: Must look up ALL columns from the table after index seek
SELECT * FROM employees WHERE department = 'Engineering';

-- FASTER: Covering index can satisfy the query entirely
SELECT first_name, salary FROM employees WHERE department = 'Engineering';
-- With covering index: CREATE INDEX ... ON (department) INCLUDE (first_name, salary)

Index Maintenance (Fragmentation and Rebuild)

Over time, indexes become fragmented (like a hard drive) as data is inserted and deleted:

-- Check fragmentation
SELECT
    i.name AS index_name,
    s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('employees'), NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id;

-- Reorganize (light maintenance, < 30% fragmentation)
ALTER INDEX IX_emp_city ON employees REORGANIZE;

-- Rebuild (heavy maintenance, > 30% fragmentation)
ALTER INDEX IX_emp_city ON employees REBUILD;

-- Rebuild all indexes on a table
ALTER INDEX ALL ON employees REBUILD;

Schedule: Rebuild indexes weekly or after large data loads.

The Index Design Checklist

✅ Index columns that appear in WHERE clauses
✅ Index columns that appear in JOIN conditions
✅ Index columns that appear in ORDER BY
✅ Put most selective column FIRST in composite indexes
✅ Use INCLUDE for columns needed in SELECT but not filtered
✅ One clustered index per table (usually the primary key)
✅ Keep total indexes under 5-7 per table
✅ Rebuild fragmented indexes weekly

Common Mistakes

  1. Too many indexes — every index slows down INSERT, UPDATE, DELETE (the index must be updated too). 5-7 indexes per table is typical. 20+ is a problem.

  2. Wrong column order in composite index(city, department) does NOT help WHERE department = 'Eng'. Put the most filtered column first.

  3. Functions on indexed columnsWHERE YEAR(hire_date) = 2023 cannot use an index on hire_date. Rewrite as a range.

  4. Indexing low-cardinality columns — an index on gender (2 values) is useless. The database will scan half the table either way. Index high-cardinality columns (email, emp_id, city).

  5. Never checking execution plans — you cannot optimize what you cannot see. Always check the plan for slow queries.

  6. Not rebuilding fragmented indexes — fragmented indexes perform like no index at all. Rebuild weekly.

Interview Questions

Q: What is the difference between a clustered and non-clustered index? A: A clustered index sorts the actual table data — one per table, usually the primary key. A non-clustered index is a separate sorted structure with pointers back to the table — up to 999 per table. Clustered index seeks are fastest because data is already sorted. Non-clustered seeks require an additional lookup to fetch full row data.

Q: When should you create an index? A: On columns frequently used in WHERE clauses, JOIN conditions, and ORDER BY. The column should have high cardinality (many distinct values). Do not index columns rarely filtered or columns with very few distinct values. Balance read speed (indexes help) with write speed (indexes slow down inserts/updates).

Q: What is a covering index? A: An index that includes ALL columns needed by a query, so the database never needs to look up the actual table row. Created with INCLUDE clause. Eliminates the key lookup operator in execution plans, making queries significantly faster for specific query patterns.

Q: Why would a query not use an existing index? A: Common reasons: function applied to the indexed column (YEAR(date)), implicit type conversion, leading wildcard in LIKE, query returns too many rows (optimizer chooses scan), or the index columns are in the wrong order for the query’s WHERE clause.

Wrapping Up

Indexes are the single biggest performance lever in SQL. A missing index can make a query 100x slower. A well-designed index can make it 100x faster. Learn to read execution plans, identify table scans, and create targeted indexes — these skills matter more in production than any clever query trick.

Related posts:SQL Execution Order & WHERE ClausesSQL FunctionsDelta Lake Optimization (OPTIMIZE, Z-ORDER)


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link