Published on

What Are SQL Indexes? Why They Matter and How They Work

Authors

    What Are SQL Indexes? Why They Matter and How They Work

    Indexes are one of the most important tools for improving database performance.

    In this article, you’ll learn:

    • What SQL indexes are
    • Why they dramatically speed up queries
    • How to create them
    • An example showing performance differences
    • How they work internally

    Let’s make indexes intuitive.

    What Is an Index?

    An index in SQL is a special data structure (usually a B-Tree) that allows the database to find rows much faster, without scanning the entire table.

    You can think of an index as:

    • A sorted list of values
    • Each value points to a row in the main table
    • Organized so the database can search it quickly (logarithmic time)

    Without an index → the database must check every row.
    With an index → the database jumps directly to the matching values.

    Why SQL Indexes Were Created

    Indexes exist to solve one problem:

    Speed up queries that search for specific values.

    They were created because full table scans become too slow as datasets grow.
    In a table with 10 rows, scanning them all is fast.
    In a table with 10 million rows, doing this repeatedly becomes unusable.

    Indexes allow:

    • Fast lookups
    • Fast sorting (ORDER BY)
    • Fast joins
    • Fast uniqueness checks (via UNIQUE index)

    Databases like PostgreSQL, MySQL, and SQLite all rely on them.

    Example with JOINs

    Indexes also speed up JOINs dramatically.

    Let’s use the users and orders tables:

    SELECT users.name, orders.product
    FROM orders
    JOIN users ON orders.user_id = users.id
    WHERE users.email = 'bob@example.com';
    
    CREATE INDEX idx_users_email ON users(email);
    CREATE INDEX idx_orders_user_id ON orders(user_id);
    

    Without Indexes: Full Table Scans on Both Sides

    When no indexes exist on users.email and orders.user_id, the database must:

    • Scan every row in users to find any rows matching the email condition
    • Scan every row in orders to find rows whose user_id matches those user IDs

    This is called a sequential scan or full table scan.

    Example scale:

    • users table: 100,000 rows
    • orders table: 1,000,000 rows

    The database must examine:

    • 100,000 user rows
    • 1,000,000 order rows

    Even though the final result might be a single row, the search cost is very high.

    Approximate performance (varies by machine):

    • Checking 100k users + 1M orders
    • Without indexes: ~300–600 ms
    • With indexes: ~1–3 ms

    That's a 100× to 500× improvement.

    With Indexes: Fast, Direct Lookup

    With the two indexes:

    • idx_users_email → allows instant lookup of the user with that email
    • idx_orders_user_id → allows instant lookup of all orders belonging to that user

    Instead of scanning millions of rows:

    • The database jumps directly to the correct entries in the index (a B-tree lookup)
    • Only the relevant rows are touched
    • The JOIN becomes extremely fast

    What Influences Performance?

    Exact timing varies by hardware, but these machine-level factors matter most:

    FactorEffect
    Disk type (SSD vs HDD)Huge impact: SSDs are 10–100× faster for random access
    RAM sizeIf tables/indexes fit in memory, queries become drastically faster
    CPU speedAffects how quickly the B-tree nodes are scanned
    Buffer cacheWarm caches make repeated queries much faster
    Table sizeThe larger the table, the more important indexes become

    Indexes are essential in applications because even powerful machines struggle with repeated full-table scans as data grows.

    • Without indexes: the database must read every row in both tables
    • With indexes: the database performs fast lookups using efficient data structures
    • This makes JOINs scale to millions of rows with millisecond-level response times

    How Indexes Work Internally (Simple Explanation)

    Most SQL databases use **B-Trees **, which:

    • Keep values sorted
    • Allow binary-like search
    • Have extremely shallow depth (usually 3–5 levels even for huge tables)

    This means:

    • Finding a row requires only a few comparisons
    • Access is predictable and fast
    • Indexes grow logarithmically, not linearly

    You don’t need to understand the algorithm deeply, but here’s the key idea:

    It’s much faster to search a sorted tree than a million unsorted rows.

    When Should You Use an Index?

    Use an index when you:

    • SELECT by a column frequently
    • Filter with WHERE
    • Sort using ORDER BY
    • JOIN tables on foreign keys
    • Enforce uniqueness (UNIQUE INDEX)
    • Search using equality or range queries

    Indexes are most valuable on:

    • email
    • username
    • created_at
    • foreign keys (user_id, post_id)
    • slugs
    • lookup columns

    When Not to Use an Index

    Indexes have costs:

    • They take disk space
    • They slow down writes (INSERT/UPDATE/DELETE must update the index)

    Avoid indexing:

    • boolean columns
    • very low-cardinality columns (e.g., gender, status = “active”)
    • columns rarely used in queries
    • very large text columns