Database Indexing in Django

Database indexing is a powerful technique that helps speed up data retrieval and optimize the performance of your queries in relational databases. By creating indexes, you allow the database to locate information much faster, which can significantly enhance the efficiency of your application, especially when dealing with large datasets.

In developing an application with Django, understanding and implementing proper indexing on your database can significantly improve the responsiveness of your web applications. Whether building a small project or a complex system, knowing how to use indexing effectively is key to delivering a faster and more seamless user experience.

This article explores the basics of database indexing, its advantages and disadvantages, and how to apply it in a Django application.

What is Database Indexing?
Indexing in Django
- Creating Indexes in Django
- Index Types in Django
Sample Project
Conclusion

What is Database Indexing?

Database indexing is a technique that is used to optimize the performance of a database by reducing the time it takes to retrieve data. It involves using a database index which acts like a roadmap for the database to find rows efficiently without scanning the entire table.

What is a Database Index?

A database index is a data structure used to improve the speed of data retrieval operations on a database table at the cost of additional storage space and potential performance impacts on write operations (like INSERT, UPDATE, and DELETE).

How Does Indexing Work?

An index works just like an index in a book. To locate the information, instead of flipping through every page, you can use the index to quickly locate the information you need. Similarly, in a database, an index stores pointers to rows in a table. When a query is executed, the database engine checks the index first to quickly find the data.

Types of Indexes

Primary Index: This index is automatically created for primary key columns in the database.
Unique Index: When a column is defined with the unique attribute, a unique index is created which ensures all values in the column are distinct.
Composite Index: This sort of index involves multiple columns.
Full-Text Index: Optimized for searching large text data.
Clustered Index: Determines the physical order of data in a table. A table can have only one clustered index.
Non-Clustered Index: Contains a pointer to the actual data, allowing multiple non-clustered indexes per table.

Advantages and Disadvantages

Database indexing is powerful, but it comes with benefits and trade-offs. Understanding these can help you decide when and how to use indexing effectively.

Advantages

Faster Data Retrieval: Indexes significantly reduce the time required to find specific rows in a table, especially for large datasets. This is particularly useful for queries involving WHERE, JOIN, and ORDER BY clauses.
Improved Query Performance: Queries that frequently filter or sort data benefit from indexes, as they allow the database to skip scanning the entire table.
Supports Unique Constraints: Indexes also help enforce uniqueness in columns, ensuring no duplicate values are inserted, as seen with primary keys and UNIQUE constraints.
Optimized Full-Text Searches: Specialized indexes, like full-text indexes, make searching large text fields much faster, particularly in databases like MySQL and Postgres.
Better Scalability: As the data in the database grows, proper indexing helps maintain performance, ensuring queries remain efficient even with millions of rows.

Disadvantages

Increased Storage Requirements: Indexes consume additional disk space as an index is created in addition to the database tables and columns. For large tables with multiple indexes, this can lead to significant storage overhead.
Slower Write Operations: Insertions, updates, and deletions become slower because the database must update the indexes in addition to the data itself.
Maintenance Overhead: Indexes need to be kept up-to-date as the data in the table changes, which adds computational overhead.
Not Always Effective: If a query doesn't use the indexed column(s), the index won't help. For example, using functions or calculations on indexed fields might bypass the index. The database query optimizer might also decide that a full body scan is more effective than using indexing which nullifies the reason for database indexing.
Risk of Over-Indexing: Adding too many indexes can degrade overall performance, as the database spends more time maintaining indexes than executing queries.

When to Use Indexing

Index frequently queried fields or columns used in WHERE, JOIN, or ORDER BY.
Avoid indexing fields that are rarely used in queries.
Strike a balance between read and write performance based on your application's needs.

Indexing in Django

Django is a popular Python web framework which provides a high-level abstraction for database interactions, including indexing. With Django's ORM, you can define indexes in your models to optimize query performance.

Creating Indexes in Django

Django can create several types of database indexes, either automatically or through explicit configuration in your models. In Django, you can define indexes using db_index for a single index column as well as the indexes option in a model's Meta class.

Index Types in Django

Index Type	Created Automatically?	Example of Use
Primary Key Index	Yes	Default primary key
Foreign Key Index	Yes	Foreign key relationships
Unique Index	Yes	`unique=True` on fields
Single-column Index	No (explicit)	`db_index` or `Meta.indexes`
Composite Index	No (explicit)	`Meta.indexes` with multiple fields
Unique Composite Index	Yes (`unique_together`)	`unique_together` or `UniqueConstraint`
Full-Text Index	No (explicit, Postgres only)	`GinIndex`
Partial Index	No (explicit, Postgres only)	`Index` with condition

By using these automatic and custom indexing options, Django provides robust support for optimizing database queries.

Sample Project

To better understand the concept of database indexing in Django, we created a sample Django application, a product management system. The app is used to manage products in a store, allowing users to filter, sort, and search efficiently.

The product management system includes the following features:

Add and manage products with details like name, category, price, and creation date.
Filter products by category to find items within specific groups.
Sort products by price to help users find the cheapest or most expensive items.
Use indexes to enhance query performance for filtering and sorting operations.

To follow along, clone down the repository, and spin up the project:

$ git clone https://github.com/testdrivenio/django-db-indexes.git
$ cd django-db-indexes
$ docker compose up -d --build

To simplify things, our project uses Docker to spin up the Django app as well as Postgres. Curious about how this project was developed? Check out the Dockerizing Django with Postgres, Gunicorn, and Nginx article.

Django Models

Take note of the models that we defined in product/models.py:

ProductWithoutIndex: This model has no indexes defined.
ProductWithSingleIndex: This model has three single column indexes via db_index=True on the name, category and price columns.
ProductWithCompositeIndex: This model has a composite index on the category and price columns.

To view the indexes, jump into psql:

$ docker compose exec api python manage.py dbshell

ProductWithoutIndex Model

# \d product_without_index

Yo can see the default primary key index:

                              Table "public.product_without_index"
   Column   |           Type           | Collation | Nullable |             Default
------------+--------------------------+-----------+----------+----------------------------------
 id         | bigint                   |           | not null | generated by default as identity
 name       | character varying(100)   |           | not null |
 category   | character varying(50)    |           | not null |
 price      | integer                  |           | not null |
 created_at | timestamp with time zone |           | not null |
Indexes:
    "product_without_index_pkey" PRIMARY KEY, btree (id)

ProductWithSingleIndex Model

# \d product_with_single_index

Results:

                            Table "public.product_with_single_index"
   Column   |           Type           | Collation | Nullable |             Default
------------+--------------------------+-----------+----------+----------------------------------
 id         | bigint                   |           | not null | generated by default as identity
 name       | character varying(100)   |           | not null |
 category   | character varying(50)    |           | not null |
 price      | integer                  |           | not null |
 created_at | timestamp with time zone |           | not null |
Indexes:
    "product_with_single_index_pkey" PRIMARY KEY, btree (id)
    "product_with_single_index_category_715a00f3" btree (category)
    "product_with_single_index_category_715a00f3_like" btree (category varchar_pattern_ops)
    "product_with_single_index_name_d6fb2180" btree (name)
    "product_with_single_index_name_d6fb2180_like" btree (name varchar_pattern_ops)
    "product_with_single_index_price_5a707788" btree (price)

Single Indexes are created for each of the columns where db_index is set to True. Indexes are created for each one for both direct filters and LIKE filters.

ProductWithCompositeIndex Model

# \d product_with_composite_index

Results:

                           Table "public.product_with_composite_index"
   Column   |           Type           | Collation | Nullable |             Default
------------+--------------------------+-----------+----------+----------------------------------
 id         | bigint                   |           | not null | generated by default as identity
 name       | character varying(100)   |           | not null |
 category   | character varying(50)    |           | not null |
 price      | integer                  |           | not null |
 created_at | timestamp with time zone |           | not null |
Indexes:
    "product_with_composite_index_pkey" PRIMARY KEY, btree (id)
    "category_price_idx" btree (category, price)
    "product_wit_name_5adbb5_idx" btree (name)

A composite index is created for both category and price. This index makes filters that combine both category and price together faster than setting an index on each column individually.

Dummy Data

The power of indexing shines much better when you have a large dataset. In order to help with that, we created a seed_db management command. This command creates 500k records for each category in each table.

Run the command to seed the database:

$ docker compose exec api python manage.py seed_db

This will take a while to complete, so feel free to grab a cup of coffee.

Testing

To see how the tables and indexes perform, we have another management command called run_tests. Take a quick look at the script in product/management/commands/run_tests.py.

The command accepts a couple of arguments

table_type (compulsory): This is an int with 1 representing ProductWithoutIndex, 2 representing ProductWithSingleIndex, and 3 representing ProductWithCompositeIndex
category: Must be "electronics", "clothing", or "home appliances"
price: Integer used to filter items

Basic Query

Let's start with a simple query to get all objects.

ProductWithoutIndex Model

$ docker compose exec api python manage.py run_tests --table_type=1 --category="electronics"

Results:

[{'Execution Time': 41.694,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 0,
           'Actual Startup Time': 39.795,
           'Actual Total Time': 41.682,
           'Async Capable': False,
           'Node Type': 'Gather',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 1,
           'Plan Width': 35,
           'Plans': [{'Actual Loops': 3,
                      'Actual Rows': 0,
                      'Actual Startup Time': 37.769,
                      'Actual Total Time': 37.77,
                      'Alias': 'product_without_index',
                      'Async Capable': False,
                      'Filter': '((product_without_index.category)::text = '
                                "'electronics'::text)",
                      'Node Type': 'Seq Scan',
                      'Output': ['id',
                                 'name',
                                 'category',
                                 'price',
                                 'created_at'],
                      'Parallel Aware': True,
                      'Parent Relationship': 'Outer',
                      'Plan Rows': 1,
                      'Plan Width': 35,
                      'Relation Name': 'product_without_index',
                      'Rows Removed by Filter': 500000,
                      'Schema': 'public',
                      'Startup Cost': 0.0,
                      'Total Cost': 20241.5,
                      'Workers': [{'Actual Loops': 1,
                                   'Actual Rows': 0,
                                   'Actual Startup Time': 36.902,
                                   'Actual Total Time': 36.903,
                                   'Worker Number': 0},
                                  {'Actual Loops': 1,
                                   'Actual Rows': 0,
                                   'Actual Startup Time': 36.892,
                                   'Actual Total Time': 36.893,
                                   'Worker Number': 1}]}],
           'Single Copy': False,
           'Startup Cost': 1000.0,
           'Total Cost': 21241.6,
           'Workers Launched': 2,
           'Workers Planned': 2},
  'Planning Time': 0.358,
  'Triggers': []}]

We can see that a full body scan was performed from Node Type: Seq Scan. This took about 42ms.

ProductWithSingleIndex Model

$ docker compose exec api python manage.py run_tests --table_type=2 --category="electronics"

Results:

[{'Execution Time': 0.033,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 0,
           'Actual Startup Time': 0.019,
           'Actual Total Time': 0.02,
           'Alias': 'product_with_single_index',
           'Async Capable': False,
           'Index Cond': '((product_with_single_index.category)::text = '
                         "'electronics'::text)",
           'Index Name': 'product_with_single_index_category_715a00f3',
           'Node Type': 'Index Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 1,
           'Plan Width': 35,
           'Relation Name': 'product_with_single_index',
           'Rows Removed by Index Recheck': 0,
           'Scan Direction': 'Forward',
           'Schema': 'public',
           'Startup Cost': 0.43,
           'Total Cost': 4.45},
  'Planning Time': 0.633,
  'Triggers': []}]

In this query, the database optimizer uses the index as the Scan type and uses the correct index. This query takes about 0.033ms to complete which is a far better option than on the table without an index.

ProductWithCompositeIndex Model

$ docker compose exec api python manage.py run_tests --table_type=3 --category="electronics"

Results:

[{'Execution Time': 0.032,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 0,
           'Actual Startup Time': 0.02,
           'Actual Total Time': 0.02,
           'Alias': 'product_with_composite_index',
           'Async Capable': False,
           'Index Cond': '((product_with_composite_index.category)::text = '
                         "'electronics'::text)",
           'Index Name': 'category_price_idx',
           'Node Type': 'Index Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 1,
           'Plan Width': 35,
           'Relation Name': 'product_with_composite_index',
           'Rows Removed by Index Recheck': 0,
           'Scan Direction': 'Forward',
           'Schema': 'public',
           'Startup Cost': 0.43,
           'Total Cost': 6.2},
  'Planning Time': 0.293,
  'Triggers': []}]

This is faster as the Database uses the Index category_price_idx, which we defined on the table.

Filtered Query

Let's try a query that combines both the category and the price. It's expected that the composite index should be faster than the single index column while this in turn should be faster when compared to the table with no index.

ProductWithoutIndex Model

$ docker compose exec api python manage.py run_tests --table_type=1 --category="electronics" --price=1000

Results:

[{'Execution Time': 44.069,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 0,
           'Actual Startup Time': 42.227,
           'Actual Total Time': 44.056,
           'Async Capable': False,
           'Node Type': 'Gather',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 1,
           'Plan Width': 35,
           'Plans': [{'Actual Loops': 3,
                      'Actual Rows': 0,
                      'Actual Startup Time': 40.342,
                      'Actual Total Time': 40.342,
                      'Alias': 'product_without_index',
                      'Async Capable': False,
                      'Filter': '((product_without_index.price <= 1000) AND '
                                '((product_without_index.category)::text = '
                                "'electronics'::text))",
                      'Node Type': 'Seq Scan',
                      'Output': ['id',
                                 'name',
                                 'category',
                                 'price',
                                 'created_at'],
                      'Parallel Aware': True,
                      'Parent Relationship': 'Outer',
                      'Plan Rows': 1,
                      'Plan Width': 35,
                      'Relation Name': 'product_without_index',
                      'Rows Removed by Filter': 500000,
                      'Schema': 'public',
                      'Startup Cost': 0.0,
                      'Total Cost': 21804.0,
                      'Workers': [{'Actual Loops': 1,
                                   'Actual Rows': 0,
                                   'Actual Startup Time': 39.573,
                                   'Actual Total Time': 39.574,
                                   'Worker Number': 0},
                                  {'Actual Loops': 1,
                                   'Actual Rows': 0,
                                   'Actual Startup Time': 39.567,
                                   'Actual Total Time': 39.567,
                                   'Worker Number': 1}]}],
           'Single Copy': False,
           'Startup Cost': 1000.0,
           'Total Cost': 22804.1,
           'Workers Launched': 2,
           'Workers Planned': 2},
  'Planning Time': 0.26,
  'Triggers': []}]

Again, a full body scan was performed. It took approximately 44ms.

ProductWithSingleIndex Model

$ docker compose exec api python manage.py run_tests --table_type=2 --category="electronics" --price=1000

Results:

[{'Execution Time': 0.041,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 0,
           'Actual Startup Time': 0.011,
           'Actual Total Time': 0.011,
           'Alias': 'product_with_single_index',
           'Async Capable': False,
           'Index Cond': '((product_with_single_index.category)::text = '
                         "'electronics'::text)",
           'Index Name': 'product_with_single_index_category_715a00f3',
           'Node Type': 'Index Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 1,
           'Plan Width': 35,
           'Relation Name': 'product_with_single_index',
           'Rows Removed by Index Recheck': 0,
           'Scan Direction': 'Forward',
           'Schema': 'public',
           'Startup Cost': 0.43,
           'Total Cost': 4.45},
  'Planning Time': 0.353,
  'Triggers': []}]

ProductWithCompositeIndex Model

$ docker compose exec api python manage.py run_tests --table_type=3 --category="electronics" --price=1000

Results:

[{'Execution Time': 0.036,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 0,
           'Actual Startup Time': 0.021,
           'Actual Total Time': 0.022,
           'Alias': 'product_with_composite_index',
           'Async Capable': False,
           'Index Cond': '(((product_with_composite_index.category)::text = '
                         "'electronics'::text) AND "
                         '(product_with_composite_index.price <= 1000))',
           'Index Name': 'category_price_idx',
           'Node Type': 'Index Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 1,
           'Plan Width': 35,
           'Relation Name': 'product_with_composite_index',
           'Rows Removed by Index Recheck': 0,
           'Scan Direction': 'Forward',
           'Schema': 'public',
           'Startup Cost': 0.43,
           'Total Cost': 6.2},
  'Planning Time': 0.365,
  'Triggers': []}]

This differences tend to be more pronounced on tables with larger data sets. To further test this, you can increase the batch size of the seed_dbcommand to see how much faster indexing makes filtering and sorting.

Conclusion

Database indexing is an important technique for optimizing query performance and ensuring your application remains efficient as your data grows. By carefully implementing indexes in your Django models, you can significantly enhance the speed and responsiveness of your queries, especially for frequently filtered or sorted fields.

In this article, we explored the basics of database indexing, its advantages and disadvantages, and how to apply it in a Django application. Through the product management system sample project, we saw how to use indexing effectively, particularly with the db_index attribute and the Meta class. We also tested it practically by comparing the time differences when filtering based on the indexed fields

By combining practical implementation with best practices, you can leverage indexing to deliver a faster, smoother user experience. Remember to analyze your application's query patterns and balance the trade-offs of indexing for optimal results. With these tools and techniques, you’re well on your way to building scalable and high-performing Django applications!

Happy Indexing.

Featured Course

Test-Driven Development with Django, Django REST Framework, and Docker

In this course, you'll learn how to set up a development environment with Docker in order to build and deploy a RESTful API powered by Python, Django, and Django REST Framework.

Buy Now $30 View Course

Share this tutorial

Contents

What is Database Indexing?

What is a Database Index?

How Does Indexing Work?

Types of Indexes

Advantages and Disadvantages

Advantages

Disadvantages

When to Use Indexing

Indexing in Django

Creating Indexes in Django

Index Types in Django

Sample Project

Django Models

ProductWithoutIndex Model

ProductWithSingleIndex Model

ProductWithCompositeIndex Model

Dummy Data

Testing

Basic Query

ProductWithoutIndex Model

ProductWithSingleIndex Model

ProductWithCompositeIndex Model

Filtered Query

ProductWithoutIndex Model

ProductWithSingleIndex Model

ProductWithCompositeIndex Model

Conclusion

Test-Driven Development with Django, Django REST Framework, and Docker

Test-Driven Development with Django, Django REST Framework, and Docker

Recommended Tutorials