Database indexing is a powerful technique that helps speed up data retrieval and optimize the performance of your queries in relational databases. By creating indexes, you allow the database to locate information much faster, which can significantly enhance the efficiency of your application, especially when dealing with large datasets.
In developing an application with Django, understanding and implementing proper indexing on your database can significantly improve the responsiveness of your web applications. Whether building a small project or a complex system, knowing how to use indexing effectively is key to delivering a faster and more seamless user experience.
This article explores the basics of database indexing, its advantages and disadvantages, and how to apply it in a Django application.
Contents
What is Database Indexing?
Database indexing is a technique that is used to optimize the performance of a database by reducing the time it takes to retrieve data. It involves using a database index which acts like a roadmap for the database to find rows efficiently without scanning the entire table.
What is a Database Index?
A database index is a data structure used to improve the speed of data retrieval operations on a database table at the cost of additional storage space and potential performance impacts on write operations (like INSERT
, UPDATE
, and DELETE
).
How Does Indexing Work?
An index works just like an index in a book. To locate the information, instead of flipping through every page, you can use the index to quickly locate the information you need. Similarly, in a database, an index stores pointers to rows in a table. When a query is executed, the database engine checks the index first to quickly find the data.
Types of Indexes
- Primary Index: This index is automatically created for primary key columns in the database.
- Unique Index: When a column is defined with the
unique
attribute, a unique index is created which ensures all values in the column are distinct. - Composite Index: This sort of index involves multiple columns.
- Full-Text Index: Optimized for searching large text data.
- Clustered Index: Determines the physical order of data in a table. A table can have only one clustered index.
- Non-Clustered Index: Contains a pointer to the actual data, allowing multiple non-clustered indexes per table.
Advantages and Disadvantages
Database indexing is powerful, but it comes with benefits and trade-offs. Understanding these can help you decide when and how to use indexing effectively.
Advantages
- Faster Data Retrieval: Indexes significantly reduce the time required to find specific rows in a table, especially for large datasets. This is particularly useful for queries involving
WHERE
,JOIN
, andORDER BY
clauses. - Improved Query Performance: Queries that frequently filter or sort data benefit from indexes, as they allow the database to skip scanning the entire table.
- Supports Unique Constraints: Indexes also help enforce uniqueness in columns, ensuring no duplicate values are inserted, as seen with primary keys and
UNIQUE
constraints. - Optimized Full-Text Searches: Specialized indexes, like full-text indexes, make searching large text fields much faster, particularly in databases like MySQL and Postgres.
- Better Scalability: As the data in the database grows, proper indexing helps maintain performance, ensuring queries remain efficient even with millions of rows.
Disadvantages
- Increased Storage Requirements: Indexes consume additional disk space as an index is created in addition to the database tables and columns. For large tables with multiple indexes, this can lead to significant storage overhead.
- Slower Write Operations: Insertions, updates, and deletions become slower because the database must update the indexes in addition to the data itself.
- Maintenance Overhead: Indexes need to be kept up-to-date as the data in the table changes, which adds computational overhead.
- Not Always Effective: If a query doesn't use the indexed column(s), the index won't help. For example, using functions or calculations on indexed fields might bypass the index. The database query optimizer might also decide that a full body scan is more effective than using indexing which nullifies the reason for database indexing.
- Risk of Over-Indexing: Adding too many indexes can degrade overall performance, as the database spends more time maintaining indexes than executing queries.
When to Use Indexing
- Index frequently queried fields or columns used in
WHERE
,JOIN
, orORDER BY
. - Avoid indexing fields that are rarely used in queries.
- Strike a balance between read and write performance based on your application's needs.
Indexing in Django
Django is a popular Python web framework which provides a high-level abstraction for database interactions, including indexing. With Django's ORM, you can define indexes in your models to optimize query performance.
Creating Indexes in Django
Django can create several types of database indexes, either automatically or through explicit configuration in your models.
In Django, you can define indexes using db_index
for a single index column as well as the indexes
option in a model's Meta class.
Index Types in Django
Index Type | Created Automatically? | Example of Use |
---|---|---|
Primary Key Index | Yes | Default primary key |
Foreign Key Index | Yes | Foreign key relationships |
Unique Index | Yes | unique=True on fields |
Single-column Index | No (explicit) | db_index or Meta.indexes |
Composite Index | No (explicit) | Meta.indexes with multiple fields |
Unique Composite Index | Yes (unique_together ) |
unique_together or UniqueConstraint |
Full-Text Index | No (explicit, Postgres only) | GinIndex |
Partial Index | No (explicit, Postgres only) | Index with condition |
By using these automatic and custom indexing options, Django provides robust support for optimizing database queries.
Sample Project
To better understand the concept of database indexing in Django, we created a sample Django application, a product management system. The app is used to manage products in a store, allowing users to filter, sort, and search efficiently.
The product management system includes the following features:
- Add and manage products with details like name, category, price, and creation date.
- Filter products by category to find items within specific groups.
- Sort products by price to help users find the cheapest or most expensive items.
- Use indexes to enhance query performance for filtering and sorting operations.
To follow along, clone down the repository, and spin up the project:
$ git clone https://github.com/testdrivenio/django-db-indexes.git
$ cd django-db-indexes
$ docker compose up -d --build
To simplify things, our project uses Docker to spin up the Django app as well as Postgres. Curious about how this project was developed? Check out the Dockerizing Django with Postgres, Gunicorn, and Nginx article.
Django Models
Take note of the models that we defined in product/models.py:
- ProductWithoutIndex: This model has no indexes defined.
- ProductWithSingleIndex: This model has three single column indexes via
db_index=True
on thename
,category
andprice
columns. - ProductWithCompositeIndex: This model has a composite index on the
category
andprice
columns.
To view the indexes, jump into psql:
$ docker compose exec api python manage.py dbshell
ProductWithoutIndex Model
# \d product_without_index
Yo can see the default primary key index:
Table "public.product_without_index"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
name | character varying(100) | | not null |
category | character varying(50) | | not null |
price | integer | | not null |
created_at | timestamp with time zone | | not null |
Indexes:
"product_without_index_pkey" PRIMARY KEY, btree (id)
ProductWithSingleIndex Model
# \d product_with_single_index
Results:
Table "public.product_with_single_index"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
name | character varying(100) | | not null |
category | character varying(50) | | not null |
price | integer | | not null |
created_at | timestamp with time zone | | not null |
Indexes:
"product_with_single_index_pkey" PRIMARY KEY, btree (id)
"product_with_single_index_category_715a00f3" btree (category)
"product_with_single_index_category_715a00f3_like" btree (category varchar_pattern_ops)
"product_with_single_index_name_d6fb2180" btree (name)
"product_with_single_index_name_d6fb2180_like" btree (name varchar_pattern_ops)
"product_with_single_index_price_5a707788" btree (price)
Single Indexes are created for each of the columns where db_index
is set to True
. Indexes are created for each one for both direct filters and LIKE
filters.
ProductWithCompositeIndex Model
# \d product_with_composite_index
Results:
Table "public.product_with_composite_index"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
name | character varying(100) | | not null |
category | character varying(50) | | not null |
price | integer | | not null |
created_at | timestamp with time zone | | not null |
Indexes:
"product_with_composite_index_pkey" PRIMARY KEY, btree (id)
"category_price_idx" btree (category, price)
"product_wit_name_5adbb5_idx" btree (name)
A composite index is created for both category
and price
. This index makes filters that combine both category and price together faster than setting an index on each column individually.
Dummy Data
The power of indexing shines much better when you have a large dataset. In order to help with that, we created a seed_db
management command. This command creates 500k records for each category in each table.
Run the command to seed the database:
$ docker compose exec api python manage.py seed_db
This will take a while to complete, so feel free to grab a cup of coffee.
Testing
To see how the tables and indexes perform, we have another management command called run_tests
. Take a quick look at the script in product/management/commands/run_tests.py.
The command accepts a couple of arguments
table_type
(compulsory): This is an int with1
representingProductWithoutIndex
,2
representingProductWithSingleIndex
, and3
representingProductWithCompositeIndex
category
: Must be"electronics"
,"clothing"
, or"home appliances"
price
: Integer used to filter items
Basic Query
Let's start with a simple query to get all objects.
ProductWithoutIndex Model
$ docker compose exec api python manage.py run_tests --table_type=1 --category="electronics"
Results:
[{'Execution Time': 41.694,
'Plan': {'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 39.795,
'Actual Total Time': 41.682,
'Async Capable': False,
'Node Type': 'Gather',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 1,
'Plan Width': 35,
'Plans': [{'Actual Loops': 3,
'Actual Rows': 0,
'Actual Startup Time': 37.769,
'Actual Total Time': 37.77,
'Alias': 'product_without_index',
'Async Capable': False,
'Filter': '((product_without_index.category)::text = '
"'electronics'::text)",
'Node Type': 'Seq Scan',
'Output': ['id',
'name',
'category',
'price',
'created_at'],
'Parallel Aware': True,
'Parent Relationship': 'Outer',
'Plan Rows': 1,
'Plan Width': 35,
'Relation Name': 'product_without_index',
'Rows Removed by Filter': 500000,
'Schema': 'public',
'Startup Cost': 0.0,
'Total Cost': 20241.5,
'Workers': [{'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 36.902,
'Actual Total Time': 36.903,
'Worker Number': 0},
{'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 36.892,
'Actual Total Time': 36.893,
'Worker Number': 1}]}],
'Single Copy': False,
'Startup Cost': 1000.0,
'Total Cost': 21241.6,
'Workers Launched': 2,
'Workers Planned': 2},
'Planning Time': 0.358,
'Triggers': []}]
We can see that a full body scan was performed from Node Type: Seq Scan
. This took about 42ms.
ProductWithSingleIndex Model
$ docker compose exec api python manage.py run_tests --table_type=2 --category="electronics"
Results:
[{'Execution Time': 0.033,
'Plan': {'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 0.019,
'Actual Total Time': 0.02,
'Alias': 'product_with_single_index',
'Async Capable': False,
'Index Cond': '((product_with_single_index.category)::text = '
"'electronics'::text)",
'Index Name': 'product_with_single_index_category_715a00f3',
'Node Type': 'Index Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 1,
'Plan Width': 35,
'Relation Name': 'product_with_single_index',
'Rows Removed by Index Recheck': 0,
'Scan Direction': 'Forward',
'Schema': 'public',
'Startup Cost': 0.43,
'Total Cost': 4.45},
'Planning Time': 0.633,
'Triggers': []}]
In this query, the database optimizer uses the index as the Scan type and uses the correct index. This query takes about 0.033ms to complete which is a far better option than on the table without an index.
ProductWithCompositeIndex Model
$ docker compose exec api python manage.py run_tests --table_type=3 --category="electronics"
Results:
[{'Execution Time': 0.032,
'Plan': {'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 0.02,
'Actual Total Time': 0.02,
'Alias': 'product_with_composite_index',
'Async Capable': False,
'Index Cond': '((product_with_composite_index.category)::text = '
"'electronics'::text)",
'Index Name': 'category_price_idx',
'Node Type': 'Index Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 1,
'Plan Width': 35,
'Relation Name': 'product_with_composite_index',
'Rows Removed by Index Recheck': 0,
'Scan Direction': 'Forward',
'Schema': 'public',
'Startup Cost': 0.43,
'Total Cost': 6.2},
'Planning Time': 0.293,
'Triggers': []}]
This is faster as the Database uses the Index category_price_idx
, which we defined on the table.
Filtered Query
Let's try a query that combines both the category and the price. It's expected that the composite index should be faster than the single index column while this in turn should be faster when compared to the table with no index.
ProductWithoutIndex Model
$ docker compose exec api python manage.py run_tests --table_type=1 --category="electronics" --price=1000
Results:
[{'Execution Time': 44.069,
'Plan': {'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 42.227,
'Actual Total Time': 44.056,
'Async Capable': False,
'Node Type': 'Gather',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 1,
'Plan Width': 35,
'Plans': [{'Actual Loops': 3,
'Actual Rows': 0,
'Actual Startup Time': 40.342,
'Actual Total Time': 40.342,
'Alias': 'product_without_index',
'Async Capable': False,
'Filter': '((product_without_index.price <= 1000) AND '
'((product_without_index.category)::text = '
"'electronics'::text))",
'Node Type': 'Seq Scan',
'Output': ['id',
'name',
'category',
'price',
'created_at'],
'Parallel Aware': True,
'Parent Relationship': 'Outer',
'Plan Rows': 1,
'Plan Width': 35,
'Relation Name': 'product_without_index',
'Rows Removed by Filter': 500000,
'Schema': 'public',
'Startup Cost': 0.0,
'Total Cost': 21804.0,
'Workers': [{'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 39.573,
'Actual Total Time': 39.574,
'Worker Number': 0},
{'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 39.567,
'Actual Total Time': 39.567,
'Worker Number': 1}]}],
'Single Copy': False,
'Startup Cost': 1000.0,
'Total Cost': 22804.1,
'Workers Launched': 2,
'Workers Planned': 2},
'Planning Time': 0.26,
'Triggers': []}]
Again, a full body scan was performed. It took approximately 44ms.
ProductWithSingleIndex Model
$ docker compose exec api python manage.py run_tests --table_type=2 --category="electronics" --price=1000
Results:
[{'Execution Time': 0.041,
'Plan': {'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 0.011,
'Actual Total Time': 0.011,
'Alias': 'product_with_single_index',
'Async Capable': False,
'Index Cond': '((product_with_single_index.category)::text = '
"'electronics'::text)",
'Index Name': 'product_with_single_index_category_715a00f3',
'Node Type': 'Index Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 1,
'Plan Width': 35,
'Relation Name': 'product_with_single_index',
'Rows Removed by Index Recheck': 0,
'Scan Direction': 'Forward',
'Schema': 'public',
'Startup Cost': 0.43,
'Total Cost': 4.45},
'Planning Time': 0.353,
'Triggers': []}]
ProductWithCompositeIndex Model
$ docker compose exec api python manage.py run_tests --table_type=3 --category="electronics" --price=1000
Results:
[{'Execution Time': 0.036,
'Plan': {'Actual Loops': 1,
'Actual Rows': 0,
'Actual Startup Time': 0.021,
'Actual Total Time': 0.022,
'Alias': 'product_with_composite_index',
'Async Capable': False,
'Index Cond': '(((product_with_composite_index.category)::text = '
"'electronics'::text) AND "
'(product_with_composite_index.price <= 1000))',
'Index Name': 'category_price_idx',
'Node Type': 'Index Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 1,
'Plan Width': 35,
'Relation Name': 'product_with_composite_index',
'Rows Removed by Index Recheck': 0,
'Scan Direction': 'Forward',
'Schema': 'public',
'Startup Cost': 0.43,
'Total Cost': 6.2},
'Planning Time': 0.365,
'Triggers': []}]
This differences tend to be more pronounced on tables with larger data sets. To further test this, you can increase the batch size of the seed_db
command to see how much faster indexing makes filtering and sorting.
Conclusion
Database indexing is an important technique for optimizing query performance and ensuring your application remains efficient as your data grows. By carefully implementing indexes in your Django models, you can significantly enhance the speed and responsiveness of your queries, especially for frequently filtered or sorted fields.
In this article, we explored the basics of database indexing, its advantages and disadvantages, and how to apply it in a Django application. Through the product management system sample project, we saw how to use indexing effectively, particularly with the db_index
attribute and the Meta
class. We also tested it practically by comparing the time differences when filtering based on the indexed fields
By combining practical implementation with best practices, you can leverage indexing to deliver a faster, smoother user experience. Remember to analyze your application's query patterns and balance the trade-offs of indexing for optimal results. With these tools and techniques, you’re well on your way to building scalable and high-performing Django applications!
Happy Indexing.