This article looks at how to speed up counting with Django and PostgreSQL. It covers two different approaches.
--
As your Django project grows, you may notice that the admin site takes increasingly longer to load. This is mainly due to Django's paginator, which relies on the slow SELECT COUNT(*)
query to calculate the number of rows in a table. At some point, this query itself will account for more than 95% of the page's loading time.
But the issue isn't limited to the admin site. Simply calling count()
on your queryset will result in the same problem.
In this article, we'll look at two approaches to reducing the loading time:
- Approximating row count using Postgres row estimation
- Extending Django's paginator to display only the previous and next pages
This article is based on the assumption that the exact count is irrelevant when dealing with a large amount of objects.
Contents
Project Setup
I've prepared a simple Django project to make it easier for you to follow along. The project contains an e-commerce app which allows you to manage products and purchases. On top of that, it comes with a command for populating the database.
First, clone the base
branch of the GitHub repo:
$ git clone https://github.com/duplxey/django-count-approximation.git \
--single-branch --branch base && cd django-count-approximation
Next, use Docker to spin up a Postgres instance:
$ docker run --name ecomm-postgres -p 5432:5432 \
-e POSTGRES_USER=ecomm -e POSTGRES_PASSWORD=complexpassword123 \
-e POSTGRES_DB=ecomm -d postgres
Alternatively, you can use a locally installed Postgres instance. Just update the
DATABASES
in the core/settings.py file accordingly.
Create a new virtual environment and activate it:
$ python3 -m venv venv && source venv/bin/activate
Install the requirements and migrate the database:
(venv)$ pip install -r requirements.txt
(venv)$ python manage.py migrate
Populate the database:
(venv)$ python manage.py populate_db
The command will create a super user, add a few products, and add 2.5 million purchases to the database. Since the command is pretty slow, I suggest you run multiple instances to speed up the process.
Run the server:
(venv)$ python manage.py runserver
Lastly, open your favorite web browser and navigate to http://localhost:8000/admin. Login as:
user: admin
pass: password
Great, you've successfully set up the project.
Initial Benchmark
I suggest you only proceed if you have 10 million or more purchases in the database.
The project comes with django-silk preinstalled. Django Silk is a live profiling and inspection tool that we'll use to benchmark the admin site. We'll be focusing on SQL query times.
First, open the admin site and navigate to the "Purchases" change list.
You'll notice a relatively long loading time.
To see the exact statistics, you can use Django Silk by navigating to http://localhost:8000/silk. Click on the "/admin/ecomm/purchase" request to see its details.
Ouch, the request took 4049
milliseconds, 3695
of which were spent on SQL queries.
Selecting "SQL" in the navigation bar shows that the SQL query that took the most time was SELECT COUNT(*)
, which took 3686
milliseconds.
Approximate Row Count
In this article section, we'll examine how to approximate the number of objects using the built-in Postgres row estimation. This approach is extremely fast but doesn't yield the exact count.
First, create a managers.py file within the ecomm app, and put the following code inside:
# ecomm/managers.py
from django.db import connections
from django.db.models import QuerySet, Manager
class ApproximateCountQuerySet(QuerySet):
def count(self):
if self.query.where:
return super(ApproximateCountQuerySet, self).count()
cursor = connections[self.db].cursor()
cursor.execute("SELECT reltuples FROM pg_class "
"WHERE relname = '%s';" % self.model._meta.db_table)
return int(cursor.fetchone()[0])
ApproximateCountManager = Manager.from_queryset(ApproximateCountQuerySet)
This code creates a new class named ApproximateCountQuerySet
, which inherits from QuerySet
. Instead of using the default SELECT COUNT(*)
, it uses Postgres' reltuples
. The estimation is only performed if no filtering is applied.
We also created a manager to be able to use ApproximateCountQuerySet
directly in our models.
Next, set the Purchase
model's manager in ecomm/models.py like so:
# ecomm/models.py
class Purchase(models.Model):
# ...
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
objects = ApproximateCountManager() # new
def __str__(self):
return f"Purchase #{self.id}"
Don't forget about the import:
from ecomm.managers import ApproximateCountManager
Wait for the development server to refresh and benchmark the app again.
You should see a significant improvement. In my case, it went from more than 4000
milliseconds to around 155
milliseconds. That's around 16 times faster!
Extend Django's Paginator
In this section, we'll look at how to avoid the row count query by modifying the paginator to display only the previous and next pages. This approach is a bit faster than approximating row count but may not be as handy.
QuerySet
First, add the InfiniteCountQuerySet
class to ecomm/managers.py:
# ecomm/managers.py
# ...
class InfiniteCountQuerySet(QuerySet):
def count(self):
return 999_999_999
InfiniteCountManager = Manager.from_queryset(InfiniteCountQuerySet)
This queryset always returns the row count of 999,999,999
, without considering the actual row count. The number 999,999,999
was picked, assuming that your row count will always be lower than that. If not, feel free to bump the number.
Next, update the Purchase
model's manager:
# ecomm/models.py
class Purchase(models.Model):
# ...
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
objects = InfiniteCountManager()
def __str__(self):
return f"Purchase #{self.id}"
Don't forget about the import:
from ecomm.managers import InfiniteCountManager
Wait for the server to refresh and benchmark the app once again.
You'll notice similar results to those of the previous approach. In my case, it went from 4000
milliseconds to around 133
milliseconds. The downside of this approach is that your queryset count()
will always return 999,999,999
even in normal views.
Paginator
We successfully avoided the row count, but now the pagination looks off. It displays pages from 1 to 1000000000 no matter how many objects we have in the database table.
Let's fix this!
First, create a paginators.py file in the ecomms app with the following contents:
# ecomm/paginators.py
from django.core.paginator import Paginator
class PreviousNextPaginator(Paginator):
def get_elided_page_range(self, number=1, *, on_each_side=3, on_ends=2):
return super().get_elided_page_range(number, on_each_side=1, on_ends=0)
This code overrides the default Paginator
's get_elided_page_range()
to only display the previous and the next page via on_each_side=1
.
Next, update PurchaseAdmin
like so:
# ecomm/admin.py
class PurchaseAdmin(admin.ModelAdmin):
# ...
paginator = PreviousNextPaginator # new
Don't forget about the import:
from ecomm.paginators import PreviousNextPaginator
At this point, your pagination should look like this:
The last thing we want to do is remove the "999999999 purchases" text. To do that, we can override the default Django admin's pagination.html template.
First, create the following directory structure within the ecomm app:
templates/
└── admin/
└── ecom/
└── pagination.html
Next, put the following contents in pagination.html:
<!-- ecomm/templates/admin/ecomm/pagination.html -->
{% load admin_list %}
{% load i18n %}
<p class="paginator">
{% if pagination_required %}
{% for i in page_range %}
{% paginator_number cl i %}
{% endfor %}
{% endif %}
{% if cl.result_count < 999999999 %}
{{ cl.result_count }}
{% if cl.result_count == 1 %}
{{ cl.opts.verbose_name }}
{% else %}
{{ cl.opts.verbose_name_plural }}{% endif %}
{% endif %}
{% if show_all_url %}
<a href="{{ show_all_url }}" class="showall">{% translate 'Show all' %}</a>
{% endif %}
{% if cl.formset and cl.result_count %}
<input
type="submit"
name="_save"
class="default"
value="{% translate 'Save' %}"
>
{% endif %}
</p>
Great, the pagination now only displays the count if InfiniteCountQuerySet
isn't used.
The final pagination should look like this:
Conclusion
In conclusion, we've looked at two ways to speed up Django counting.
Both approaches yielded great results. By leveraging them, we reduced the loading time by 16 times for a database table with 30 million entries. Feel free to customize or combine these approaches to address your needs best.
For more information, check out the following resources: