Approximate Counting in Django and Postgres

Last updated June 3rd, 2024

This article looks at how to speed up counting with Django and PostgreSQL. It covers two different approaches.

--

As your Django project grows, you may notice that the admin site takes increasingly longer to load. This is mainly due to Django's paginator, which relies on the slow SELECT COUNT(*) query to calculate the number of rows in a table. At some point, this query itself will account for more than 95% of the page's loading time.

But the issue isn't limited to the admin site. Simply calling count() on your queryset will result in the same problem.

In this article, we'll look at two approaches to reducing the loading time:

  1. Approximating row count using Postgres row estimation
  2. Extending Django's paginator to display only the previous and next pages

This article is based on the assumption that the exact count is irrelevant when dealing with a large amount of objects.

Contents

Project Setup

I've prepared a simple Django project to make it easier for you to follow along. The project contains an e-commerce app which allows you to manage products and purchases. On top of that, it comes with a command for populating the database.

First, clone the base branch of the GitHub repo:

$ git clone https://github.com/duplxey/django-count-approximation.git \
    --single-branch --branch base && cd django-count-approximation

Next, use Docker to spin up a Postgres instance:

$ docker run --name ecomm-postgres -p 5432:5432 \
    -e POSTGRES_USER=ecomm -e POSTGRES_PASSWORD=complexpassword123 \
    -e POSTGRES_DB=ecomm -d postgres

Alternatively, you can use a locally installed Postgres instance. Just update the DATABASES in the core/settings.py file accordingly.

Create a new virtual environment and activate it:

$ python3 -m venv venv && source venv/bin/activate

Install the requirements and migrate the database:

(venv)$ pip install -r requirements.txt
(venv)$ python manage.py migrate

Populate the database:

(venv)$ python manage.py populate_db

The command will create a super user, add a few products, and add 2.5 million purchases to the database. Since the command is pretty slow, I suggest you run multiple instances to speed up the process.

Run the server:

(venv)$ python manage.py runserver

Lastly, open your favorite web browser and navigate to http://localhost:8000/admin. Login as:

user: admin
pass: password

Great, you've successfully set up the project.

Initial Benchmark

I suggest you only proceed if you have 10 million or more purchases in the database.

The project comes with django-silk preinstalled. Django Silk is a live profiling and inspection tool that we'll use to benchmark the admin site. We'll be focusing on SQL query times.

First, open the admin site and navigate to the "Purchases" change list.

You'll notice a relatively long loading time.

To see the exact statistics, you can use Django Silk by navigating to http://localhost:8000/silk. Click on the "/admin/ecomm/purchase" request to see its details.

Django Silk Benchmarks Default Details

Ouch, the request took 4049 milliseconds, 3695 of which were spent on SQL queries.

Selecting "SQL" in the navigation bar shows that the SQL query that took the most time was SELECT COUNT(*), which took 3686 milliseconds.

Django Silk Benchmarks Default SQL

Approximate Row Count

In this article section, we'll examine how to approximate the number of objects using the built-in Postgres row estimation. This approach is extremely fast but doesn't yield the exact count.

First, create a managers.py file within the ecomm app, and put the following code inside:

# ecomm/managers.py

from django.db import connections
from django.db.models import QuerySet, Manager


class ApproximateCountQuerySet(QuerySet):
    def count(self):
        if self.query.where:
            return super(ApproximateCountQuerySet, self).count()

        cursor = connections[self.db].cursor()
        cursor.execute("SELECT reltuples FROM pg_class "
                       "WHERE relname = '%s';" % self.model._meta.db_table)

        return int(cursor.fetchone()[0])


ApproximateCountManager = Manager.from_queryset(ApproximateCountQuerySet)

This code creates a new class named ApproximateCountQuerySet, which inherits from QuerySet. Instead of using the default SELECT COUNT(*), it uses Postgres' reltuples. The estimation is only performed if no filtering is applied.

We also created a manager to be able to use ApproximateCountQuerySet directly in our models.

Next, set the Purchase model's manager in ecomm/models.py like so:

# ecomm/models.py

class Purchase(models.Model):
    # ...

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    objects = ApproximateCountManager()  # new

    def __str__(self):
        return f"Purchase #{self.id}"

Don't forget about the import:

from ecomm.managers import ApproximateCountManager

Wait for the development server to refresh and benchmark the app again.

You should see a significant improvement. In my case, it went from more than 4000 milliseconds to around 155 milliseconds. That's around 16 times faster!

Extend Django's Paginator

In this section, we'll look at how to avoid the row count query by modifying the paginator to display only the previous and next pages. This approach is a bit faster than approximating row count but may not be as handy.

QuerySet

First, add the InfiniteCountQuerySet class to ecomm/managers.py:

# ecomm/managers.py

# ...

class InfiniteCountQuerySet(QuerySet):
    def count(self):
        return 999_999_999


InfiniteCountManager = Manager.from_queryset(InfiniteCountQuerySet)

This queryset always returns the row count of 999,999,999, without considering the actual row count. The number 999,999,999 was picked, assuming that your row count will always be lower than that. If not, feel free to bump the number.

Next, update the Purchase model's manager:

# ecomm/models.py

class Purchase(models.Model):
    # ...

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    objects = InfiniteCountManager()

    def __str__(self):
        return f"Purchase #{self.id}"

Don't forget about the import:

from ecomm.managers import InfiniteCountManager

Wait for the server to refresh and benchmark the app once again.

You'll notice similar results to those of the previous approach. In my case, it went from 4000 milliseconds to around 133 milliseconds. The downside of this approach is that your queryset count() will always return 999,999,999 even in normal views.

Paginator

We successfully avoided the row count, but now the pagination looks off. It displays pages from 1 to 1000000000 no matter how many objects we have in the database table.

Let's fix this!

First, create a paginators.py file in the ecomms app with the following contents:

# ecomm/paginators.py

from django.core.paginator import Paginator


class PreviousNextPaginator(Paginator):
    def get_elided_page_range(self, number=1, *, on_each_side=3, on_ends=2):
        return super().get_elided_page_range(number, on_each_side=1, on_ends=0)

This code overrides the default Paginator's get_elided_page_range() to only display the previous and the next page via on_each_side=1.

Next, update PurchaseAdmin like so:

# ecomm/admin.py

class PurchaseAdmin(admin.ModelAdmin):
    # ...
    paginator = PreviousNextPaginator  # new

Don't forget about the import:

from ecomm.paginators import PreviousNextPaginator

At this point, your pagination should look like this:

Django Mid Paginator

The last thing we want to do is remove the "999999999 purchases" text. To do that, we can override the default Django admin's pagination.html template.

First, create the following directory structure within the ecomm app:

templates/
└── admin/
    └── ecom/
        └── pagination.html

Next, put the following contents in pagination.html:

<!-- ecomm/templates/admin/ecomm/pagination.html -->

{% load admin_list %}
{% load i18n %}
<p class="paginator">
{% if pagination_required %}
{% for i in page_range %}
    {% paginator_number cl i %}
{% endfor %}
{% endif %}
{% if cl.result_count < 999999999 %}
    {{ cl.result_count }}
    {% if cl.result_count == 1 %}
        {{ cl.opts.verbose_name }}
    {% else %}
        {{ cl.opts.verbose_name_plural }}{% endif %}
{% endif %}
{% if show_all_url %}
    <a href="{{ show_all_url }}" class="showall">{% translate 'Show all' %}</a>
{% endif %}
{% if cl.formset and cl.result_count %}
    <input
        type="submit"
        name="_save"
        class="default"
        value="{% translate 'Save' %}"
    >
{% endif %}
</p>

Great, the pagination now only displays the count if InfiniteCountQuerySet isn't used.

The final pagination should look like this:

Django Extended Paginator

Conclusion

In conclusion, we've looked at two ways to speed up Django counting.

Both approaches yielded great results. By leveraging them, we reduced the loading time by 16 times for a database table with 30 million entries. Feel free to customize or combine these approaches to address your needs best.

For more information, check out the following resources:

Nik Tomazic

Nik Tomazic

Nik is a software developer from Slovenia. He's interested in object-oriented programming and web development. He likes learning new things and accepting new challenges. When he's not coding, Nik's either swimming or watching movies.

Share this tutorial

Featured Course

Test-Driven Development with Django, Django REST Framework, and Docker

In this course, you'll learn how to set up a development environment with Docker in order to build and deploy a RESTful API powered by Python, Django, and Django REST Framework.

Featured Course

Test-Driven Development with Django, Django REST Framework, and Docker

In this course, you'll learn how to set up a development environment with Docker in order to build and deploy a RESTful API powered by Python, Django, and Django REST Framework.