Inefficient database querying is one of the most common performance pitfalls with Django. N+1 queries in particular can degrade your application's performance early on. They occur when you select records from an associated table using an individual query for each record rather than fetching them all in a single query. Such inefficiencies are unfortunately quite easy to introduce with the Django ORM. That said, they're also something you can quickly uncover and prevent via automated testing.
This article looks at how to:
- Test the number of queries executed by a request along with the duration of the queries
- Prevent N+1 queries using the nplusone package
N+1 Queries
The sample app that we'll be working with throughout this article can be found on GitHub.
Say, for example, you're working with a Django application that has the following models:
# courses/models.py

from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=100)

    def __str__(self):
        return self.name


class Course(models.Model):
    title = models.CharField(max_length=100)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

    def __str__(self):
        return self.title
Now, if you're tasked with creating a new view for returning a JSON response of all courses with the title and author name, you could write the following code:
# courses/views.py

from django.http import JsonResponse

from courses.models import Course


def all_courses(request):
    queryset = Course.objects.all()

    courses = []
    for course in queryset:
        courses.append(
            {"title": course.title, "author": course.author.name}
        )

    return JsonResponse(courses, safe=False)
This code will work, but it's very inefficient since it will make far too many database queries:
- 1 query for obtaining all the courses
- N queries for obtaining the author of each course inside the loop (illustrated below)
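You can see the pattern for yourself in the Django shell (python manage.py shell). The snippet below is a quick sketch that assumes DEBUG is True, since Django only records queries in debug mode:

from django.db import connection, reset_queries

from courses.models import Course

reset_queries()

# Touching course.author triggers a separate query for every course
[(course.title, course.author.name) for course in Course.objects.all()]

for query in connection.queries[:3]:
    print(query["sql"])

# One SELECT against the course table, followed by a separate
# SELECT against the author table for each individual course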
Before addressing this, let's look at just how many queries are made and measure the execution time.
Metrics Middleware
You'll notice that the project includes custom middleware that calculates and logs the execution time of each request:
# core/middleware.py

import logging
import time

from django.db import connection, reset_queries


def metric_middleware(get_response):
    def middleware(request):
        reset_queries()

        # Get beginning stats
        start_queries = len(connection.queries)
        start_time = time.perf_counter()

        # Process the request
        response = get_response(request)

        # Get ending stats
        end_time = time.perf_counter()
        end_queries = len(connection.queries)

        # Calculate stats
        total_time = end_time - start_time
        total_queries = end_queries - start_queries

        # Log the results
        logger = logging.getLogger("debug")
        logger.debug(f"Request: {request.method} {request.path}")
        logger.debug(f"Number of Queries: {total_queries}")
        logger.debug(f"Total time: {total_time:.2f}s")

        return response

    return middleware
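For the middleware to run, it has to be registered in the settings file. A sketch of the relevant settings follows; the module path and logger configuration are assumptions based on the sample app's layout, and note that connection.queries is only populated when DEBUG is True:

# core/settings.py

DEBUG = True  # connection.queries is only recorded in debug mode

MIDDLEWARE = [
    # ...Django's default middleware...
    "core.middleware.metric_middleware",
]

# Route the "debug" logger used by the middleware to the console
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "loggers": {
        "debug": {"handlers": ["console"], "level": "DEBUG"},
    },
}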
Run the database seed command to add 10 authors and 100 courses to the database:
$ python manage.py seed_db
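The command ships with the sample app. If you're following along in your own project, a minimal sketch of what such a management command might look like (the file path and counts here are assumptions matching the seed data above):

# courses/management/commands/seed_db.py

from django.core.management.base import BaseCommand
from faker import Faker

from courses.models import Author, Course


class Command(BaseCommand):
    help = "Seed the database with fake authors and courses"

    def handle(self, *args, **options):
        fake = Faker()

        # 10 authors, 100 courses, spread evenly across the authors
        authors = [Author.objects.create(name=fake.name()) for _ in range(10)]
        for i in range(100):
            Course.objects.create(
                title=fake.sentence(nb_words=4),
                author=authors[i % len(authors)],
            )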
With the Django development server up and running, navigate to http://localhost:8000/courses/ in your browser. You should see the JSON response. Back in your terminal, take note of the metrics:
Request: GET /courses/
Number of Queries: 101
Total time: 0.10s
That's a lot of queries! This is very inefficient: each additional course will require an additional database query, so performance will continue to degrade as the database grows. Fortunately, the fix is quite simple: You can add the select_related method to create a SQL JOIN that includes the authors in the initial database query:

queryset = Course.objects.select_related("author").all()
Before making any code changes, let's first start with some tests.
Performance Tests
Start with the following test, which uses the django_assert_num_queries pytest fixture to ensure that the database is hit only once per request, no matter how many author and course records are present in the database:
import json

import pytest
from django.test import override_settings
from faker import Faker

from courses.models import Author, Course


@pytest.mark.django_db
def test_number_of_sql_queries_all_courses(client, django_assert_num_queries):
    fake = Faker()

    author_name = fake.name()
    author = Author(name=author_name)
    author.save()

    course_title = fake.sentence(nb_words=4)
    course = Course(title=course_title, author=author)
    course.save()

    with django_assert_num_queries(1):
        res = client.get("/courses/")
        data = json.loads(res.content)

    assert res.status_code == 200
    assert len(data) == 1

    author_name = fake.name()
    author = Author(name=author_name)
    author.save()

    course_title = fake.sentence(nb_words=4)
    course = Course(title=course_title, author=author)
    course.save()

    # The query count should stay at one even as records are added
    with django_assert_num_queries(1):
        res = client.get("/courses/")
        data = json.loads(res.content)

    assert res.status_code == 200
    assert len(data) == 2
Not using pytest? Use the assertNumQueries test method in place of django_assert_num_queries.
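A rough equivalent using Django's built-in test tooling might look like this (a sketch, reusing the models and route from above):

from django.test import TestCase

from courses.models import Author, Course


class AllCoursesTest(TestCase):
    def test_number_of_sql_queries(self):
        author = Author.objects.create(name="Jane Doe")
        Course.objects.create(title="Django Testing", author=author)

        # Fails if the view issues more (or fewer) than one query
        with self.assertNumQueries(1):
            res = self.client.get("/courses/")
            self.assertEqual(res.status_code, 200)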
What's more, we can also use the nplusone package to prevent the introduction of future N+1 queries. After installing it, you need to hook it into your Django settings.
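Per the nplusone documentation, the Django integration looks roughly like this (the logging settings are optional):

# settings.py

import logging

INSTALLED_APPS = [
    # ...
    "nplusone.ext.django",
]

MIDDLEWARE = [
    "nplusone.ext.django.NPlusOneMiddleware",
    # ...
]

# Log potential N+1 queries while the app runs
NPLUSONE_LOGGER = logging.getLogger("nplusone")
NPLUSONE_LOG_LEVEL = logging.WARN

With that in place, you can make nplusone raise an error in individual tests with the @override_settings decorator: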
...
@pytest.mark.django_db
@override_settings(NPLUSONE_RAISE=True)
def test_number_of_sql_queries_all_courses(client, django_assert_num_queries):
...
Or if you'd like to automatically enable nplusone across the entire test suite, add the following to your test root conftest.py file:
from django.conf import settings


def pytest_configure(config):
    settings.NPLUSONE_RAISE = True
Hop back to the sample app and run the tests. You should see the following error:
nplusone.core.exceptions.NPlusOneError: Potential n+1 query detected on `Course.author`
Now, make the recommended change -- adding the select_related method -- and then run the tests again. They should now pass.
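For reference, here's the updated view; only the queryset line changes from the earlier version:

# courses/views.py

from django.http import JsonResponse

from courses.models import Course


def all_courses(request):
    # select_related("author") adds a SQL JOIN, so each course's
    # author arrives in the same query as the course itself
    queryset = Course.objects.select_related("author").all()

    courses = []
    for course in queryset:
        courses.append(
            {"title": course.title, "author": course.author.name}
        )

    return JsonResponse(courses, safe=False)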
Conclusion
This article looked at how to prevent N+1 queries automatically in your codebase using the nplusone package and how to test the number of queries executed using the django_assert_num_queries pytest fixture.
This should help prevent performance bottlenecks as your application grows and takes on more users. If you're adding this to an existing codebase, you'll probably need to spend some time fixing broken queries so that they have a constant number of database hits. If you're still running into performance issues after these fixes and optimizations, you may need to add additional caching layers, denormalize parts of the database, and/or configure database indexes.