This article looks at the differences between Django QuerySet's defer()
, only()
, and exclude()
methods.
--
Database queries are usually the bottlenecks of most web apps. Understanding how to write efficient queries can help you significantly speed up your apps. When working with databases, the rule of thumb is that fetching less data is generally faster.
To minimize the amount of data fetched when working with Django, you can leverage the defer()
, only()
, and exclude()
methods.
Let's learn about them and how to use them in practice!
Contents
Project Introduction
Throughout the article, we'll be working on a real estate web app. You can clone the project from the GitHub repo and follow along, or just read through the article.
To set up the project locally, follow the instructions in the README.md file.
The web app has two models, Property
and Location
:
# estates/models.py
class Location(models.Model):
city = models.CharField(max_length=128)
state = models.CharField(max_length=128)
country = models.CharField(max_length=32)
zip_code = models.CharField(max_length=32)
# ...
class Property(models.Model):
name = models.CharField(max_length=256)
description = models.TextField()
property_type = models.CharField(max_length=20, choices=PROPERTY_TYPES)
location = models.ForeignKey(Location, on_delete=models.CASCADE)
square_feet = models.PositiveIntegerField()
bedrooms = models.PositiveSmallIntegerField()
bathrooms = models.PositiveSmallIntegerField()
has_garage = models.BooleanField(default=False)
has_balcony = models.BooleanField(default=False)
# ...
Both models have several fields and contain a to_json()
method, which serializes all the model's attributes to a Python dictionary.
The web app provides a simple API with the following endpoints:
/
returns a shortened serialized list of all the properties/<int:id>/
returns all the details of a specific property/<int:id>/amenities/
returns the amenities of a specific property
Initial Benchmark
Before working on the optimizations, we'll benchmark the web app using the Django Silk package. Django Silk is an easy-to-use live profiling and inspection tool that stores HTTP requests and database queries. These can later be viewed in a simple dashboard.
To generate the performance reports, we can hit the web app's endpoints:
You'll notice that the list view and amenities view response contains a subset of the property attributes, while the detail view contains all the property attributes.
Next, navigate to http://127.0.0.1:8000/silk/requests/ and inspect the just-created reports.
In my case, the list view takes 765
milliseconds (301
milliseconds of which are spent on queries), while the other two take 11
milliseconds (~1
on queries).
Looking at the SQL queries, we can see that all three of them produce a similar query:
-- List query
SELECT * FROM "estates_property"
INNER JOIN "estates_location" ON (
"estates_property"."location_id" = "estates_location"."id"
)
-- Detail query
SELECT * FROM "estates_property"
INNER JOIN "estates_location" ON (
"estates_property"."location_id" = "estates_location"."id"
)
WHERE "estates_property"."id" = 1
-- Amenities query
SELECT * FROM "estates_property"
WHERE "estates_property"."id" = 1
All three views fetch all property fields even though the amenities and list views only display a small subset of them. Let's fix this!
defer()
You can use the defer()
method when querying Django models to "exclude" specific fields that aren't required for your particular query. For example, in a list view, you can defer all the fields you aren't displaying (e.g., description
, created_at
, and updated_at
).
This method has the biggest impact when deferring fields that contain a lot of data (e.g., text fields) or fields that require expensive processing to convert them to Python objects (e.g., JSON).
defer()
works on the attribute level. Under the hood, it changes the SQL like so:
-- No defer(): The query fetches all the columns
SELECT * FROM some_table;
-- Using defer(): The query fetches all the columns except the deferred ones
SELECT column_1, column_2, ... column_n FROM some_table;
In practice, we could defer our properties_list_view()
QuerySet's description
attribute:
properties = Property.objects.select_related("location").defer(
"description"
)
We're using
select_related()
to avoid the N+1 query. More information in this article.
If we benchmark the endpoint again, we can see that the SQL SELECT
query no longer contains the description
attribute. Additionally, the query time decreased from 765
milliseconds to 184
milliseconds. That's around 4x improvement!
only()
The only()
method is the exact opposite of the defer()
method.
This method allows you to define a subset of fields you want fetched from the database instead of deferring specific fields. It's most advantageous when displaying only a small subset of fields.
Again, it works on the attribute level and similarly modifies the underlying SQL:
-- No only(): The query fetches all the columns
SELECT * FROM some_table;
-- Using only(): The query fetches only the provided subset of columns
SELECT only_column1, only_column2, ... only_column_n FROM some_table;
To test it in action, we can modify properties_list_view
QuerySet like so:
properties = Property.objects.select_related("location").only(
"id", "name", "location", "price"
)
After benchmarking, we can see another slight improvement. The query time dropped from 184
milliseconds to around 154
milliseconds.
If you wish to fetch only specific fields of a related model, you can use Django's
__
lookup separator. For example:location__city
will only fetch the location's city field.Additionally, you can make multiple calls to
defer()
andonly()
in a single query.
exclude()
The exclude()
method is the opposite of the filter()
method. It returns objects that do not match the given lookup parameters. While the previous two methods operate at the column level, this one functions at the row level.
For example:
# Fetches all the apartments
apartments = Property.objects.filter(property_type=PROPERTY_TYPE_APARTMENT)
# Fetches all the properties that aren't apartments
non_apartments = Property.objects.exclude(property_type=PROPERTY_TYPE_APARTMENT)
Produces the following SQL:
-- Using filter(): The query fetches all the apartments
SELECT * FROM "estates_property" WHERE ("estates_property"."property_type" = AP)
-- Using exclude(): The query fetches all properties that aren't apartments
SELECT * FROM "estates_property" WHERE NOT ("estates_property"."property_type" = AP)
The same as with
filter()
you're able to stack conditions. For example, to get properties that are not land and are bigger than1000
feet, you could do this:big_buildings = ( Property.objects. exclude(property_type=PROPERTY_TYPE_LAND, square_feet__lt=1000) )
The Pitfall of defer() and only()
Using defer()
and only()
incorrectly can lead to performance issues rather than improvements.
Although you can use these two methods to exclude fields from the initial query, Django doesn't block you from accessing the excluded fields later on. That can potentially result in additional database queries worsening the performance than if you fetch all the fields from the get-go.
At the moment, our property_amenities_view
looks like so:
def property_amenities_view(request, id):
property = Property.objects.only(
"id", "has_garage", "has_balcony", "has_basement", "has_pool"
).get(id=id)
return JsonResponse({
"id": property.id,
"has_garage": property.has_garage,
"has_balcony": property.has_balcony,
"has_basement": property.has_basement,
"has_pool": property.has_pool,
})
The view leverages only()
to avoid fetching any unnecessary fields from the database.
The underlying SQL looks something like this:
SELECT "estates_property"."id",
"estates_property"."has_garage",
"estates_property"."has_balcony",
"estates_property"."has_basement",
"estates_property"."has_pool"
FROM "estates_property" WHERE "estates_property"."id" = 1
Then after some time we change our minds and decide to add bedrooms
and bathrooms
to the JSON response since they're technically amenities as well:
def property_amenities_view(request, id):
property = Property.objects.only(
"id", "has_garage", "has_balcony", "has_basement", "has_pool"
).get(id=id)
return JsonResponse({
"id": property.id,
"bedrooms": property.bedrooms, # new
"bathrooms": property.bathrooms, # new
"has_garage": property.has_garage,
"has_balcony": property.has_balcony,
"has_basement": property.has_basement,
"has_pool": property.has_pool,
})
By doing that, we just introduced two additional queries. Since we didn't include bedrooms
and bathrooms
in the only()
subset, Django now fetches the additional fields separately.
The underlying SQL now looks like so:
-- This query fetches the only() field subset
SELECT "estates_property"."id",
"estates_property"."has_garage",
"estates_property"."has_balcony",
"estates_property"."has_basement",
"estates_property"."has_pool"
FROM "estates_property" WHERE "estates_property"."id" = 1
-- An extra query for fetching the bedrooms
SELECT "estates_property"."id",
"estates_property"."bedrooms"
FROM "estates_property" WHERE "estates_property"."id" = 1
-- An extra query for fetching the bathrooms
SELECT "estates_property"."id",
"estates_property"."bathrooms"
FROM "estates_property" WHERE "estates_property"."id" = 1
Remember, when refactoring code, you should be careful about only()
and exclude()
.
You can also protect against additional SQL queries from appearing when working on those views in he future by leveraging django_assert_num_queries. For more on this, review the Automating Performance Testing in Django article.
Conclusion
To speed up your Django queries, you should try fetch the minimal amount of data from the database. This applies to both rows (model objects) as well as columns (model attributes).
The two methods that allow you to control which model attributes are fetched are defer()
and only()
. These two make the most significant impact when excluding attributes that contain a lot of data or require expensive processing.
On the other hand, to make exclusions on the object level, you can leverage exclude()
.
For more Django optimization tips, check out: