Performance refers to how well a system or a program responds to different workloads. It can be measured in the form of how much CPU or memory the program requires, how long it runs, how scalable it is, and so on. While efficiency may not always be the top priority, it's crucial to consider it from the start of a project to avoid issues as an application scales and evolves.
With regard to Django, you'll most likely want to optimize your application to serve more concurrent users, use fewer system resources, use the database more efficiently, take care of long-running tasks, and leverage caching. While optimization is great, making everything as performant as possible might not be the best idea. The key to developing high-quality software is finding the right balance between performance, readability, maintainability, cost, and various other factors.
This article addresses different aspects of Django performance and showcases some of the practices you can use to speed up your app. On top of that, it provides links to additional resources for further reading.
Benchmarking and Profiling
Before attempting any sort of optimization, you should always establish benchmarks. After you're done optimizing, you can benchmark again and compare the new result to the previous, non-optimized one. This will show you whether your code actually runs faster or uses fewer resources.
Benchmarking can also help you figure out what parts of your code could be improved. After you're aware of the general improvement areas, you can then use a profiling tool to figure out exactly which lines you could optimize.
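As a rough illustration of the benchmark, optimize, benchmark-again workflow, the sketch below times two versions of a function with the standard library's timeit module. The function names and the data are made up for this example; in a real Django project you'd more likely measure a view or a query with one of the tools mentioned below.

```python
import timeit


def build_report_v1(numbers):
    """Hypothetical original version: repeated string concatenation."""
    report = ""
    for n in numbers:
        report += str(n) + ","
    return report


def build_report_v2(numbers):
    """Hypothetical candidate optimization: a single join."""
    return "".join(f"{n}," for n in numbers)


def benchmark(func, data, repeat=5, number=20):
    """Return the best total time (in seconds) for `number` calls of func."""
    return min(timeit.repeat(lambda: func(data), repeat=repeat, number=number))


data = list(range(10_000))

# Before comparing timings, confirm both versions return the same result.
assert build_report_v1(data) == build_report_v2(data)

baseline = benchmark(build_report_v1, data)
optimized = benchmark(build_report_v2, data)
print(f"baseline:  {baseline:.4f}s")
print(f"candidate: {optimized:.4f}s")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes. Whether the candidate actually wins depends on your interpreter and data, which is exactly why you measure instead of guessing.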
If you wish to easily benchmark your Django application, your first go-to should be Django Debug Toolbar. For more advanced users that require additional profiling information, there's also Silk and line_profiler.
You can also use Locust or a similar tool to load test your web app by swarming it with millions of simultaneous users to help surface bottlenecks.
Caching is one of the most effective and easiest ways to boost your application's performance.
Caching is a technique that allows you to store the result of an expensive operation in a way that can be quickly retrieved later. When it comes to dynamic web apps, every time a user visits your site a bunch of stuff happens in the background -- e.g., data gathering (from the database and third-party APIs), data processing, applying business logic, and so forth. All of this work can result in significant overhead.
This overhead isn't a big problem for most small to mid-sized web apps, but as your app grows, you'll want to consider caching to cut computation and serving times, relieve the database from high load, and serve more concurrent users.
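To make the idea concrete, here's a minimal, framework-free sketch of caching an expensive result with a timeout. The helper get_or_compute and the function expensive_dashboard_stats are invented for this example; Django's low-level cache API offers a similar cache.get_or_set() call, so you wouldn't normally write this yourself.

```python
import time

_cache = {}  # key -> (expires_at, value)


def get_or_compute(key, compute, timeout=60):
    """Return the cached value for key, computing and storing it on a miss."""
    now = time.monotonic()
    entry = _cache.get(key)
    if entry is not None and entry[0] > now:
        return entry[1]  # cache hit: skip the expensive work
    value = compute()
    _cache[key] = (now + timeout, value)
    return value


calls = 0


def expensive_dashboard_stats():
    """Stand-in for slow database queries or third-party API calls."""
    global calls
    calls += 1
    time.sleep(0.05)
    return {"users": 1200, "orders": 87}


first = get_or_compute("dashboard_stats", expensive_dashboard_stats)
second = get_or_compute("dashboard_stats", expensive_dashboard_stats)
print(first == second, "expensive calls:", calls)
```

The second call is answered from the dictionary, so the expensive function only runs once until the entry expires.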
Django has a robust, easy-to-use, built-in cache system. It supports the following cache types:
- Database caching
- File system caching
- Local memory caching
- Dummy caching
The main difference between these cache types is where the cache is stored. On top of that, Django provides different levels of cache granularity including per-site cache, per-view cache, template fragment cache, and low-level cache API.
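For reference, a cache backend is selected through the CACHES setting. The fragment below is a minimal sketch that uses local memory caching; the LOCATION name is arbitrary and made up for this example.

```python
# settings.py (fragment)
CACHES = {
    "default": {
        # Local memory cache: stored per process, so it suits development
        # or a single worker better than a multi-server deployment.
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "example-cache",  # arbitrary name, assumed for this sketch
        "TIMEOUT": 300,  # seconds before an entry expires
    }
}
```

Switching to another cache type is mostly a matter of changing BACKEND and LOCATION, while the code that reads from and writes to the cache stays the same.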
To learn more about caching, take a look at:
- Caching in Django
- Low-Level Cache API in Django
- Django's cache framework (official docs)
In summary, leverage caching to help smooth some of the bottlenecks in your applications. Just make sure you also address some of the underlying causes of the bottlenecks, like refactoring inefficient database queries and moving data processing to an asynchronous task queue.
When used correctly, the Django ORM allows you to relieve the database from high load, minimize database queries, and make your web app much faster. Database optimization is especially crucial for highly dynamic apps that can't utilize caching as much.
The key to database optimization is understanding how Django's QuerySet works. It has two important properties you should remember:
- QuerySets are lazy. The act of creating them or adding filters to them doesn't involve any database activity. They're only evaluated when you perform specific actions on them such as iteration, checking their length, testing them in a boolean context, and so on.
- Django caches QuerySets to minimize database access. Make sure to check out the documentation to understand when queries are and are not cached.
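The two properties above can be illustrated with a toy class built on the standard library's sqlite3 module. This is not how Django implements QuerySet; LazyQuery, the book table, and its columns are all invented for this sketch. It only mimics the observable behavior: chaining filters runs no SQL, the first evaluation runs one query, and later evaluations reuse the cached rows.

```python
import sqlite3


class LazyQuery:
    """Toy model of a lazy, result-caching query (not Django's code)."""

    def __init__(self, conn, table, conditions=()):
        self._conn = conn
        self._table = table
        self._conditions = tuple(conditions)  # (column, value) pairs
        self._result_cache = None

    def filter(self, **kwargs):
        # Returns a new object and executes nothing.
        return LazyQuery(self._conn, self._table,
                         self._conditions + tuple(kwargs.items()))

    def _fetch(self):
        if self._result_cache is None:
            # Table and column names are interpolated: acceptable for a
            # toy example only, never for untrusted input.
            sql = f"SELECT * FROM {self._table}"
            if self._conditions:
                sql += " WHERE " + " AND ".join(
                    f"{column} = ?" for column, _ in self._conditions)
            params = [value for _, value in self._conditions]
            self._result_cache = self._conn.execute(sql, params).fetchall()
        return self._result_cache

    def __iter__(self):
        return iter(self._fetch())

    def __len__(self):
        return len(self._fetch())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, author TEXT, year INTEGER)")
conn.executemany("INSERT INTO book (author, year) VALUES (?, ?)",
                 [("Ann", 2020), ("Ann", 2021), ("Bob", 2021)])

executed = []


def record_select(sql):
    """Record only the SELECT statements SQLite actually runs."""
    if sql.lstrip().upper().startswith("SELECT"):
        executed.append(sql)


conn.set_trace_callback(record_select)

books = LazyQuery(conn, "book").filter(author="Ann").filter(year=2021)
selects_before_evaluation = len(executed)  # filters alone ran no SQL

rows = list(books)   # first evaluation: one SELECT
count = len(books)   # answered from the cached rows
print(selects_before_evaluation, len(executed), rows, count)
```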
Django provides a helpful QuerySet.explain() method that you can use to better understand how specific QuerySets are executed by your database.
Here are some tips for writing efficient queries:
- Retrieve everything you need at once, and only what you need.
- Limit your QuerySet to the number of results you need.
- Do not order your QuerySet if you don't care about the order.
- Creating, updating, and deleting in bulk is more efficient than doing it separately.
- Querying in loops is usually bad practice. Rethink your query.
- Avoid raw SQL unless you have a valid reason to use it.
- Use iterator() if you don't want your query results to be cached in memory.
A common performance pitfall that's easy to introduce when using Django is the N+1 query problem. It occurs when you select records from an associated table using an individual query for each record rather than grabbing all records in a single query. To avoid N+1 queries, you should utilize select_related() and prefetch_related(). On top of that, you can use nplusone to automatically detect N+1 queries.
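To show what N+1 looks like at the database level, the sketch below uses the standard library's sqlite3 module rather than the Django ORM. The author and book tables are invented for this example. The first function mirrors what happens when you loop over books and touch book.author on each one; the second issues a single JOIN, which is roughly what select_related() asks the database to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book (title, author_id) VALUES
        ('A1', 1), ('A2', 1), ('B1', 2);
""")

queries = []


def record_select(sql):
    """Record only the SELECT statements SQLite actually runs."""
    if sql.lstrip().upper().startswith("SELECT"):
        queries.append(sql)


conn.set_trace_callback(record_select)


def titles_with_authors_n_plus_one():
    """One query for the books, then one extra query per book."""
    result = []
    books = conn.execute(
        "SELECT title, author_id FROM book ORDER BY id").fetchall()
    for title, author_id in books:
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
        result.append((title, name))
    return result


def titles_with_authors_joined():
    """A single JOIN that fetches books and their authors together."""
    return conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


queries.clear()
looped = titles_with_authors_n_plus_one()
n_plus_one_count = len(queries)

queries.clear()
joined = titles_with_authors_joined()
joined_count = len(queries)

print(n_plus_one_count, joined_count, looped == joined)
```

With three books, the looped version runs four queries and the joined version runs one. The gap grows linearly with the number of rows, which is why the pattern tends to go unnoticed in development and hurt in production.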
To ensure optimal database performance, you should use standard database optimization techniques such as database indexing and using appropriate database types. Additionally, make sure that the database work is performed in the database -- i.e., use F() expressions instead of relying on vanilla Python code.
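The difference between doing the work in Python and doing it in the database can be sketched with plain sqlite3. The product table is invented for this example, and the Django call mentioned in the comment assumes a hypothetical Product model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO product (id, stock) VALUES (1, 10)")

# Python-side: read the value, change it in Python, write it back.
# That's two statements, and a concurrent writer could slip in between.
(stock,) = conn.execute("SELECT stock FROM product WHERE id = 1").fetchone()
conn.execute("UPDATE product SET stock = ? WHERE id = 1", (stock + 1,))

# Database-side: one statement, and the new value is computed by the
# database itself. In Django this is roughly
#   Product.objects.filter(pk=1).update(stock=F("stock") + 1)
# where Product is a hypothetical model.
conn.execute("UPDATE product SET stock = stock + 1 WHERE id = 1")

(final_stock,) = conn.execute(
    "SELECT stock FROM product WHERE id = 1").fetchone()
print(final_stock)
```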
If you're running a highly dynamic web app that has a lot of traffic, you might want to look into persistent database connections. By default, Django reopens a database connection with each request. Opening and closing database connections can cause overhead and make your app slower. Consider setting CONN_MAX_AGE to enable connection persistence.
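A minimal settings sketch for persistent connections might look like the following. The engine, database name, and credentials are placeholders, and 60 seconds is an arbitrary starting point rather than a recommendation.

```python
# settings.py (fragment)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "example_db",      # placeholder
        "USER": "example_user",    # placeholder
        "PASSWORD": "change-me",   # placeholder
        "HOST": "localhost",
        "PORT": "5432",
        # Reuse each connection for up to 60 seconds instead of closing
        # it at the end of every request (the default is 0).
        # None keeps connections open indefinitely.
        "CONN_MAX_AGE": 60,
    }
}
```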
Check out the following articles to learn more about database optimization and see how the above-mentioned tips are used in code:
- Django Performance Improvements - Part 1: Database Optimizations
- Automating Performance Testing in Django
- A Guide to Performance Testing and Optimization with Django
- Database access optimization (official docs)
- Pagination in Django
In summary, before you start doing any sort of database optimization, you should have a solid understanding of how Django's QuerySet works. Make sure to minimize the number of database queries (only get what you need from the database, use select_related(), and leverage nplusone). Add database indexes as necessary.
Django provides an app for handling static files called staticfiles, which is enabled by default. To handle static files in production, typically two steps are involved: copying the static files to a designated store, and then serving them from there.
Even though Django is a powerful framework, it isn't appropriate for serving static files by itself. Why? Static file serving can place a significant burden on the server, leading to slower response times and increased server load.
For production environments, there are two serving options you can choose from:
- Django with WhiteNoise
- Django with a cloud object store like AWS S3, GCP Cloud Storage, Azure Storage, or DigitalOcean Spaces
The first approach allows you to transform your web application into a self-contained unit that can be easily deployed anywhere without relying on any third-party services. It's a convenient option since it doesn't require much configuration, is free, and is relatively performant. It uses gzip and Brotli compression.
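As a rough sketch, enabling WhiteNoise usually comes down to a few settings. The STATIC_ROOT path is a placeholder, the middleware list is shortened, and the STORAGES setting assumes Django 4.2 or newer (older versions use STATICFILES_STORAGE instead).

```python
# settings.py (fragment)
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise should sit directly after SecurityMiddleware.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of your middleware
]

STATIC_URL = "static/"
STATIC_ROOT = "/var/www/example/staticfiles"  # placeholder path

STORAGES = {
    "default": {
        "BACKEND": "django.core.files.storage.FileSystemStorage",
    },
    "staticfiles": {
        # Adds compressed copies and cache-busting hashed file names
        # when collectstatic runs.
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    },
}
```

Running collectstatic then copies the files to STATIC_ROOT, and WhiteNoise serves them from there.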
The second approach, on the other hand, allows you to offload a lot of work to the cloud provider. Additionally, you'll be able to treat your servers as ephemeral, so that you can destroy and rebuild them at any time without worrying about data loss. Your machines will also require much less space since they won't include static files.
If you're running a simple site with relatively low traffic you can get away with using WhiteNoise. If that isn't the case, I'd suggest you go with cloud storage since it will be less of a hassle to manage, is more performant, and allows you to easily scale in the future.
On top of the mentioned approaches, you can also utilize a content delivery network (CDN) to make your web app even faster. CDNs are a specialized, cost-effective way of serving static files from multiple regions around the globe.
To learn more about Django static and media files, take a look at:
- Working with Static and Media Files in Django
- Storing Django Static and Media Files on Amazon S3
- Storing Django Static and Media Files on DigitalOcean Spaces
- How to deploy static files (official docs)
In summary, compress your static files and either leverage WhiteNoise or serve them from a cloud object store. Use a CDN.
If your application's workflow involves long-running processes, you should handle them asynchronously instead of blocking the request/response cycle.
Suppose that your web app sends a confirmation email after the user signs up. If you send the email directly in your request/response cycle the user will have to wait for the email to be sent before receiving a response from the server. This can result in a poor user experience and slow server responses.
A few other examples of such tasks are:
- Data analyzing
- Image/text processing
- Running ML models
- Report generation
- Third-party API calls
While Django does support async views, the best way to offload these tasks from the request/response cycle is to use a distributed task queue. This way the response will be returned immediately while a separate worker process takes care of the long-running task.
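The pattern can be sketched with nothing but the standard library. Note the assumptions: an in-process queue and a background thread stand in for a real distributed task queue, and signup_view and send_confirmation_email are hypothetical names. Unlike Celery and similar tools, this toy version loses queued tasks if the process dies, so treat it as an illustration of the flow only.

```python
import queue
import threading
import time

task_queue = queue.Queue()
sent_emails = []


def send_confirmation_email(address):
    """Stand-in for a slow call to an email provider."""
    time.sleep(0.2)
    sent_emails.append(address)


def worker():
    """Runs in the background and executes queued tasks one by one."""
    while True:
        func, args = task_queue.get()
        try:
            func(*args)
        finally:
            task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()


def signup_view(email):
    """Hypothetical request handler: enqueue the email, respond right away."""
    task_queue.put((send_confirmation_email, (email,)))
    return {"status": 201, "detail": "Account created"}


start = time.perf_counter()
response = signup_view("user@example.com")
elapsed = time.perf_counter() - start
print(response["status"], f"responded in {elapsed:.3f}s")

task_queue.join()  # demo only: wait for the worker before exiting
print(sent_emails)
```

The handler returns without waiting for the 0.2-second email call, which is the whole point: the user gets a response immediately while the worker finishes the job.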
There are multiple open-source task queues you can choose from. The most popular is definitely Celery, but there's also huey, Django Q, and Django-RQ. The task queues also allow you to schedule tasks. Task scheduling can be used for generating reports, performing backups, sending bulk emails, and so on.
In summary, find those views that have long-running processes in them and move them to a task queue for asynchronous processing.
Throughout the article, we've explained what performance is, how it can be measured, and looked at different aspects of Django performance. You should now have a general idea of how to benchmark your app, identify weak spots, and optimize them.
I highly recommend you check out the linked resources to get a better insight into the mentioned topics and see how the optimization tips can be applied in practice.
Keep in mind that while performance is an important characteristic of software, it's not necessarily the most critical one. Optimization can be extremely time- and resource-consuming. I recommend you focus on your software's core functionality while keeping performance in mind. You can optimize more when the need arises.
At some point though, if your app keeps growing, you'll hit diminishing returns with optimization and vertical scaling. When that happens, the only way to accommodate the growing traffic will be to scale horizontally by spinning up more machines running your Django app.
Finally, if you haven't already, check out Performance and optimization from the official Django docs.