Josh Lewis

The basics of optimising Django for performance

Although Django is pretty fast out the box, complex lookups on the database can get slow. Fortunately there are some pretty simple rules and tools to keep things snappy.

In this article I look at how to understand and minimise Django queries.

Investigating Queries

Using connection.queries

One of the most important thing to get used to using when developing with Django is the django shell. This allows you to execute code without having to run the site and allows you test things quickly and easily. You access the shell by executing:

> python manage.py shell

from within the django directory. You can then do stuff like the following:

> python manage.py shell
(InteractiveConsole)
>>> from blogproject.models import Article
>>> Article.objects.all()
[<Article: N Korea in foreign embassy warning>, <Article: Pressure mounts on HBOS bankers>]

just as you could in a view. The built in tool for investing queries is Django's connection.queries, which can be used in a view or from with in the terminal like so:

>>> from django.db import connection
>>> Article.objects.all()
[<Article: N Korea in foreign embassy warning>, <Article: Pressure mounts on HBOS bankers>]
>>> connection.queries
[{'time': '0.02', 'sql': 'SELECT `blogproject_article`.`id`, `blogproject_article`.`name` FROM
`blogproject_article`'}]

This returns every query executed by your code and is useful for looking at what queries are produced. Note that your project settings.py must have DEBUG = True in order for this to work.

Using Django Debug Toolbar

The Django Debug Toolbar is very useful for checking every page for a site to see what queries are executed, and is much quicker than using the command line. Once installed you just need to add 'debug_toolbar' to your INSTALLED_APPS in settings.py, make sure your settings file has a 'DATABASE_ENGINE' variable set and is in DEBUG mode and then set a 'INTERNAL_IPS = ('127.0.0.1',)' variable.

Once installed visiting any page on your site will give you a toolbar which can display information such as the queries executed (see screenshot above). It even intercepts redirects so you investigate everything that goes on, but this can be customised.

Django Queries

Using these queries we can now investigate how the Django ORM works. We start by looking at when queries are executed:

When Queries are made

>>> from django.db import connection
>>> from blogproject.models import Article
>>> x = Article.objects.all()
>>> connection.queries
[ ]

Wait, no queries? The first thing to understand is that QuerySets are lazy - that is the database is only hit when the QuerySet is evaluated. It is for this reason, only one query is executed here, not two:

>>> x = Article.objects.all()
>>> x = x.filter(name='N Korea in foreign embassy warning')
>>> connection.queries
[ ]
>>> print x
[<Article: N Korea in foreign embassy warning>,]
>>> connection.queries
[{'time': '0.02', 'sql': 'SELECT `blogproject_article`.`id`, `blogproject_article`.`name` FROM
`blogproject_article` WHERE `blogproject_article`.`name` = `N Korea in foreign embassy warning`'}]

In general, QuerySets are evaluated when they are iterated through, sliced, are used in as a Boolean variable in an if statement, or len() or list() is called upon them.

QuerySet Caching

Each QuerySet in Django also has its own cache so using the same QuerySet twice only causes the database to be hit once. If you call the same QuerySet twice, it makes sense to save it to a variable first.

Optimising Queries

Now that we have looked a bit at the way in which Django creates queries we can look at some of the scenarios where creating excessive queries can be prevented. This happens because when you create a view or other function using the Django ORM, you are abstracted from the raw SQL which makes it quicker and easier to work with but can also make it easy to unintentionally hit the database an excessive amount.

Using select_related()

This can often make a big difference in practical cases and is just results in JOINs in the SQL. For example, if you had an Article model linked to Publication model by a ForiegnKey and did something like this in a view:

for article in Article.objects.all():
            print article.publication

This will execute one query to fetch all the Article objects, and then one query for each article to fetch the publication. If you had a 1000 articles, then this would have just done 1001 queries which would clearly be bad for your database if done, say, every time some loaded the home page.

for article in Article.objects.all().select_related('publication'):
            print article.publication

The above will use an INNER JOIN, to get the publication and article in one big query. Although these result in more complicated queries you will significantly reduce the number of times the database is accessed which is generally better. If in doubt, test it to see which is faster.

If you need to access something that is a child of a child of an element, such as the userprofile of a user which is foriegnkey linked to your article than use:

.select_related('user','user__profile')

by setting the related_name property on the userprofile like so:

user = models.OneToOneField(User, related_name='profile')

Query calls in Templates

Related to this is the fact that templates have the power to make database queries too. They are harder to spot as you do not use parentheses but will still call callables automatically. For example if you Article model has a manytomany relationship with a Tag model then something like this:

{%  for article in articles  %}
    <h3>{{ article.name }}</h3>
    <div class="tags">
        {% for tag in article.tags.all %}
            {{ tag.name }}
        {% endfor  %}
    </div>
{%  endfor %}

in a template will execute a query for every article to find the related tags. This is because article.tags.all is actually calling article.tags.all() which performs a query.

Fetch everything at once

If you will need the whole QuerySet, then generally it is more efficient to retrieve everything at once at the beginning and then perform filters and sorts on that instead of performing multiple smaller queries.

Using exists() and count()

In general it is more efficient to use ORM methods where possible, unless the whole queryset will be needed anyway. The two most common examples are checking if a result exists using exists() and the number of items using count() like so:

>>> Article.objects.filter(name='N Korea in foreign embassy warning').exists()
True
>>> Article.objects.all().count()
2

Both of these are faster than executing a whole QuerySet and checking if a result is in it, or evaluating len().

Using Raw SQL

There will be cases in anything other than the most simplistic setups when the ORM is not enough and it will be necessary to use raw SQL. Django gives a few options, firstly it can be executed on the manager like so:

Article.objects.raw('SELECT * FROM blogproject_article')

This has the advantage that it returns article objects just like a normal QuerySet. Alternatively, you can create your own SQL using Django's connection.cursor:

cursor = connection.cursor()
cursor.execute("UPDATE blogproject_article SET name = 'New Name' WHERE name = %s", [old_name])
row = cursor.fetchone()

However, the normal QuerySet will likely be sufficient for most cases.

Summary

Making use of everything in this article gets you over 90% of the way there with regards to optimising queries in Django. The most important thing is to understand how queries work, and to profile. Using the Django Debug Toolbar is incredibly helpful for quickly checking what queries are being executed in each view, and reducing the number is generally quite straight forward once you understand why it is happening.