Basic and Full-text Search with Django and Postgres

Last updated June 17th, 2021

Unlike relational databases, full-text search is not standardized. There are a number of open-source options like Elasticsearch, Solr, and Xapian. Elasticsearch is probably the most popular solution; however, it's complicated to set up and maintain. Further, if you're not taking advantage of some of the advanced features that Elasticsearch offers, you should stick with the full-text search capabilities that many relational databases (like Postgres, MySQL, and SQLite) and non-relational databases (like MongoDB and CouchDB) offer. Postgres in particular is well-suited for full-text search, and Django supports it out-of-the-box.

For the vast majority of your Django apps, you should, at the very least, start out with full-text search from Postgres before looking to a more powerful solution like Elasticsearch or Solr.

In this article, we'll add basic and full-text search to a Django app with Postgres.

Objectives

By the end of this article, you should be able to:

  1. Set up basic search functionality in a Django app with the Q object
  2. Add full-text search to a Django app
  3. Sort full-text search results by relevance using stemming, ranking, and weighting

Project Setup and Overview

Clone down the base branch from the django-search repo:

$ git clone https://github.com/testdrivenio/django-search --branch base --single-branch
$ cd django-search

We'll use Docker to simplify setting up and running Postgres along with Django.

From the project root, create the images and spin up the Docker containers:

$ docker-compose up -d --build

Next, apply the migrations and create a superuser:

$ docker-compose exec web python manage.py migrate
$ docker-compose exec web python manage.py createsuperuser

Once done, navigate to http://127.0.0.1:8011/quotes/ to ensure the app works as expected. You should see the following:

Quote Home Page

Want to learn how to work with Django and Postgres? Check out the Dockerizing Django with Postgres, Gunicorn, and Nginx article.

Take note of the Quote model in quotes/models.py:

from django.db import models

class Quote(models.Model):
    name = models.CharField(max_length=250)
    quote = models.TextField(max_length=1000)

    def __str__(self):
        return self.quote

Run the following management command to add 10,000 quotes to the database:

$ docker-compose exec web python manage.py add_quotes

This will take a couple of minutes to run. Once done, navigate to http://127.0.0.1:8011/quotes/ to see the data:

Quote Home Page

In the quotes/templates/quote.html file, we have a basic form with a search input field:

<form action="{% url 'search_results' %}" method="get">
  <input type="search" name="q" placeholder="Search by name or quote..." class="form-control">
</form>

On submit, the search form sends a GET request rather than a POST, so we have access to the query string both in the URL and in the Django view. Because the query string appears in the URL, we can share a search with others as a link.
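
As a rough illustration of why a GET form makes searches shareable, the sketch below builds and then parses a search URL with Python's standard library. The /quotes/search/ path and both helper functions are assumptions made for this example; the real route is whatever the search_results URL name resolves to in the project.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical path: the real one is whatever "search_results" maps to.
SEARCH_PATH = "/quotes/search/"


def build_search_url(query):
    """Return a relative URL that carries the search term in the query string."""
    query_string = urlencode({"q": query})
    return SEARCH_PATH + "?" + query_string


def extract_query(url):
    """Read the "q" parameter back out of a URL, defaulting to an empty string."""
    params = parse_qs(urlsplit(url).query)
    return params.get("q", [""])[0]


url = build_search_url("the middle")
print(url)
print(extract_query(url))
```

Note the empty-string default in extract_query. In a Django view, self.request.GET.get("q", "") gives the same kind of fallback when the page is opened without a q parameter.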

Take a quick look at the project structure and the rest of the code before moving on.

We'll start our search journey off by taking a look at Q objects, which allow us to search words using the AND (&) or OR (|) logical operators.

For instance, to use an OR operator, override the SearchResultsList's default QuerySet in quotes/views.py like so:

class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        return Quote.objects.filter(
            Q(name__icontains=query) | Q(quote__icontains=query)
        )

Here, we used the filter method to filter against the name or quote fields. We also used icontains to check whether the query is contained in a field (case insensitive). If it's contained in either field, the quote is returned.

Don't forget the import:

from django.db.models import Q
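
To make the matching rule behind the OR-ed icontains lookups concrete, here is a minimal pure-Python sketch that applies the same rule to in-memory data. It never touches the database: basic_search and the sample quotes are hypothetical, and in the real app the filtering is done by Postgres through the queryset shown above.

```python
def basic_search(quotes, query):
    """Keep quotes whose name OR quote text contains the query, ignoring case."""
    needle = query.lower()
    return [
        q for q in quotes
        if needle in q["name"].lower() or needle in q["quote"].lower()
    ]


# Made-up sample data, for illustration only.
SAMPLE_QUOTES = [
    {"name": "Alice Pony", "quote": "I am in the middle."},
    {"name": "Bob", "quote": "You don't like middle school."},
    {"name": "Carol", "quote": "Ponies are small horses."},
]

for match in basic_search(SAMPLE_QUOTES, "PONY"):
    print(match["name"], "-", match["quote"])
```

Searching for "PONY" finds the first quote through its name, but not the third one, since "ponies" does not contain the substring "pony".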

Try it out:

Search Page

For small data sets, this is a great way to add basic search functionality to your app. Once your data sets grow and you're searching against large amounts of text, you'll want to look at adding full-text search.

The basic search that we saw earlier has some issues, especially when we consider matching against large data sets.

The first issue is that of stop words. Examples of such words are "a", "an", and "the". These words are common and insufficiently meaningful, so they should be ignored. To test, try searching for a word with "the" in front of it. Say you searched for "the middle". In this case, you'll only see results that contain the exact phrase "the middle", so you won't see any results that have the word "middle" without "the" before it.

Say you have these two sentences:

  1. I am in the middle.
  2. You don't like middle school.

You'll get the following returned with each type of search:

Query        | Basic Search      | Full-text Search
"the middle" | Sentence 1        | Sentences 1 and 2
"middle"     | Sentences 1 and 2 | Sentences 1 and 2

Another issue is that of similar words. With the basic search, only exact matches are returned. This is quite limited. With full-text search, we can take similar words into account. To test, try to find some similar words like "pony" and "ponies". With a basic search, if you search for "pony" you won't see results that contain "ponies" -- and vice versa.

Say you have these two sentences:

  1. I am a pony.
  2. You don't like ponies.

You'll get the following returned with each type of search:

Query    | Basic Search | Full-text Search
"pony"   | Sentence 1   | Sentences 1 and 2
"ponies" | Sentence 2   | Sentences 1 and 2

With full-text search, both of these issues are mitigated. Keep in mind that, depending on your goal for the search, full-text search may actually decrease precision (quality), even though it tends to improve recall (quantity of relevant results). Typically, full-text search is less precise than basic search, since basic search yields exact matches to the search query. That said, if you're searching through large data sets with large blocks of text, full-text search is preferred since it's usually much faster.
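
The stop-word comparison can be reproduced with a small pure-Python sketch. It only illustrates the idea, assuming a three-word stop list; substring_search and stop_word_search are hypothetical helpers and not how Postgres implements either kind of search.

```python
import re

# Tiny illustrative stop list; a real one is much longer and language-specific.
STOP_WORDS = {"a", "an", "the"}

SENTENCES = ["I am in the middle.", "You don't like middle school."]


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def substring_search(sentences, query):
    """Basic search: the whole query must appear verbatim (ignoring case)."""
    return [s for s in sentences if query.lower() in s.lower()]


def stop_word_search(sentences, query):
    """Drop stop words from the query, then require every remaining word."""
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return []  # a query made only of stop words matches nothing here
    return [s for s in sentences if all(t in tokenize(s) for t in terms)]


for query in ("the middle", "middle"):
    print(query, "->", substring_search(SENTENCES, query))
    print(query, "->", stop_word_search(SENTENCES, query))
```

For "the middle", the substring version returns only the first sentence, while the stop-word version returns both, which is the difference the first comparison describes.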

Full-text search is an advanced searching technique that examines all the words in every stored document as it tries to match the search criteria. In full-text search, stop words such as "a", "and", and "the" are ignored because they are both common and insufficiently meaningful. In addition, with full-text search, we can employ language-specific stemming on the words being indexed. For example, the words "drives", "drove", and "driven" will be recorded under the single concept word "drive". Stemming is the process of reducing words to their word stem, base, or root form.

Suffice it to say that full-text search is not perfect. One of its issues is that it's likely to retrieve many documents that are not relevant (false positives) to the intended search question. However, there are some techniques based on Bayesian algorithms that can help reduce such problems.
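
To get a feel for stemming, here is a deliberately crude sketch, assuming a handful of hand-written suffix rules. toy_stem is a hypothetical helper and far simpler than the language-specific stemmers a real search engine uses; for instance, irregular forms like "drove" are beyond it.

```python
import re

SENTENCES = ["I am a pony.", "You don't like ponies."]


def toy_stem(word):
    """Crude suffix stripping, for illustration only."""
    if word.endswith("ies"):
        return word[:-3] + "i"  # ponies -> poni
    if word.endswith("y"):
        return word[:-1] + "i"  # pony -> poni
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]  # drives -> drive
    return word


def stems(text):
    """Return the set of stemmed words in a piece of text."""
    return {toy_stem(w) for w in re.findall(r"[a-z']+", text.lower())}


def stemmed_search(sentences, query):
    """Match sentences whose stems include every stem of the query."""
    wanted = stems(query)
    return [s for s in sentences if wanted and wanted <= stems(s)]


for query in ("pony", "ponies"):
    print(query, "->", stemmed_search(SENTENCES, query))
```

Because "pony" and "ponies" reduce to the same stem in this sketch, either query matches both sentences.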

To take advantage of Postgres full-text search with Django, add django.contrib.postgres to your INSTALLED_APPS list:

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "quotes.apps.QuotesConfig",
    "debug_toolbar",
    "django.contrib.postgres",  # new
]

Next, let's look at two quick examples of full-text search: on a single field and on multiple fields.

Update SearchResultsList like so:

class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        return Quote.objects.filter(quote__search=query)

Here, we're only searching the quote field.

Search Page

As you can see, it takes similar words into account. In the above example, "ponies" and "pony" are treated as similar words.

To filter on a combination of fields and on related models, you can use the SearchVector class.

Again, update SearchResultsList:

class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        return Quote.objects.annotate(search=SearchVector("name", "quote")).filter(
            search=query
        )

To search against multiple fields, we had to annotate the queryset using a SearchVector.

Make sure to add the import:

from django.contrib.postgres.search import SearchVector

Try some searches out.

Stemming and Ranking

In this section, we'll combine several methods such as SearchVector, SearchQuery, and SearchRank to produce a very robust search that uses both stemming and ranking.

Again, stemming is the process of reducing words to their word stem, base, or root form. With stemming, words like child and children will be considered similar words. Ranking, on the other hand, allows us to order results by relevancy.

Update SearchResultsList:

class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        search_vector = SearchVector("name", "quote")
        search_query = SearchQuery(query)
        return (
            Quote.objects.annotate(
                search=search_vector, rank=SearchRank(search_vector, search_query)
            )
            .filter(search=search_query)
            .order_by("-rank")
        )

What's happening here?

  1. SearchVector - again, we used a search vector to search against multiple fields.
  2. SearchQuery - takes the words submitted through the form, passes them through a stemming algorithm, and then looks for matches for all of the resulting terms.
  3. SearchRank - allows us to order the results by relevancy. It takes into account how often the query terms appear in the document, how close together the terms are in the document, and how important the part of the document is where they occur.
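
The filter-then-order pattern from the view can be mimicked in plain Python. The sketch below is only an analogy: toy_rank scores a document by how often the query terms appear in it, which is an assumption made for illustration and not the formula Postgres uses. All names and sample documents are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def toy_rank(document, query):
    """Score a document by the share of its words that are query terms."""
    words = tokenize(document)
    if not words:
        return 0.0
    counts = Counter(words)
    hits = sum(counts[term] for term in set(tokenize(query)))
    return hits / len(words)


def ranked_search(documents, query):
    """Keep documents that match at all, highest score first."""
    scored = [(toy_rank(doc, query), doc) for doc in documents]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches]


DOCUMENTS = [
    "I love long walks on the beach.",
    "Love is love.",
    "Stay curious.",
]

print(ranked_search(DOCUMENTS, "love"))
```

The shorter document that repeats the term ends up first, and the document without the term is dropped, much like filter(search=search_query) followed by order_by("-rank").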

Add the imports:

from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank

Search Page

Compare the results from the basic search to those of the full-text search. There's a clear difference. In the full-text search, the results with the highest rank are shown first. This is the power of SearchRank. Combining SearchVector, SearchQuery, and SearchRank is a quick way to produce a much more powerful and precise search than the basic search.

Adding Weights

Full-text search gives us the ability to add more importance to some fields in our table in the database over other fields. We can do this by adding weight to our queries.

The weight should be one of the following letters: D, C, B, A. By default, these weights refer to the numbers 0.1, 0.2, 0.4, and 1.0, respectively.

class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        search_vector = SearchVector("name", weight="B") + SearchVector(
            "quote", weight="A"
        )
        search_query = SearchQuery(query)
        return (
            Quote.objects.annotate(rank=SearchRank(search_vector, search_query))
            .filter(rank__gte=0.3)
            .order_by("-rank")
        )

Here, we added weights to the SearchVector using both the name and quote fields. We applied weights of 0.4 and 1.0 to the name and quote fields, respectively. Therefore, quote matches will prevail over name content matches. We then filtered the results to display only the ones with a rank greater than or equal to 0.3.
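
The effect of the letter weights on ordering can be sketched without a database. In this simplified model, a record's score is the sum of the weights of the fields that contain a query term. That scoring rule, the helper names, and the sample records are all assumptions for illustration; the rank values Postgres computes are calculated differently, so the 0.3 cutoff from the view does not carry over to these toy scores.

```python
import re

# Default letter-to-number mapping for weights, as listed earlier.
WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}

# Same field weights as in the view: name is "B", quote is "A".
FIELD_WEIGHTS = {"name": "B", "quote": "A"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def weighted_score(record, query, field_weights):
    """Sum the weight of every field that contains at least one query term."""
    terms = set(tokenize(query))
    score = 0.0
    for field, label in field_weights.items():
        if terms & set(tokenize(record[field])):
            score += WEIGHTS[label]
    return score


def weighted_search(records, query, field_weights):
    """Keep records that match at all, highest weighted score first."""
    scored = [(weighted_score(r, query, field_weights), r) for r in records]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in matches]


# Made-up sample data, for illustration only.
RECORDS = [
    {"name": "Hope Smith", "quote": "Never give up."},
    {"name": "Ann Lee", "quote": "Hope keeps us going."},
    {"name": "Bo", "quote": "Stay curious."},
]

for record in weighted_search(RECORDS, "hope", FIELD_WEIGHTS):
    print(record["name"], "-", record["quote"])
```

The record that matches in the quote field (weight A) is listed before the one that only matches in the name field (weight B), which is the prioritization the weights are meant to express.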

Conclusion

Although full-text search is fast, it's a more intensive process than a simple comparison, so performance can suffer once you're searching more than a few hundred records. To mitigate this, you can create a functional index that matches the search vector you wish to use. This approach should only be considered when you start noticing performance problems. For more, review the Performance section from Django's Full-text Search docs.

In this article, we guided you through setting up a basic search feature for a Django app and then took it up a notch to a full-text search using the Postgres module from Django.

Grab the complete code from the django-search repo.

Featured Course

Full-text Search in Django with Postgres and Elasticsearch

Learn how to add full-text search to Django with both Postgres and Elasticsearch.
