Testing in Python

Last updated October 23rd, 2020

Automated testing has always been a hot topic in software development, but in the era of continuous integration and microservices, it's talked about even more. There are many tools that can help us write, run, and evaluate our tests in our Python projects. Let's take a look at a few of them.



pytest

While the Python standard library comes with a unit testing framework called unittest, pytest is the go-to testing framework for testing Python code.

pytest makes it easy (and fun!) to write, organize, and run tests. Compared to unittest, pytest:

  1. Requires less boilerplate code so your test suites will be more readable.
  2. Supports the plain assert statement, which is far more readable and easier to remember compared to the assertSomething methods -- like assertEqual, assertTrue, and assertIn -- in unittest.
  3. Is updated more frequently since it's not part of the Python standard library.
  4. Simplifies setting up and tearing down test state with its fixture system.
  5. Uses a functional approach.
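As a quick illustration of the fixture system, here's a minimal sketch (the `user` dict and `make_user` helper are hypothetical): a fixture sets up the state a test needs before the test runs and tears it down afterwards:

```python
import pytest

def make_user():
    # hypothetical helper that builds the state a test needs
    return {'email': 'jane@example.com', 'active': True}

@pytest.fixture
def user():
    user = make_user()  # set up: runs before the test
    yield user          # hand the object to the test
    user.clear()        # tear down: runs after the test finishes

def test_user_is_active(user):
    assert user['active']
```

Any test that lists user as an argument receives a fresh dict, and the code after yield runs even when the test fails.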

Plus, with pytest, you can have a consistent style across all of your Python projects. Say you have two web applications in your stack -- one built with Django and the other with Flask. Without pytest, you'd most likely leverage the Django test framework along with a Flask extension like Flask-Testing, so your test suites would have different styles. With pytest, on the other hand, both test suites would have a consistent code style, making it easier to jump from one to the other.

pytest also has a large, community-maintained plugin ecosystem.

Some examples:

  • pytest-django - provides a set of tools made specifically for testing Django applications
  • pytest-xdist - is used to run tests in parallel
  • pytest-cov - adds code coverage support
  • pytest-instafail - shows failures and errors immediately instead of waiting until the end of a run


Mocking

Automated tests should be fast, isolated/independent, and deterministic/repeatable. Thus, if you need to test code that makes an external HTTP request to a third-party API, you should mock the request. Why? If you don't, then that specific test will be:

  1. slow since it's making an HTTP request over the network
  2. dependent on the third-party service and the speed of the network itself
  3. non-deterministic since the test could yield a different result based on the response from the API

It's also a good idea to mock other long-running operations, like database queries and async tasks, since automated tests are generally run frequently -- on every commit pushed to source control.

Mocking is the practice of replacing real objects with mock objects, which mimic their behavior, at runtime. So, instead of sending a real HTTP request over the network, we just return an expected response when the mocked method is called.

For example:

import requests

def get_my_ip():
    response = requests.get(
        'https://api.ipify.org?format=json'  # placeholder URL; any endpoint returning {'ip': ...}
    )
    return response.json()['ip']

def test_get_my_ip(monkeypatch):
    my_ip = '0.0.0.0'

    class MockResponse:

        def __init__(self, json_body):
            self.json_body = json_body

        def json(self):
            return self.json_body

    # replace requests.get so no real network call is made
    monkeypatch.setattr(
        requests,
        'get',
        lambda *args, **kwargs: MockResponse({'ip': my_ip})
    )

    assert get_my_ip() == my_ip

What's happening here?

We used pytest's monkeypatch fixture to replace all calls to the get method from the requests module with the lambda callback that always returns an instance of MockResponse.

We used a class with a json method because requests.get returns a Response object, and our code calls json() on it.

We can simplify the tests with the create_autospec method from the unittest.mock module. This method creates a mock object with the same properties and methods as the object passed as a parameter:

from unittest import mock

import requests
from requests import Response

def get_my_ip():
    response = requests.get(
        'https://api.ipify.org?format=json'  # placeholder URL; any endpoint returning {'ip': ...}
    )
    return response.json()['ip']

def test_get_my_ip(monkeypatch):
    my_ip = '0.0.0.0'
    # a mock with the same methods and attributes as requests.Response
    response = mock.create_autospec(Response)
    response.json.return_value = {'ip': my_ip}

    monkeypatch.setattr(
        requests,
        'get',
        lambda *args, **kwargs: response
    )

    assert get_my_ip() == my_ip

Although pytest recommends the monkeypatch approach for mocking, the pytest-mock extension and the vanilla unittest.mock library from the standard library are fine approaches as well.
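For instance, the same test can be written with unittest.mock.patch alone, no monkeypatch fixture required (a sketch; get_my_ip and the placeholder URL mirror the example above):

```python
from unittest import mock

import requests

def get_my_ip():
    response = requests.get(
        'https://api.ipify.org?format=json'  # placeholder URL; any endpoint returning {'ip': ...}
    )
    return response.json()['ip']

def test_get_my_ip():
    # requests.get is patched only for the duration of the with block
    with mock.patch('requests.get') as mocked_get:
        mocked_get.return_value.json.return_value = {'ip': '0.0.0.0'}
        assert get_my_ip() == '0.0.0.0'
```

Because the patch is scoped to the with block, the real requests.get is restored as soon as the block exits.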

Code Coverage

Another important aspect of tests is code coverage. It's a metric that tells you the ratio of lines executed during test runs to the total number of lines in your code base. We can use the pytest-cov plugin for this, which integrates Coverage.py with pytest.

Once installed, to run tests with coverage reporting, add the --cov option like so:

$ python -m pytest --cov=.

It will produce output like so:

================================== test session starts ==================================
platform linux -- Python 3.7.9, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/johndoe/sample-project
plugins: cov-2.10.1
collected 6 items

tests/test_sample_project.py ....                                             [ 66%]
tests/test_sample_project_mock.py .                                           [ 83%]
tests/test_sample_project_mock_1.py .                                         [100%]

----------- coverage: platform linux, python 3.7.9-final-0 -----------
Name                                  Stmts   Miss  Cover
sample_project/__init__.py                1      1     0%
tests/__init__.py                         0      0   100%
tests/test_sample_project.py              5      0   100%
tests/test_sample_project_mock.py        13      0   100%
tests/test_sample_project_mock_1.py      12      0   100%
TOTAL                                    31      1    97%

==================================  6 passed in 0.13s ==================================

For every file in the project's path you get:

  • Stmts - number of lines of code
  • Miss - number of lines that weren't executed by the tests
  • Cover - coverage percentage for the file

At the bottom, there's a line with the totals for the whole project.

Keep in mind that although a high coverage percentage is encouraged, it doesn't mean your tests are good tests that exercise each of the happy and exception paths in your code. For example, tests with assertions like assert sum(3, 2) == 5 can achieve a high coverage percentage while your code remains practically untested, since the exception paths are never exercised.
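To make that concrete, here's a hypothetical divide helper: a happy-path test alone leaves the error branch unexercised, and pytest.raises is how you cover the exception path explicitly:

```python
import pytest

def divide(a, b):
    if b == 0:
        raise ValueError('b must not be zero')
    return a / b

def test_divide_happy_path():
    assert divide(10, 2) == 5

def test_divide_by_zero():
    # exercises the exception path that a happy-path-only
    # suite would leave untested
    with pytest.raises(ValueError):
        divide(1, 0)
```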


Property-Based Testing

Hypothesis is a library for conducting property-based testing in Python. Rather than having to write different test cases for every argument you want to test, property-based testing generates a wide range of random test data that's derived, in part, from previous test runs. This helps increase the robustness of your test suite while decreasing test redundancy. In short, your test code will be cleaner, more DRY, and overall more efficient while still covering a wide range of test data.

For example, say you have to write tests for the following function:

def increment(num: int) -> int:
    return num + 1

You could write the following test:

import pytest

@pytest.mark.parametrize(
    'number, result',
    [
        (-2, -1),
        (0, 1),
        (3, 4),
        (101234, 101235),
    ]
)
def test_increment(number, result):
    assert increment(number) == result

There's nothing wrong with this approach. Your code is tested and code coverage is high (100%, to be exact). That said, how well is your code tested across the range of possible inputs? There are quite a lot of integers that could be tested, but only four of them are used in the test. In some situations that's enough. In others, four cases aren't enough -- e.g., non-deterministic machine learning code. What about really small or large numbers? Or say your function takes a list of integers rather than a single integer -- what if the list is empty, or contains one element, hundreds of elements, or thousands of elements? In some situations we simply cannot provide (let alone think of) all the possible cases. That's where property-based testing comes into play.

Machine learning algorithms are a great use case for property-based testing since it's difficult to produce (and maintain) test examples for complex sets of data.

Frameworks like Hypothesis provide recipes (called Strategies) for generating random test data. Hypothesis also stores the results of previous test runs and uses them to create new cases.

Strategies are algorithms that generate pseudo-random data based on the shape of the input data. It's pseudo-random because the generated data is based on data from previous tests.

The same test using property-based testing via Hypothesis looks like this:

from hypothesis import given
import hypothesis.strategies as st

@given(st.integers())
def test_add_one(num):
    assert increment(num) == num + 1

st.integers() is a Hypothesis Strategy that generates random integers for testing, while the @given decorator is used to parameterize the test function. So when the test function is called, the generated integers from the Strategy will be passed into the test.

$ python -m pytest test_hypothesis.py --hypothesis-show-statistics

================================== test session starts ===================================
platform darwin -- Python 3.8.5, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/johndoe/sample-project
plugins: hypothesis-5.37.3
collected 1 item

test_hypothesis.py .                                                               [100%]
================================= Hypothesis Statistics ==================================


  - during generate phase (0.06 seconds):
    - Typical runtimes: < 1ms, ~ 50% in data generation
    - 100 passing examples, 0 failing examples, 0 invalid examples

  - Stopped because settings.max_examples=100

=================================== 1 passed in 0.08s ====================================
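Strategies also compose: st.lists(st.integers()) generates lists of integers, which makes it easy to express properties over collections. A sketch (the round-trip property here is illustrative):

```python
from hypothesis import given
import hypothesis.strategies as st

@given(st.lists(st.integers()))
def test_reverse_twice_is_identity(nums):
    # property: reversing a list twice yields the original list
    assert list(reversed(list(reversed(nums)))) == nums
```

Because @given wraps the test, letting pytest collect it (or calling it directly) runs it against many generated lists, including the empty and single-element lists mentioned above.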

Type Checking

Tests are code, and they should be treated as such. Like your business code, you need to maintain and refactor them. You may even have to deal with bugs from time to time. Because of this, it's good practice to keep your tests short, simple, and straight to the point. You should also take care not to over-test your code.

Runtime (or dynamic) type checkers, like Typeguard and pydantic, can help to minimize the number of tests. Let's take a look at an example of this with pydantic.

For example, let's say we have a User that has a single attribute, an email address:

class User:

    def __init__(self, email: str):
        self.email = email

user = User(email='[email protected]')

We want to be sure that the provided email is really a valid email address. So, to validate it, we'll have to add some helper code somewhere. Along with writing a test, we'll also have to spend time writing the regex for this. pydantic can help with this. We can use it to define our User model:

from pydantic import BaseModel, EmailStr

class User(BaseModel):
    email: EmailStr

user = User(email='[email protected]')

Now, the email argument will be validated by pydantic before every new User instance is created. When it's not a valid email -- i.e., User(email='something') -- a ValidationError will be raised. This eliminates the need to write our own validator. We also don't need to test it since the maintainers of pydantic handle that for us.

We can reduce the number of tests for any user-provided data. And, instead, we just need to test that a ValidationError is handled correctly.

Let's look at a quick example in a Flask app:

import uuid

from flask import Flask, jsonify, request
from pydantic import ValidationError, BaseModel, EmailStr, Field

app = Flask(__name__)

@app.errorhandler(ValidationError)
def handle_validation_exception(error):
    response = jsonify(error.errors())
    response.status_code = 400
    return response

class Blog(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    author: EmailStr
    title: str
    content: str

@app.route('/create-blog/', methods=['POST'])
def create_blog():
    # raises ValidationError on bad input, handled above
    blog = Blog(**request.json)
    return jsonify(blog.dict())


import json

def test_create_blog_bad_request(client):
    """
    GIVEN request data with invalid values or missing attributes
    WHEN endpoint /create-blog/ is called
    THEN it should return status 400 and JSON body
    """
    # `client` is a Flask test client fixture (e.g., provided by pytest-flask)
    response = client.post(
        '/create-blog/',
        data=json.dumps({
            'author': 'John Doe',
            'title': None,
            'content': 'Some extra awesome content'
        }),
        content_type='application/json',
    )

    assert response.status_code == 400
    assert response.json is not None


Conclusion

Testing can often feel like a daunting task. At times it is, but hopefully this article provided some tools that you can use to make testing easier. Focus your testing efforts on decreasing flaky tests. Your tests should also be fast, isolated/independent, and deterministic/repeatable. In the end, having confidence in your test suite will help you deploy to production more often and, more importantly, help you sleep at night.

Happy testing!

Jan Giacomelli

Jan is a software engineer who lives in Ljubljana, Slovenia, Europe. He is co-founder of typless where he is leading engineering efforts. He loves working with Python and Django. When he's not writing code or deploying to AWS, he's probably skiing, windsurfing, or playing guitar.

Featured Course

Test-Driven Development with Python, Flask, and Docker

In this course, you'll learn how to set up a development environment with Docker in order to build and deploy a microservice powered by Python and Flask. You'll also apply the practices of Test-Driven Development with Pytest as you develop a RESTful API.
