Testing in Python

Last updated April 13th, 2021

Automated testing has always been a hot topic in software development, but in the era of continuous integration and microservices, it's talked about even more. There are many tools that can help you write, run, and evaluate your tests in your Python projects. Let's take a look at a few of them.

This article is part of the Complete Python guide:

  1. Modern Python Environments - dependency and workspace management
  2. Testing in Python (this article!)
  3. Modern Test-Driven Development in Python
  4. Python Code Quality
  5. Python Type Checking
  6. Documenting Python Code and Projects
  7. Python Project Workflow

pytest

While the Python standard library comes with a unit testing framework called unittest, pytest is the go-to framework for testing Python code.

pytest makes it easy (and fun!) to write, organize, and run tests. When compared to unittest, from the Python standard library, pytest:

  1. Requires less boilerplate code, so your test suites will be more readable (see the short comparison after this list).
  2. Supports the plain assert statement, which is far more readable and easier to remember than the assertSomething methods -- like assertEqual, assertTrue, and assertIn -- in unittest.
  3. Is updated more frequently since it's not part of the Python standard library.
  4. Simplifies setting up and tearing down test state with its fixture system.
  5. Takes a functional approach: tests can be plain functions rather than methods on test classes.
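
To make the boilerplate difference concrete, here's the same trivial check written both ways (a contrived example for illustration):

import unittest


# unittest style: tests live in a class that subclasses TestCase
class TestAddition(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 2, 3)


# pytest style: a plain function and a plain assert are enough
def test_addition():
    assert 1 + 2 == 3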

Plus, with pytest, you can have a consistent style across all of your Python projects. Say, you have two web applications in your stack -- one built with Django and the other built with Flask. Without pytest, you'd most likely leverage the Django test framework along with a Flask extension like Flask-Testing. So, your test suites would have different styles. With pytest, on the other hand, both test suites would have a consistent style, making it easier to jump from one to the other.

pytest also has a large, community-maintained plugin ecosystem.

Some examples:

  • pytest-django - provides a set of tools made specifically for testing Django applications
  • pytest-xdist - is used to run tests in parallel
  • pytest-cov - adds code coverage support
  • pytest-instafail - shows failures and errors immediately instead of waiting until the end of a run

For a full list of plugins, check out Plugin List from the docs.
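
Plugins are installed with pip like any other package. For instance, after installing pytest-xdist you can spread your tests across all available CPU cores:

$ pip install pytest-xdist
$ python -m pytest -n auto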

Mocking

Automated tests should be fast, isolated/independent, and deterministic/repeatable. Thus, if you need to test code that makes an external HTTP request to a third-party API, you should really mock the request. Why? If you don't, then that specific test will be:

  1. slow since it's making an HTTP request over the network
  2. dependent on the third-party service and the speed of the network itself
  3. non-deterministic since the test could yield a different result based on the response from the API

It's also a good idea to mock other long-running operations, like database queries and async tasks, since automated tests are generally run frequently, on every commit pushed to source control.

Mocking is the practice of replacing real objects at runtime with mock objects that mimic their behavior. So, instead of sending a real HTTP request over the network, we just return an expected response when the mocked method is called.

For example:

import requests


def get_my_ip():
    response = requests.get(
        'http://ipinfo.io/json'
    )
    return response.json()['ip']


def test_get_my_ip(monkeypatch):
    my_ip = '123.123.123.123'

    class MockResponse:

        def __init__(self, json_body):
            self.json_body = json_body

        def json(self):
            return self.json_body

    monkeypatch.setattr(
        requests,
        'get',
        lambda *args, **kwargs: MockResponse({'ip': my_ip})
    )

    assert get_my_ip() == my_ip

What's happening here?

We used pytest's monkeypatch fixture to replace all calls to the get method from the requests module with the lambda callback that always returns an instance of MockResponse.

We used a class because requests.get returns a Response object, and get_my_ip calls its json() method.

We can simplify the tests with the create_autospec method from the unittest.mock module. This method creates a mock object with the same properties and methods as the object passed as a parameter:

from unittest import mock

import requests
from requests import Response


def get_my_ip():
    response = requests.get(
        'http://ipinfo.io/json'
    )
    return response.json()['ip']


def test_get_my_ip(monkeypatch):
    my_ip = '123.123.123.123'
    response = mock.create_autospec(Response)
    response.json.return_value = {'ip': my_ip}

    monkeypatch.setattr(
        requests,
        'get',
        lambda *args, **kwargs: response
    )

    assert get_my_ip() == my_ip

Although pytest recommends the monkeypatch approach for mocking, the pytest-mock extension and plain unittest.mock from the standard library are fine approaches as well.
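
For comparison, here's a sketch of the same test written with pytest-mock's mocker fixture (assuming the pytest-mock plugin is installed); mocker.patch wraps unittest.mock.patch and is automatically undone when the test finishes:

from unittest import mock

import requests
from requests import Response


def get_my_ip():
    response = requests.get(
        'http://ipinfo.io/json'
    )
    return response.json()['ip']


def test_get_my_ip(mocker):
    my_ip = '123.123.123.123'
    response = mock.create_autospec(Response)
    response.json.return_value = {'ip': my_ip}

    # patch requests.get for the duration of this test only
    mocker.patch('requests.get', return_value=response)

    assert get_my_ip() == my_ip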

Code Coverage

Another important aspect of tests is code coverage. It's a metric that tells you the ratio between the number of lines executed during test runs and the total number of lines in your code base. We can use the pytest-cov plugin for this, which integrates Coverage.py with pytest.
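
It can be installed like any other Python package:

$ pip install pytest-cov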

Once installed, to run tests with coverage reporting, add the --cov option like so:

$ python -m pytest --cov=.

It will produce output like so:

================================== test session starts ==================================
platform linux -- Python 3.7.9, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/johndoe/sample-project
plugins: cov-2.10.1
collected 6 items

tests/test_sample_project.py ....                                             [ 66%]
tests/test_sample_project_mock.py .                                           [ 83%]
tests/test_sample_project_mock_1.py .                                         [100%]

----------- coverage: platform linux, python 3.7.9-final-0 -----------
Name                                  Stmts   Miss  Cover
---------------------------------------------------------
sample_project/__init__.py                1      1     0%
tests/__init__.py                         0      0   100%
tests/test_sample_project.py              5      0   100%
tests/test_sample_project_mock.py        13      0   100%
tests/test_sample_project_mock_1.py      12      0   100%
---------------------------------------------------------
TOTAL                                    31      1    97%


==================================  6 passed in 0.13s ==================================

For every file in the project's path you get:

  • Stmts - number of lines of code
  • Miss - number of lines that weren't executed by the tests
  • Cover - coverage percentage for the file

At the bottom, there's a line with the totals for the whole project.
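
If you want the report to list exactly which lines were missed, or want the run to fail below a minimum coverage threshold, pytest-cov accepts additional flags (the 90 here is just an example threshold):

$ python -m pytest --cov=. --cov-report=term-missing --cov-fail-under=90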

Keep in mind that while a high coverage percentage is encouraged, it doesn't mean your tests are good tests that exercise each of the happy and exception paths of your code. For example, a test with an assertion like assert sum(3, 2) == 5 can achieve a high coverage percentage while your code remains practically untested, since the exception paths aren't covered.

Mutation Testing

Mutation testing helps ensure that your tests actually cover the full behavior of your code. Put another way, it analyzes the effectiveness or robustness of your test suite. During mutation testing, a tool iterates through each line of your source code, making small changes (called mutations) that should break your code. After each mutation, the tool runs your unit tests and checks whether any of them fail. If your tests still pass, the mutant survived -- which means your test suite didn't detect the change.

For example, say you have the following code:

if x > y:
    z = 50
else:
    z = 100

The mutation tool may change the operator from > to >= like so:

if x >= y:
    z = 50
else:
    z = 100

mutmut is a mutation testing library for Python. Let's look at it in action.

Say you have the following Loan class:

# loan.py

from dataclasses import dataclass
from enum import Enum


class LoanStatus(str, Enum):
    PENDING = "PENDING"
    ACCEPTED = "ACCEPTED"
    REJECTED = "REJECTED"


@dataclass
class Loan:
    amount: float
    status: LoanStatus = LoanStatus.PENDING

    def reject(self):
        self.status = LoanStatus.REJECTED

    def rejected(self):
        return self.status == LoanStatus.REJECTED

Now, let's say you want to automatically reject loan requests that are greater than 250,000:

# reject_loan.py

def reject_loan(loan):
    if loan.amount > 250_000:
        loan.reject()

    return loan

You then wrote the following test:

# test_reject_loan.py

from loan import Loan
from reject_loan import reject_loan


def test_reject_loan():
    loan = Loan(amount=100_000)

    assert not reject_loan(loan).rejected()

When you run mutation testing with mutmut, you'll see that you have two surviving mutants:

$ mutmut run --paths-to-mutate reject_loan.py --tests-dir=.

- Mutation testing starting -

These are the steps:
1. A full test suite run will be made to make sure we
   can run the tests successfully and we know how long
   it takes (to detect infinite loops for example)
2. Mutants will be generated and checked

Results are stored in .mutmut-cache.
Print found mutants with `mutmut results`.

Legend for output:
πŸŽ‰ Killed mutants.   The goal is for everything to end up in this bucket.
⏰ Timeout.          Test suite took 10 times as long as the baseline so were killed.
πŸ€” Suspicious.       Tests took a long time, but not long enough to be fatal.
πŸ™ Survived.         This means your tests needs to be expanded.
πŸ”‡ Skipped.          Skipped.

1. Running tests without mutations
⠏ Running...Done

2. Checking mutants
β Έ 2/2  πŸŽ‰ 0  ⏰ 0  πŸ€” 0  πŸ™ 2  πŸ”‡ 0

You can view the surviving mutants by ID:

$ mutmut show 1

--- reject_loan.py
+++ reject_loan.py
@@ -1,7 +1,7 @@
 # reject_loan.py

 def reject_loan(loan):
-    if loan.amount > 250_000:
+    if loan.amount >= 250_000:
         loan.reject()

     return loan

$ mutmut show 2

--- reject_loan.py
+++ reject_loan.py
@@ -1,7 +1,7 @@
 # reject_loan.py

 def reject_loan(loan):
-    if loan.amount > 250_000:
+    if loan.amount > 250001:
         loan.reject()

     return loan

Improve your test:

from loan import Loan
from reject_loan import reject_loan


def test_reject_loan():
    loan = Loan(amount=100_000)
    assert not reject_loan(loan).rejected()

    loan = Loan(amount=250_001)
    assert reject_loan(loan).rejected()

    loan = Loan(amount=250_000)
    assert not reject_loan(loan).rejected()

If you run mutation tests again, you'll see that no mutations survived:

$ mutmut run --paths-to-mutate reject_loan.py --tests-dir=.

- Mutation testing starting -

These are the steps:
1. A full test suite run will be made to make sure we
   can run the tests successfully and we know how long
   it takes (to detect infinite loops for example)
2. Mutants will be generated and checked

Results are stored in .mutmut-cache.
Print found mutants with `mutmut results`.

Legend for output:
πŸŽ‰ Killed mutants.   The goal is for everything to end up in this bucket.
⏰ Timeout.          Test suite took 10 times as long as the baseline so were killed.
πŸ€” Suspicious.       Tests took a long time, but not long enough to be fatal.
πŸ™ Survived.         This means your tests needs to be expanded.
πŸ”‡ Skipped.          Skipped.

1. Running tests without mutations
⠏ Running...Done

2. Checking mutants
β ™ 2/2  πŸŽ‰ 2  ⏰ 0  πŸ€” 0  πŸ™ 0  πŸ”‡ 0

Now your test is much more robust. Any unintentional change inside of reject_loan.py will produce a failing test.

Mutation testing tools for Python are not as mature as some of the others out there. For example, mutant is a mature mutation testing tool for Ruby. To learn more about mutation testing in general, follow the mutant author on Twitter.

As with any other approach, mutation testing comes with a tradeoff. While it improves your test suite's ability to catch bugs, it comes at the cost of speed, since you have to run your entire test suite hundreds of times. It also forces you to really test everything. This can help uncover exception paths, but you'll have many more test cases to maintain.

Hypothesis

Hypothesis is a library for conducting property-based testing in Python. Rather than having to write different test cases for every argument you want to test, property-based testing generates a wide range of random test data, informed by previous test runs. This helps increase the robustness of your test suite while decreasing test redundancy. In short, your test code will be cleaner, more DRY, and overall more efficient while still covering a wide range of inputs.

For example, say you have to write tests for the following function:

def increment(num: int) -> int:
    return num + 1

You could write the following test:

import pytest


@pytest.mark.parametrize(
    'number, result',
    [
        (-2, -1),
        (0, 1),
        (3, 4),
        (101234, 101235),
    ]
)
def test_increment(number, result):
    assert increment(number) == result

There's nothing wrong with this approach. Your code is tested and code coverage is high (100% to be exact). That said, how well is your code tested based on the range of possible inputs? There are quite a lot of integers that could be tested, but only four of them are used in the test. In some situations this is enough. In others, four cases aren't enough -- e.g., non-deterministic machine learning code. What about really small or large numbers? Or say your function takes a list of integers rather than a single integer -- what if the list were empty, or it contained one element, hundreds of elements, or thousands of elements? In some situations we simply cannot provide (let alone even think of) all the possible cases. That's where property-based testing comes into play.

Machine learning algorithms are a great use case for property-based testing since it's difficult to produce (and maintain) test examples for complex sets of data.

Frameworks like Hypothesis provide recipes (Hypothesis calls them Strategies) for generating random test data. Hypothesis also stores failing examples from previous runs in a local database and replays them on subsequent runs.

Strategies are algorithms that generate pseudo-random data matching the shape of the input you describe. The data is pseudo-random rather than truly random, which is what allows Hypothesis to reproduce and replay failing examples.

The same test using property-based testing via Hypothesis looks like this:

from hypothesis import given
import hypothesis.strategies as st


@given(st.integers())
def test_add_one(num):
    assert increment(num) == num + 1

st.integers() is a Hypothesis Strategy that generates random integers for testing, while the @given decorator is used to parameterize the test function. When the test function is called, the integers generated by the Strategy are passed into the test.

$ python -m pytest test_hypothesis.py --hypothesis-show-statistics

================================== test session starts ===================================
platform darwin -- Python 3.8.5, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/johndoe/sample-project
plugins: hypothesis-5.37.3
collected 1 item

test_hypothesis.py .                                                               [100%]
================================= Hypothesis Statistics ==================================

test_hypothesis.py::test_add_one:

  - during generate phase (0.06 seconds):
    - Typical runtimes: < 1ms, ~ 50% in data generation
    - 100 passing examples, 0 failing examples, 0 invalid examples

  - Stopped because settings.max_examples=100


=================================== 1 passed in 0.08s ====================================
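
Strategies compose, too. If the code under test took a list of integers instead of a single integer (as mentioned above), a sketch might look like this, using a hypothetical total() function for illustration:

from hypothesis import given
import hypothesis.strategies as st


def total(numbers: list) -> int:
    # hypothetical function under test
    return sum(numbers)


@given(st.lists(st.integers()))
def test_total(numbers):
    # property: the order of the elements shouldn't affect the result
    assert total(numbers) == total(list(reversed(numbers)))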

Type Checking

Tests are code, and they should be treated as such. Like your business code, they need to be maintained and refactored, and you may even have to deal with bugs in them from time to time. Because of this, it's a good practice to keep your tests short, simple, and straight to the point. You should also take care not to over-test your code.

Runtime (or dynamic) type checkers, like Typeguard and pydantic, can help to minimize the number of tests. Let's take a look at an example of this with pydantic.

For example, let's say we have a User that has a single attribute, an email address:

class User:

    def __init__(self, email: str):
        self.email = email


user = User(email='john@doe.com')

We want to be sure that the provided email is really a valid email address. To validate it, we'd have to add some helper code somewhere -- and along with writing a test for it, we'd also have to spend time writing the validation regex itself. pydantic can help here. We can use it to define our User model:

from pydantic import BaseModel, EmailStr


class User(BaseModel):
    email: EmailStr


user = User(email='john@doe.com')

Now, the email argument will be validated by pydantic before every new User instance is created. When it's not a valid email -- e.g., User(email='something') -- a ValidationError will be raised. (EmailStr requires the email-validator package, which can be installed with the pydantic[email] extra.) This eliminates the need to write our own validator. We also don't need to test it, since the maintainers of pydantic handle that for us.

We can reduce the number of tests for any user provided data. And, instead, we just need to test that a ValidationError is handled correctly.
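
For example, a test for the invalid case could be as small as this (a sketch, assuming the User model defined above):

import pytest
from pydantic import ValidationError


def test_user_invalid_email():
    # pydantic performs the validation, so we only check that it's triggered
    with pytest.raises(ValidationError):
        User(email='something')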

Let's look at a quick example in a Flask app:

import uuid

from flask import Flask, jsonify
from pydantic import ValidationError, BaseModel, EmailStr, Field


app = Flask(__name__)


@app.errorhandler(ValidationError)
def handle_validation_exception(error):
    response = jsonify(error.errors())
    response.status_code = 400
    return response


class Blog(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    author: EmailStr
    title: str
    content: str
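
The test below posts to a /create-blog/ endpoint, which isn't shown above. A minimal sketch of such an endpoint, assuming the Blog model and error handler from the previous block, might look like this:

from flask import request


@app.route('/create-blog/', methods=['POST'])
def create_blog():
    # constructing the Blog model validates the payload; invalid data
    # raises ValidationError, which the error handler turns into a 400
    blog = Blog(**request.get_json())
    return jsonify(blog.dict()), 201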

Test:

import json


def test_create_blog_bad_request(client):
    """
    GIVEN request data with invalid values or missing attributes
    WHEN endpoint /create-blog/ is called
    THEN it should return status 400 and JSON body
    """
    response = client.post(
        '/create-blog/',
        data=json.dumps(
            {
                'author': 'John Doe',
                'title': None,
                'content': 'Some extra awesome content'
            }
        ),
        content_type='application/json',
    )

    assert response.status_code == 400
    assert response.json is not None
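
The client argument here is a pytest fixture that isn't shown above; a typical version, assuming the Flask app lives in a module named app.py (a hypothetical name), might look like this:

# conftest.py

import pytest

from app import app  # assumed module name for the Flask app above


@pytest.fixture
def client():
    app.config['TESTING'] = True
    with app.test_client() as client:
        yield client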

Conclusion

Testing can often feel like a daunting task. At times it is, but hopefully this article gave you some tools you can use to make testing easier. Focus your testing efforts on decreasing flaky tests. Your tests should also be fast, isolated/independent, and deterministic/repeatable. In the end, having confidence in your test suite will help you deploy to production more often and, more importantly, help you sleep at night.

Happy testing!

The Complete Python Guide:

  1. Modern Python Environments - dependency and workspace management
  2. Testing in Python (this article!)
  3. Modern Test-Driven Development in Python
  4. Python Code Quality
  5. Python Type Checking
  6. Documenting Python Code and Projects
  7. Python Project Workflow

Jan Giacomelli

Jan is a software engineer who lives in Ljubljana, Slovenia, Europe. He is a Staff Software Engineer at ren.co where he is leading backend engineering efforts. He loves Python, FastAPI, and Test-Driven Development. When he's not writing code, deploying to AWS, or speaking at a conference, he's probably skiing, windsurfing, or playing guitar. Currently, he's working on his new course Complete Python Testing Guide.
