Modern Test-Driven Development in Python

Testing production grade code is hard. Sometimes it can take nearly all of your time during feature development. What's more, even when you have 100% coverage and tests are green, you still may not feel confident that the new feature will work properly in production.

This guide will take you through the development of an application using Test-Driven Development (TDD). We'll look at how and what you should test. We'll use pytest for testing, pydantic to validate data and reduce the number of tests required, and Flask to provide an interface for our clients via a RESTful API. By the end, you'll have a solid pattern that you can use for any Python project so that you can have confidence that passing tests actually mean working software.

The Complete Python Guide:

Modern Python Environments - dependency and workspace management

Testing in Python

Modern Test-Driven Development in Python (this article!)

Python Code Quality

Python Type Checking

Documenting Python Code and Projects

Python Project Workflow

Objectives

By the end of this article, you will be able to:

Explain how you should test your software
Configure pytest and set up a project structure for testing
Define database models with pydantic
Use pytest fixtures for managing test state and performing side effects
Verify JSON responses against JSON Schema definitions
Organize database operations with commands (modify state, has side effects) and queries (read-only, no side effects)
Write unit, integration, and end-to-end tests with pytest
Explain why it's important to focus your testing efforts on testing behavior rather than implementation details

How Should I Test My Software?

Software developers tend to be very opinionated about testing. Because of this, they have differing opinions about how important testing is and ideas on how to go about doing it. That said, let's look at three guidelines that (hopefully) most developers will agree with that will help you write valuable tests:

Tests should tell you the expected behavior of the unit under test. Therefore, it's advisable to keep them short and to the point. The GIVEN, WHEN, THEN structure can help with this:
- GIVEN - what are the initial conditions for the test?
- WHEN - what is occurring that needs to be tested?
- THEN - what is the expected response?
So you should prepare your environment for testing, execute the behavior, and, at the end, check that output meets expectations.
Each piece of behavior should be tested once -- and only once. Testing the same behavior more than once does not mean that your software is more likely to work. Tests need to be maintained too. If you make a small change to your code base and then twenty tests break, how do you know which functionality is broken? When only a single test fails, it's much easier to find the bug.
Each test must be independent from other tests. Otherwise, you'll have hard time maintaining and running the test suite.

This guide is opinionated too. Don't take anything as a holy grail or silver bullet. Feel free to get in touch on Twitter (@jangiacomelli) to discuss anything related to this guide.

Basic Setup

With that, let's get our hands dirty. You're ready to see what all of this means in the real world. The most simple test with pytest looks like this:

def another_sum(a, b):
    return a + b


def test_another_sum():
    assert another_sum(3, 2) == 5

That's the example that you've probably already seen at least once. First of all, you'll never write tests inside your code base so let's split this into two files and packages.

Create a new directory for this project and move into it:

$ mkdir testing_project
$ cd testing_project

Next, create (and activate) a virtual environment.

For more on managing dependencies and virtual environments, check out Modern Python Environments.

Third, install pytest:

(venv)$ pip install pytest

After that, create a new folder called "sum". Add an __init__.py to the new folder, to turn it into a package, along with a another_sum.py file:

def another_sum(a, b):
    return a + b

Add another folder named "tests" and add the following files and folders:

└── tests
    ├── __init__.py
    └── test_sum
        ├── __init__.py
        └── test_another_sum.py

You should now have:

├── sum
│   ├── __init__.py
│   └── another_sum.py
└── tests
    ├── __init__.py
    └── test_sum
        ├── __init__.py
        └── test_another_sum.py

In test_another_sum.py add:

from sum.another_sum import another_sum


def test_another_sum():
    assert another_sum(3, 2) == 5

Next, add an empty conftest.py file, which is used for storing pytest fixtures, inside the "tests" folder.

Finally, add a pytest.ini -- a pytest configuration file -- to the "tests" folder, which can also be empty as this point.

The full project structure should now look like:

├── sum
│   ├── __init__.py
│   └── another_sum.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── pytest.ini
    └── test_sum
        ├── __init__.py
        └── test_another_sum.py

Keeping your tests together in single package allows you to:

Reuse pytest configuration across all tests
Reuse fixtures across all tests
Simplify the running of tests

You can run all the tests with this command:

(venv)$ python -m pytest tests

You should see results of the tests, which in this case is for test_another_sum:

============================== test session starts ==============================
platform darwin -- Python 3.10.1, pytest-7.0.1, pluggy-1.0.0
rootdir: /testing_project/tests, configfile: pytest.ini
collected 1 item

tests/test_sum.py/test_another_sum.py .                                 [100%]

=============================== 1 passed in 0.01s ===============================

Real Application

Now that you have the basic idea behind how to set up and structure tests, let's build a simple blog application. We'll build it using TDD to see testing in action. We'll use Flask for our web framework and, to focus on testing, SQLite for our database.

Our app will have the following requirements:

articles can be created
articles can be fetched
articles can be listed

First, let's create a new project:

$ mkdir blog_app
$ cd blog_app

Second, create (and activate) a virtual environment.

Third, install pytest and pydantic, a data parsing and validation library:

(venv)$ pip install pytest && pip install "pydantic[email]"

pip install "pydantic[email]" installs pydantic along with email-validator, which will be used for validating email addresses.

Next, create the following files and folders:

blog_app
    ├── blog
    │   ├── __init__.py
    │   ├── app.py
    │   └── models.py
    └── tests
        ├── __init__.py
        ├── conftest.py
        └── pytest.ini

Add the following code to models.py to define a new Article model with pydantic:

import os
import sqlite3
import uuid
from typing import List

from pydantic import BaseModel, EmailStr, Field


class NotFound(Exception):
    pass


class Article(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    author: EmailStr
    title: str
    content: str

    @classmethod
    def get_by_id(cls, article_id: str):
        con = sqlite3.connect(os.getenv("DATABASE_NAME", "database.db"))
        con.row_factory = sqlite3.Row

        cur = con.cursor()
        cur.execute("SELECT * FROM articles WHERE id=?", (article_id,))

        record = cur.fetchone()

        if record is None:
            raise NotFound

        article = cls(**record)  # Row can be unpacked as dict
        con.close()

        return article

    @classmethod
    def get_by_title(cls, title: str):
        con = sqlite3.connect(os.getenv("DATABASE_NAME", "database.db"))
        con.row_factory = sqlite3.Row

        cur = con.cursor()
        cur.execute("SELECT * FROM articles WHERE title = ?", (title,))

        record = cur.fetchone()

        if record is None:
            raise NotFound

        article = cls(**record)  # Row can be unpacked as dict
        con.close()

        return article

    @classmethod
    def list(cls) -> List["Article"]:
        con = sqlite3.connect(os.getenv("DATABASE_NAME", "database.db"))
        con.row_factory = sqlite3.Row

        cur = con.cursor()
        cur.execute("SELECT * FROM articles")

        records = cur.fetchall()
        articles = [cls(**record) for record in records]
        con.close()

        return articles

    def save(self) -> "Article":
        with sqlite3.connect(os.getenv("DATABASE_NAME", "database.db")) as con:
            cur = con.cursor()
            cur.execute(
                "INSERT INTO articles (id,author,title,content) VALUES(?, ?, ?, ?)",
                (self.id, self.author, self.title, self.content)
            )
            con.commit()

        return self

    @classmethod
    def create_table(cls, database_name="database.db"):
        conn = sqlite3.connect(database_name)

        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles (id TEXT, author TEXT, title TEXT, content TEXT)"
        )
        conn.close()

This is an Active Record-style model, which provides methods for storing, fetching a single article, and listing all articles.

You may be wondering why we didn't write tests to cover the model. We'll get to the why shortly.

Create a New Article

Next, let's cover our business logic. We'll write some helper commands and queries to separate our logic from the model and API. Since we're using pydantic, we can easily validate data based on our model.

Create a "test_article" package in the "tests" folder. Then, add a file called test_commands.py to it.

blog_app
    ├── blog
    │   ├── __init__.py
    │   ├── app.py
    │   └── models.py
    └── tests
        ├── __init__.py
        ├── conftest.py
        ├── pytest.ini
        └── test_article
            ├── __init__.py
            └── test_commands.py

Add the following tests to test_commands.py:

import pytest

from blog.models import Article
from blog.commands import CreateArticleCommand, AlreadyExists


def test_create_article():
    """
    GIVEN CreateArticleCommand with valid author, title, and content properties
    WHEN the execute method is called
    THEN a new Article must exist in the database with the same attributes
    """
    cmd = CreateArticleCommand(
        author="[email protected]",
        title="New Article",
        content="Super awesome article"
    )

    article = cmd.execute()

    db_article = Article.get_by_id(article.id)

    assert db_article.id == article.id
    assert db_article.author == article.author
    assert db_article.title == article.title
    assert db_article.content == article.content


def test_create_article_already_exists():
    """
    GIVEN CreateArticleCommand with a title of some article in database
    WHEN the execute method is called
    THEN the AlreadyExists exception must be raised
    """

    Article(
        author="[email protected]",
        title="New Article",
        content="Super extra awesome article"
    ).save()

    cmd = CreateArticleCommand(
        author="[email protected]",
        title="New Article",
        content="Super awesome article"
    )

    with pytest.raises(AlreadyExists):
        cmd.execute()

These tests cover the following business use cases:

articles should be created for valid data
article title must be unique

Run the tests from your project directory to see that they fail:

(venv)$ python -m pytest tests

Now we can implement our command.

Add a commands.py file to the "blog" folder:

from pydantic import BaseModel, EmailStr

from blog.models import Article, NotFound


class AlreadyExists(Exception):
    pass


class CreateArticleCommand(BaseModel):
    author: EmailStr
    title: str
    content: str

    def execute(self) -> Article:
        try:
            Article.get_by_title(self.title)
            raise AlreadyExists
        except NotFound:
            pass

        article = Article(
            author=self.author,
            title=self.title,
            content=self.content
        ).save()

        return article

Test Fixtures

We can use pytest fixtures to clear the database after each test and create a new one before each test. Fixtures are functions decorated with a @pytest.fixture decorator. They are usually located inside conftest.py but they can be added to the actual test files as well. These functions are executed by default before each test.

One option is to use their returned values inside your tests. For example:

import random
import pytest


@pytest.fixture
def random_name():
    names = ["John", "Jane", "Marry"]
    return random.choice(names)


def test_fixture_usage(random_name):
    assert random_name

So, to use the value returned from the fixture inside the test you just need to add the name of the fixture function as a parameter to the test function.

Another option is to perform a side effect, like creating a database or mocking a module.

You can also run part of a fixture before and part after a test using yield instead of return. For example:

@pytest.fixture
def some_fixture():
    # do something before your test
    yield # test runs here
    # do something after your test

Now, add the following fixture to conftest.py, which creates a new database before each test and removes it after:

import os
import tempfile

import pytest

from blog.models import Article


@pytest.fixture(autouse=True)
def database():
    _, file_name = tempfile.mkstemp()
    os.environ["DATABASE_NAME"] = file_name
    Article.create_table(database_name=file_name)
    yield
    os.unlink(file_name)

The autouse flag is set to True so that it's automatically used by default before (and after) each test in the test suite. Since we're using a database for all tests it makes sense to use this flag. That way you don't have to explicitly add the fixture name to every test as a parameter.

If you do happen to not need access to the database for a test here and there you can disable autouse with a test marker. You can see an example of this here.

Run the tests again:

(venv)$ python -m pytest tests

They should pass.

As you can see, our test only tests the CreateArticleCommand command. We don't test the actual Article model since it's not responsible for business logic. We know that the command works as expected. Therefore, there's no need to write any additional tests.

List All Articles

The next requirement is to list all articles. We'll use a query instead of command here, so add a new file called test_queries.py to the "test_article" folder:

from blog.models import Article
from blog.queries import ListArticlesQuery


def test_list_articles():
    """
    GIVEN 2 articles stored in the database
    WHEN the execute method is called
    THEN it should return 2 articles
    """
    Article(
        author="[email protected]",
        title="New Article",
        content="Super extra awesome article"
    ).save()
    Article(
        author="[email protected]",
        title="Another Article",
        content="Super awesome article"
    ).save()

    query = ListArticlesQuery()

    assert len(query.execute()) == 2

Run the tests:

(venv)$ python -m pytest tests

They should fail.

Add a queries.py file to the "blog" folder:

blog_app
    ├── blog
    │   ├── __init__.py
    │   ├── app.py
    │   ├── commands.py
    │   ├── models.py
    │   └── queries.py
    └── tests
        ├── __init__.py
        ├── conftest.py
        ├── pytest.ini
        └── test_article
            ├── __init__.py
            ├── test_commands.py
            └── test_queries.py

Now we can implement our query:

from typing import List

from pydantic import BaseModel

from blog.models import Article


class ListArticlesQuery(BaseModel):

    def execute(self) -> List[Article]:
        articles = Article.list()

        return articles

Despite having no parameters here, for consistency we inherited from BaseModel.

Run the tests again:

(venv)$ python -m pytest tests

They should now pass.

Get Article by ID

Getting a single article by its ID can be done in similar way as listing all articles. Add a new test for GetArticleByIDQuery to test_queries.py.:

from blog.models import Article
from blog.queries import ListArticlesQuery, GetArticleByIDQuery


def test_list_articles():
    """
    GIVEN 2 articles stored in the database
    WHEN the execute method is called
    THEN it should return 2 articles
    """
    Article(
        author="[email protected]",
        title="New Article",
        content="Super extra awesome article"
    ).save()
    Article(
        author="[email protected]",
        title="Another Article",
        content="Super awesome article"
    ).save()

    query = ListArticlesQuery()

    assert len(query.execute()) == 2


def test_get_article_by_id():
    """
    GIVEN ID of article stored in the database
    WHEN the execute method is called on GetArticleByIDQuery with an ID
    THEN it should return the article with the same ID
    """
    article = Article(
        author="[email protected]",
        title="New Article",
        content="Super extra awesome article"
    ).save()

    query = GetArticleByIDQuery(
        id=article.id
    )

    assert query.execute().id == article.id

Run the tests to ensure they fail:

(venv)$ python -m pytest tests

Next, add GetArticleByIDQuery to queries.py:

from typing import List

from pydantic import BaseModel

from blog.models import Article


class ListArticlesQuery(BaseModel):

    def execute(self) -> List[Article]:
        articles = Article.list()

        return articles


class GetArticleByIDQuery(BaseModel):
    id: str

    def execute(self) -> Article:
        article = Article.get_by_id(self.id)

        return article

The tests should now pass:

(venv)$ python -m pytest tests

Nice. We've meet all of the above mentioned requirements:

articles can be created
articles can be fetched
articles can be listed

And they're all covered with tests. Since we're using pydantic for data validation at runtime, we don't need a lot of tests to cover the business logic as we don't need to write tests for validating data. If author is not a valid email, pydantic will raise an error. All that was needed was to set the author attribute to the EmailStr type. We don't need to test it either because it's already being tested by the pydantic maintainers.

With that, we're ready to expose this functionality to the world via a Flask RESTful API.

Expose the API with Flask

We'll introduce three endpoints that cover this requirement:

/create-article/ - create a new article
/article-list/ - retrieve all articles
/article/<article_id>/ - fetch a single article

First, create a folder called "schemas" inside "test_article", and add two JSON schemas to it, Article.json and ArticleList.json.

Article.json:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Article",
  "type": "object",
  "properties": {
    "id": {
      "type": "string"
    },
    "author": {
      "type": "string"
    },
    "title": {
      "type": "string"
    },
    "content": {
      "type": "string"
    }
  },
  "required": ["id", "author", "title", "content"]
}

ArticleList.json:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "ArticleList",
  "type": "array",
  "items": {"$ref":  "file:Article.json"}
}

JSON Schemas are used to define the responses from API endpoints. Before continuing, install the jsonschema Python library, which will be used to validate JSON payloads against the defined schemas, and Flask:

(venv)$ pip install jsonschema Flask

Next, let's write integration tests for our API.

Add a new file called test_app.py to "test_article":

import json
import pathlib

import pytest
from jsonschema import validate, RefResolver

from blog.app import app
from blog.models import Article


@pytest.fixture
def client():
    app.config["TESTING"] = True

    with app.test_client() as client:
        yield client


def validate_payload(payload, schema_name):
    """
    Validate payload with selected schema
    """
    schemas_dir = str(
        f"{pathlib.Path(__file__).parent.absolute()}/schemas"
    )
    schema = json.loads(pathlib.Path(f"{schemas_dir}/{schema_name}").read_text())
    validate(
        payload,
        schema,
        resolver=RefResolver(
            "file://" + str(pathlib.Path(f"{schemas_dir}/{schema_name}").absolute()),
            schema  # it's used to resolve the file inside schemas correctly
        )
    )


def test_create_article(client):
    """
    GIVEN request data for new article
    WHEN endpoint /create-article/ is called
    THEN it should return Article in json format that matches the schema
    """
    data = {
        'author': "[email protected]",
        "title": "New Article",
        "content": "Some extra awesome content"
    }
    response = client.post(
        "/create-article/",
        data=json.dumps(
            data
        ),
        content_type="application/json",
    )

    validate_payload(response.json, "Article.json")


def test_get_article(client):
    """
    GIVEN ID of article stored in the database
    WHEN endpoint /article/<id-of-article>/ is called
    THEN it should return Article in json format that matches the schema
    """
    article = Article(
        author="[email protected]",
        title="New Article",
        content="Super extra awesome article"
    ).save()
    response = client.get(
        f"/article/{article.id}/",
        content_type="application/json",
    )

    validate_payload(response.json, "Article.json")


def test_list_articles(client):
    """
    GIVEN articles stored in the database
    WHEN endpoint /article-list/ is called
    THEN it should return list of Article in json format that matches the schema
    """
    Article(
        author="[email protected]",
        title="New Article",
        content="Super extra awesome article"
    ).save()
    response = client.get(
        "/article-list/",
        content_type="application/json",
    )

    validate_payload(response.json, "ArticleList.json")

So, what's happening here?

First, we defined the Flask test client as a fixture so that it can be used in the tests.
Next, we added a function for validating payloads. It takes two parameters:
1. payload - JSON response from the API
2. schema_name - name of the schema file inside the "schemas" directory
Finally, there are three tests, one for each endpoint. Inside each test there's a call to the API and validation of the returned payload

Run the tests to ensure they fail at this point:

(venv)$ python -m pytest tests

Now we can write the API.

Update app.py like so:

from flask import Flask, jsonify, request

from blog.commands import CreateArticleCommand
from blog.queries import GetArticleByIDQuery, ListArticlesQuery

app = Flask(__name__)


@app.route("/create-article/", methods=["POST"])
def create_article():
    cmd = CreateArticleCommand(
        **request.json
    )
    return jsonify(cmd.execute().dict())


@app.route("/article/<article_id>/", methods=["GET"])
def get_article(article_id):
    query = GetArticleByIDQuery(
        id=article_id
    )
    return jsonify(query.execute().dict())


@app.route("/article-list/", methods=["GET"])
def list_articles():
    query = ListArticlesQuery()
    records = [record.dict() for record in query.execute()]
    return jsonify(records)


if __name__ == "__main__":
    app.run()

Our route handlers are pretty simple since all of our logic is covered by the commands and queries. Available actions with side effects (like mutations) are represented by commands -- e.g., creating a new article. On the other hand, actions that don't have side effects, the ones that are just reading current state, are covered by queries.

The command and query pattern used in this post is a simplified version of the CQRS pattern. We're combining CQRS and CRUD.

The .dict() method above is provided by the BaseModel from pydantic, which all of our models inherit from.

The tests should pass:

(venv)$ python -m pytest tests

We've covered the happy path scenarios. In the real world we must expect that clients won't always use the API as it was intended. For example, when a request to create an article is made without a title, a ValidationError will be raised by the CreateArticleCommand command, which will result in an internal server error and an HTTP status 500. That's something that we want to avoid. Therefore, we need to handle such errors to notify the user about the bad request gracefully.

Let's write tests to cover such cases. Add the following to test_app.py:

@pytest.mark.parametrize(
    "data",
    [
        {
            "author": "John Doe",
            "title": "New Article",
            "content": "Some extra awesome content"
        },
        {
            "author": "John Doe",
            "title": "New Article",
        },
        {
            "author": "John Doe",
            "title": None,
            "content": "Some extra awesome content"
        }
    ]
)
def test_create_article_bad_request(client, data):
    """
    GIVEN request data with invalid values or missing attributes
    WHEN endpoint /create-article/ is called
    THEN it should return status 400
    """
    response = client.post(
        "/create-article/",
        data=json.dumps(
            data
        ),
        content_type="application/json",
    )

    assert response.status_code == 400
    assert response.json is not None

We used pytest's parametrize option, which simplifies passing in multiple inputs to a single test.

Test should fail at this point because we haven't handled the ValidationError yet:

(venv)$ python -m pytest tests

So let's add an error handler to the Flask app inside app.py:

from pydantic import ValidationError

# Other code ...

app = Flask(__name__)


@app.errorhandler(ValidationError)
def handle_validation_exception(error):
    response = jsonify(error.errors())
    response.status_code = 400
    return response

# Other code ...

ValidationError has an errors method that returns a list of all errors for each field that was either missing or passed a value that didn't pass validation. We can simply return this in the body and set the response's status to 400.

Now that the error is handled appropriately all tests should pass:

(venv)$ python -m pytest tests

Code Coverage

Now, with our application tested, it's the time to check code coverage. So let's install a pytest plugin for coverage called pytest-cov:

(venv)$ pip install pytest-cov

After the plugin is installed, we can check code coverage of our blog application like this:

(venv)$ python -m pytest tests --cov=blog

You should see something similar to:

---------- coverage: platform darwin, python 3.10.1-final-0 ----------
Name               Stmts   Miss  Cover
--------------------------------------
blog/__init__.py       0      0   100%
blog/app.py           25      1    96%
blog/commands.py      16      0   100%
blog/models.py        57      1    98%
blog/queries.py       12      0   100%
--------------------------------------
TOTAL                110      2    98%

Is 98% coverage good enough? It probably is. Nonetheless, remember one thing: High coverage percentage is great but the quality of your tests is much more important. If only 70% or less of code is covered you should think about increasing coverage percentage. But it generally doesn't make sense to write tests to go from 98% to 100%. (Again, tests need to be maintained just like your business logic!)

End-to-end Tests

We have a working API at this point that's fully tested. We can now look at how to write some end-to-end (e2e) tests. Since we have a simple API we can write a single e2e test to cover the following scenario:

create a new article
list articles
get the first article from the list

First, install the requests library:

(venv)$ pip install requests

Second, add a new test to test_app.py:

import requests
# other code ...

@pytest.mark.e2e
def test_create_list_get(client):
    requests.post(
        "http://localhost:5000/create-article/",
        json={
            "author": "[email protected]",
            "title": "New Article",
            "content": "Some extra awesome content"
        }
    )
    response = requests.get(
        "http://localhost:5000/article-list/",
    )

    articles = response.json()

    response = requests.get(
        f"http://localhost:5000/article/{articles[0]['id']}/",
    )

    assert response.status_code == 200

There are two things that we need to do before running this test...

First, register a marker called e2e with pytest by adding the following code to pytest.ini:

[pytest]
markers =
    e2e: marks tests as e2e (deselect with '-m "not e2e"')

pytest markers are used to exclude some tests from running or to include selected tests independent of their location.

To run only the e2e tests, run:

(venv)$ python -m pytest tests -m 'e2e'

To run all tests except e2e:

(venv)$ python -m pytest tests -m 'not e2e'

e2e tests are more expensive to run and require the app to be up and running, so you probably don't want to run them at all times.

Since our e2e test hits a live server, we'll need to spin up the app. Navigate to the project in a new terminal window, activate the virtual environment, and run the app:

(venv)$ FLASK_APP=blog/app.py python -m flask run

Now we can run our e2e test:

(venv)$ python -m pytest tests -m 'e2e'

You should see a 500 error. Why? Don't the unit tests pass? Yes. The problem is that we didn't create the database table. We used fixtures for this in our tests which do this for us. So let's create a table and a database.

Add an init_db.py file to the "blog" folder:

if __name__ == "__main__":
    from blog.models import Article
    Article.create_table()

Run the new script and start the server again:

(venv)$ python blog/init_db.py
(venv)$ FLASK_APP=blog/app.py python -m flask run

If you run into any problems running init_db.py, you may need to set the Python path: export PYTHONPATH=$PYTHONPATH:$PWD.

The test should now pass:

(venv)$ python -m pytest tests -m 'e2e'

Testing Pyramid

We started with unit tests (to test the commands and queries) followed by integration tests (to test the API endpoints), and finished with e2e tests. In simple applications, as in this example, you may end up with a similar number of unit and integration tests. In general, the greater the complexity, the more you should see a pyramid-like shape in terms of the relationship between unit, integration, and e2e tests. That's where the "test pyramid" term comes from.

The Test Pyramid is a framework that can help developers create high-quality software.

Using the Test Pyramid as a guide, you typically want 50% of your tests in your test suite to be unit tests, 30% to be integration tests, and 20% to be e2e tests.

Definitions:

Unit test - tests a single unit of code
Integration tests - tests that multiple units work together
e2e - tests the whole application against a live production-like server

The higher up you go in the pyramid, the more brittle and less predictable your tests are. What's more, e2e tests are by far the slowest to run so even though they can bring confidence that your application is doing what's expected of it, you shouldn't have nearly as many of them as unit or integration tests.

What is a Unit?

It's pretty straightforward what integration and e2e tests look like. There's much more discussion about unit tests since you first have to define what a "unit" actually is. Most testing tutorials show a unit test example that tests a single function or method. Production code is never that simple.

First things first, before defining what a unit is, let's look at what the point of testing is in general and what should be tested.

Why Test?

We write tests to:

Ensure our code works as expected
Protect our software against regressions

Nonetheless, when feedback cycles are too long, developers tend to start to think more about the types of tests to write since time is a major constraint in software development. That's why we want to have more unit tests than other types of tests. We want to find and fix the defect as fast as possible.

What to Test?

Now that you know why we should test, we now must look at what we should test.

We should test the behavior of our software. (And, yes: This still applies to TDD, not just BDD.) This is because you shouldn't have to change your tests every time there's a change to the code base.

Think back to the example of the real world application. From a testing perspective, we don't care where the articles are stored. It could be a text file, some other relational database, or a key/value store -- it doesn't matter. Again, our app had the following requirements:

articles can be created
articles can be fetched
articles can be listed

As long as those requirements don't change, a change to the storage medium shouldn't break our tests. Similarly, we know that as long as those tests pass, we know our software meets those requirements -- so it's working.

So What is a Unit Then?

Each function/method is technically a unit, but we still shouldn't test every single one of them. Instead, focus your energy on testing the functions and methods that are publicly exposed from a module/package.

In our case, these were the execute methods. We don't expect to call the Article model directly from the Flask API, so don't focus much (if any) energy on testing it. To be more precise, in our case, the "units", that should be tested, are the execute methods from the commands and queries. If some method is not intended to be directly called from other parts of our software or an end user, it's probably implementation detail. Consequently, our tests are resistant to refactoring to the implementation details, which is one of the qualities of great tests.

For example, our tests should still pass if we wrapped the logic for get_by_id and get_by_title in a "protected" method called _get_by_attribute:

# other code ...

class Article(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    author: EmailStr
    title: str
    content: str

    @classmethod
    def get_by_id(cls, article_id: str):
        return cls._get_by_attribute("SELECT * FROM articles WHERE id=?", (article_id,))

    @classmethod
    def get_by_title(cls, title: str):
        return cls._get_by_attribute("SELECT * FROM articles WHERE title = ?", (title,))

    @classmethod
    def _get_by_attribute(cls, sql_query: str, sql_query_values: tuple):
        con = sqlite3.connect(os.getenv("DATABASE_NAME", "database.db"))
        con.row_factory = sqlite3.Row

        cur = con.cursor()
        cur.execute(sql_query, sql_query_values)

        record = cur.fetchone()

        if record is None:
            raise NotFound

        article = cls(**record)  # Row can be unpacked as dict
        con.close()

        return article

# other code ..

On the other hand, if you make a breaking change inside Article the tests will fail. And that's exactly what we want. In that situation, we can either revert the breaking change or adapt to it inside our command or query.

Because there's one thing that we're striving for: Passing tests means working software.

When Should You Use Mocks?

We didn't use any mocks in our tests, because we didn't need them. Mocking methods or classes inside your modules or packages produces tests that are not resistant to refactoring because they are coupled to the implementation details. Such tests break often and are costly to maintain. On the other hand, it makes sense to mock external resources when speed is an issue (calls to external APIs, sending emails, long-running async processes, etc.).

For example, we could test the Article model separately and mock it inside our tests for CreateArticleCommand like so:

def test_create_article(monkeypatch):
    """
    GIVEN CreateArticleCommand with valid properties author, title and content
    WHEN the execute method is called
    THEN a new Article must exist in the database with same attributes
    """
    article = Article(
        author="[email protected]",
        title="New Article",
        content="Super awesome article"
    )
    monkeypatch.setattr(
        Article,
        "save",
        lambda self: article
    )
    cmd = CreateArticleCommand(
        author="[email protected]",
        title="New Article",
        content="Super awesome article"
    )

    db_article = cmd.execute()

    assert db_article.id == article.id
    assert db_article.author == article.author
    assert db_article.title == article.title
    assert db_article.content == article.content

Yes, that's perfectly fine to do, but we now have more tests to maintain -- i.e., all the tests from before plus all the new tests for the methods in Article. Besides that, the only thing that's now tested by test_create_article is that an article returned from save is the same as the one returned by execute. When we break something inside Article this test will still pass because we mocked it. And that's something we want to avoid: We want to test software behavior to ensure that it works as expected. In this case, behavior is broken but our test won't show that.

Takeaways

There's no single right way to test your software. Nonetheless, it's easier to test logic when it's not coupled with your database. You can use the Active Record pattern with commands and queries (CQRS) to help with this.
Focus on the business value of your code.
Don't test methods just to say they're tested. You need working software not tested methods. TDD is just a tool to deliver better software faster and more reliable. Similar can be said for code coverage: Try to keep it high but don't add tests just to have 100% coverage.
A test is valuable only when it protects you against regressions, allows you to refactor, and provides you fast feedback. Therefore, you should strive for your tests to resemble a pyramid shape (50% unit, 30% integration, 20% e2e). Although, in simple applications, it may look more like a house (40% unit, 40% integration, 20% e2e), which is fine.
The faster you notice regressions, the faster you can intercept and correct them. The faster you correct them, the shorter the development cycle. To speed up feedback, you can use pytest markers to exclude e2e and other slow tests during development. You can run them less frequently.
Use mocks only when necessary (like for third-party HTTP APIs). They make your test setup more complicated and your tests overall less resistant to refactoring. Plus, they can result in false positives.
Once again, your tests are a liability not an asset; they should cover your software's behavior but don't over test.

Conclusion

There's a lot to digest here. Keep in mind that these are just examples used to show the ideas. You can use the same ideas with Domain-driven design (DDD), Behavior-driven design (BDD), and many other approaches. Keep in mind that tests should be treated the same as any other code: They are a liability and not an asset. Write tests to protect your software against the bugs but don't let it burn your time.

Want to learn more?

The Complete Python Guide:

Modern Python Environments - dependency and workspace management

Testing in Python

Modern Test-Driven Development in Python (this article!)

Python Code Quality

Python Type Checking

Documenting Python Code and Projects

Python Project Workflow

Featured Course

Scalable FastAPI Applications on AWS

In this course, you'll learn how to go from idea to scalable FastAPI application running on AWS infrastructure managed by Terraform.

Buy Now $30 View Course

Share this tutorial

Contents

Objectives

How Should I Test My Software?

Basic Setup

Real Application

Create a New Article

Test Fixtures

List All Articles

Get Article by ID

Expose the API with Flask

Code Coverage

End-to-end Tests

Testing Pyramid

What is a Unit?

Why Test?

What to Test?

So What is a Unit Then?

When Should You Use Mocks?

Takeaways

Conclusion

Want to learn more?

Scalable FastAPI Applications on AWS

Scalable FastAPI Applications on AWS

Recommended Tutorials