One of the decisions you need to make when starting a new project is which database to use. If you're using a batteries-included framework like Django, there's no reason to think twice. Pick one of the supported database engines and there you go. On the other hand, if you're using a micro framework like FastAPI or Flask, you need to make this decision yourself. Your decision impacts which libraries you'll need and how you'll approach data modeling. Although you may just reach for Postgres, there are other options available. One of them is DynamoDB. Like Postgres, it can be a great choice for Python projects.
In this article, I'll compare Postgres and DynamoDB. I'll describe how to use them, explain what the differences are, and help you decide which one to choose for your Python applications.
What is Postgres? What is DynamoDB?
Postgres is a relational database management system (RDBMS). It's an open-source project that has been around for more than 35 years. It's a mature and stable database engine. It's a great choice for most projects.
DynamoDB is a NoSQL database service provided by AWS. It's a fully managed service, which means that you don't need to worry about scaling, backups, or availability. It's a great choice for projects that need to scale fast and when you don't want to spend time managing the database.
Although Postgres doesn't come as a fully managed service, running it on AWS RDS gets you pretty close to DynamoDB in terms of ease of management.
The first obvious difference is that Postgres is a SQL database while DynamoDB is a NoSQL database. This doesn't really tell us much. Yes, DynamoDB isn't a SQL database, but what is it? DynamoDB is actually a key/value database. What does that mean for your application? How can we even compare these two databases? Isn't a key/value store really just Redis? Let's not jump to conclusions just yet. Let's first look at an example.
Say we need to manage TODO tasks. A classic SQL approach would be to create a table like so:
id | title | status | owner |
---|---|---|---|
1 | Clean your office | OPEN | [email protected] |
Then we can query it to get all tasks for a given owner:
SELECT * FROM tasks WHERE owner = '[email protected]';
Things are different with DynamoDB. DynamoDB is a key/value store, and we can choose between two key schemas. The first option is to use only a HASH key. In that case, we need to know the ID in order to query the record (think of it like Redis). Using this key schema, we would have a table like this:
ID (HASH) | Title | Status | Owner |
---|---|---|---|
some_id | Clean your office | OPEN | [email protected] |
We can query it to get a task for a given ID:
import boto3
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("tasks")
table.get_item(
Key={
"ID": "some_id",
},
)
So there's no way to efficiently list tasks for a given owner.
In reality, you can scan the whole table using scan, but that's very inefficient and should be avoided.
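Just to show what that looks like, a scan with a filter would be something like this (a quick sketch, reusing the table above) -- the filter is applied only after DynamoDB has already read every item, so you pay for the full table read:
import boto3
from boto3.dynamodb.conditions import Attr
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("tasks")
# Scan reads the whole table and filters afterwards -- fine for one-offs, bad for hot paths.
response = table.scan(
    FilterExpression=Attr("Owner").eq("[email protected]"),
)
print(response["Items"])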
Fortunately, there's another option: the HASH & RANGE key schema. In this case, we use two keys to identify a record -- a HASH key to identify a partition and a RANGE key to identify a record inside that partition. The HASH & RANGE key combination must be unique for every record. In our case, we can use the owner as the HASH key and the ID as the RANGE key. We would then have a table like this:
ID (RANGE) | Title | Status | Owner (HASH) |
---|---|---|---|
some_id | Clean your office | OPEN | [email protected] |
With such structure, we can query all tasks for a given owner like so:
import boto3
from boto3.dynamodb.conditions import Key
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("tasks")
table.query(
KeyConditionExpression=Key("Owner").eq("[email protected]"),
)
Now you'll get all records where owner is [email protected] -- and they'll be sorted by ID.
Although we've just scratched the surface, there's one point I want to make: You can use Postgres or DynamoDB as your application database. You just need to approach data modeling differently.
ORMs
ORMs (object-relational mappers) are libraries that allow you to easily map database records to Python objects. They mostly follow either the active record or the data mapper pattern. With the active record pattern, a model object is responsible for both business logic and data persistence; the Django ORM is one such example. With the data mapper pattern, these two responsibilities are split between two objects: model objects contain attributes and business logic, while a separate mapper object is responsible for data persistence. SQLAlchemy is one example. ORMs are built on top of SQL databases, so it's no surprise that there are plenty of ORM options available for Postgres -- SQLAlchemy, Django ORM, Peewee, PonyORM, to name a few.
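To make the active record pattern concrete, here's roughly what it looks like with the Django ORM (a sketch; it assumes a configured Django project and app):
from django.db import models
class Task(models.Model):
    title = models.CharField(max_length=255)
    status = models.CharField(max_length=16)
    owner = models.EmailField()
# Active record: the model instance knows how to persist itself.
task = Task(title="Clean your office", status="OPEN", owner="[email protected]")
task.save()
open_tasks = Task.objects.filter(owner="[email protected]", status="OPEN")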
Which one to choose is mostly a matter of personal preference. While the syntax may differ, they all allow you to easily define your models and query the database. Let's take a look at an example using SQLAlchemy:
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
Base = declarative_base()
class Task(Base):
__tablename__ = "tasks"
id = Column(Integer, primary_key=True, autoincrement=True)
title = Column(String)
status = Column(String)
owner = Column(String)
engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
task = Task(title="Clean your office", status="OPEN", owner="[email protected]")
session.add(task)
session.commit()
tasks = session.query(Task).filter(Task.owner == "[email protected]").all()
print(tasks)
As you can see, it's pretty straightforward: You define your model, create a session, and then query the database. You can use the same/similar approach with other ORMs.
While things can be perfectly fine when using ORMs with Postgres, I can't say the same for DynamoDB. There are some ORMs for DynamoDB -- e.g., PynamoDB and DynamORM -- but, in my opinion, they defeat the whole purpose of DynamoDB. If you want to really benefit from DynamoDB, you should go with single-table design -- which is not supported by ORMs. By "really benefit" I mean actually using DynamoDB instead of building another relational database with UUIDs on top of it. This means we're on our own when using DynamoDB. Let's take a look at how to use it.
The ORM equivalent for NoSQL databases is the ODM (object-document mapper). See MongoDB's ODMs, for example. Like ORMs, they simplify interaction with the database. You can find solutions like that for DynamoDB as well -- e.g., dynamodb-mapper. After a deeper look, you'll realize that they are not widely used, not maintained, not production-ready, or that they enforce unnecessary constraints.
When using a single table, one needs to generalize the keys, so they can be used for different types of records. Altering the example above, we end up with a table like this:
PK (HASH) | SK (RANGE) | Status | Owner | ID | Title |
---|---|---|---|---|---|
#TASK#[email protected] | #TASK#some_id | OPEN | [email protected] | some_id | Clean your office |
You can see the following:
- The HASH key has a generic name, PK (partition key). It can be used for different types of records.
- The RANGE key has a generic name, SK (sort key). It can be used for different types of records.
We left the rest as is, so we can easily map data back to a model object. Since DynamoDB is schema-less, we only need to provide PK and SK for every record. Everything else is optional. This allows us to use a single table for different types of records.
To support more complex examples (and queries), one needs to use composite keys for PK and SK. For example, the structure of keys would be different for tasks and boards:
- Tasks: PK=#TASK#{owner}, SK=#TASK#{task_id}.
- Boards: PK=#BOARD#{owner}, SK=#BOARD#{board_id}.
This gives you the ability to easily query all tasks by owner or all boards by owner. ORMs discourage the use of composite keys, which are necessary for single-table design. That's one of the reasons why ORMs don't play nicely with DynamoDB. You can read this interesting article to learn more.
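In practice, it helps to centralize key construction in small helper functions so every record type builds its keys the same way. A minimal sketch (the helper names are mine, not a library API):
def task_key(owner, task_id):
    # Tasks live in a per-owner partition, sorted by task ID.
    return {"PK": f"#TASK#{owner}", "SK": f"#TASK#{task_id}"}
def board_key(owner, board_id):
    # Boards live in a separate per-owner partition.
    return {"PK": f"#BOARD#{owner}", "SK": f"#BOARD#{board_id}"}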
DynamoDB is all about predictable performance. That's why we can query records only by HASH and RANGE keys (a HASH key must always be present in a query).
It's not possible to query by other columns; you can use indexes to support more queries, but more on that later.
Because of that, we must build our data model around queries rather than around data. That also goes for modeling relationships. With an ORM, you'll just define a foreign key relationship between two models. Then you'll be able to access related objects via their parent(s) and vice versa. There's no need to do anything else. While this might be handy for simple apps, it can become a huge bottleneck in your application (N+1 queries). With DynamoDB single-table design, you want to load everything that's needed with a single query.
For example, if you want to load a board with all of its tasks, you'd do a single query to DynamoDB. It will return different types of records: the board (name, last updated, ...), open tasks, closed tasks, and so on. Then, it's up to you to map these records to a board object that contains tasks, etc. Thus, it's nearly impossible to build a generic ORM for DynamoDB. Everything is very use-case/query specific. While this might sound like a lot of work, it forces you to design for efficiency. Therefore, with DynamoDB, you need to build your own abstraction for data persistence -- e.g., a repository. You can define your plain-Python (or pydantic or whatever) models and use Boto3 to implement your repository.
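To make the "one query, many record types" idea concrete, here's a rough sketch of loading a board together with its tasks (the #BOARD# partition layout below is an illustrative assumption, not the design used in the rest of this article):
import boto3
from boto3.dynamodb.conditions import Key
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("tasks-api")
# One query returns every record in the board's partition: the board itself plus its tasks.
response = table.query(
    KeyConditionExpression=Key("PK").eq("#BOARD#some_board_id"),
)
board, tasks = None, []
for record in response["Items"]:
    # It's up to us to map each record type back to the right object.
    if record["SK"].startswith("#METADATA#"):
        board = {"name": record["name"]}
    elif record["SK"].startswith("#TASK#"):
        tasks.append({"id": record["id"], "title": record["title"], "status": record["status"]})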
You can learn more about data modeling for single-table design (including modeling of relationships) in The DynamoDB Book.
Let's look at an example.
Models:
from dataclasses import dataclass
from enum import Enum
from uuid import UUID
class TaskStatus(str, Enum):
OPEN = "OPEN"
CLOSED = "CLOSED"
@dataclass
class Task:
id: UUID
title: str
status: TaskStatus
owner: str
@classmethod
def create(cls, id_, title, owner):
return cls(id_, title, TaskStatus.OPEN, owner)
Store:
from uuid import UUID
import boto3
from boto3.dynamodb.conditions import Key
from models import Task, TaskStatus
class TaskStore:
def __init__(self, table_name):
self.table_name = table_name
def add(self, task):
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(self.table_name)
table.put_item(
Item={
"PK": f"#TASK#{task.owner}",
"SK": f"#TASK#{task.id}",
"id": str(task.id),
"title": task.title,
"status": task.status.value,
"owner": task.owner,
}
)
def list_by_owner(self, owner):
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(self.table_name)
last_key = None
query_kwargs = {
"IndexName": "GS1",
"KeyConditionExpression": Key("PK").eq(f"#TASK#{owner}"),
}
tasks = []
while True:
if last_key is not None:
query_kwargs["ExclusiveStartKey"] = last_key
response = table.query(**query_kwargs)
tasks.extend(
[
Task(
id=UUID(record["id"]),
title=record["title"],
owner=record["owner"],
status=TaskStatus[record["status"]],
)
for record in response["Items"]
]
)
last_key = response.get("LastEvaluatedKey")
if last_key is None:
break
return tasks
DynamoDB uses LastEvaluatedKey for pagination.
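Using the store is then plain Python -- a quick sketch (it assumes the model and store live in models.py and store.py, and that the table is called tasks-api):
from uuid import uuid4
from models import Task
from store import TaskStore
store = TaskStore("tasks-api")
task = Task.create(uuid4(), "Clean your office", "[email protected]")
store.add(task)
print(store.list_by_owner("[email protected]"))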
In short, there are ORMs for both Postgres and DynamoDB. You probably want to use one with Postgres, but you probably don't want to use one with DynamoDB.
Migrations
Another interesting topic -- database migrations. Database migrations provide a way to update database schemas. For Postgres, there are tools that simplify migrations -- Alembic (for SQLAlchemy) and Django's built-in migrations, for example.
You can use them to easily add tables/indexes/columns/constraints to your database. Migrations can even be autogenerated from your ORM's model definitions. Here's an example migration with Alembic:
"""create tasks table
Revision ID: 1f8b5f5d1f5a
Revises:
Create Date: 2021-12-05 12:00:00.000000
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "1f8b5f5d1f5a"
down_revision = None
branch_labels = None
depends_on = None
def upgrade():
op.create_table(
"tasks",
sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
sa.Column("title", sa.String),
sa.Column("status", sa.String),
sa.Column("owner", sa.String),
)
def downgrade():
op.drop_table("tasks")
For the most common operations, it's pretty straightforward. With autogenerated migrations, there's not much work to do, and you can always write them by hand if needed.
With DynamoDB, there's no schema -- and we also said we want a single table. There's no way to change keys, so there's nothing to migrate. Or is there? It turns out that you have two options when adding/removing attributes to/from records:
- Handle changes at runtime -- e.g., add columns with default values
- Use the following three-step pattern:
- Start writing new records with a new "attribute"
- Update existing records, setting the new "attribute" to the desired value (you can scan all the records in the table and update them)
- Start using the new "attribute" in your code
You can generalize the three-step pattern to other types of changes, like removing attributes.
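As a sketch, the first option is just a default at read time, and step two of the pattern is a one-off backfill script (the priority attribute is a made-up example):
import boto3
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("tasks-api")
# Option 1: handle the missing attribute at read time with a default.
record = table.get_item(Key={"PK": "#TASK#[email protected]", "SK": "#TASK#some_id"})["Item"]
priority = record.get("priority", "NORMAL")
# Option 2, step 2: backfill existing records (a scan is acceptable for a one-off migration).
scan_kwargs = {}
while True:
    response = table.scan(**scan_kwargs)
    for item in response["Items"]:
        if "priority" not in item:
            table.update_item(
                Key={"PK": item["PK"], "SK": item["SK"]},
                UpdateExpression="SET #prio = :p",
                ExpressionAttributeNames={"#prio": "priority"},
                ExpressionAttributeValues={":p": "NORMAL"},
            )
    last_key = response.get("LastEvaluatedKey")
    if last_key is None:
        break
    scan_kwargs["ExclusiveStartKey"] = last_key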
With all the tooling available, migrations are easier with Postgres.
If you work on a more complex project where poor performance or downtime really aren't an option, you'll probably write most of your migration code by hand. For such cases, there's not a huge difference between Postgres and DynamoDB.
Queries
We actually already touched on this topic. We even looked at example queries. Nevertheless, let's dig even deeper.
Anyone not used to DynamoDB would say that it doesn't make sense to use it due to the very limited ability to query the data. It's indeed true that queries are much less flexible in DynamoDB. The common perception is that with a SQL database you can query any data using any set of columns you want. Well, that's true in theory. In practice, it's only true as long as your database is small enough to fit into your server's memory.
Once you have a more serious volume of data, queries tend to slow down, which, in turn, slows down your application. Sooner or later, you'll need to start thinking about indices. To keep queries fast, you'll need to add indices and query only by indexed columns. You'll also realize that you can't afford queries that don't limit the number of returned records. Unbounded queries can easily bring your system down because you have no idea how much data they'll return -- your database might need to read a billion rows from disk, and that takes time.
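With SQLAlchemy, for example, indexing the column you filter by is a one-liner on the model. A sketch based on the Task model from earlier (the composite index is an added illustration):
from sqlalchemy import Column, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Task(Base):
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String)
    status = Column(String)
    owner = Column(String, index=True)  # index the column we filter by
# A composite index for queries that filter by owner and status together.
Index("ix_tasks_owner_status", Task.owner, Task.status)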
So where does that leave us? DynamoDB is built in a way that forces you to consider these things upfront.
When designing your data model, you need to:
- Partition your data with a HASH key
- Sort your data with a RANGE key
You can then only query by HASH and RANGE keys or by pre-defined indices, which have their own HASH and RANGE keys. There's also a limit on how much data is returned by a single query. So you're forced to use pagination.
All these limitations might sound scary and overwhelming at the beginning. Once you get used to them, you realize that they're there for a reason. Let's take a look at the example queries...
With DynamoDB:
import boto3
from boto3.dynamodb.conditions import Key
owner = "[email protected]"
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("tasks-api")
last_key = None
query_kwargs = {
"IndexName": "GS1",
"KeyConditionExpression": Key("PK").eq(f"#TASK#{owner}"),
}
tasks = []
while True:
if last_key is not None:
query_kwargs["ExclusiveStartKey"] = last_key
response = table.query(**query_kwargs)
tasks.extend(response["Items"])
last_key = response.get("LastEvaluatedKey")
if last_key is None:
break
print(tasks)
With Postgres + SQLAlchemy (a paginated query):
from models import Session, Task
tasks = []
last_evaluated_id = -1
session = Session()
while True:
    query = session.query(Task).filter(Task.id > last_evaluated_id).order_by(Task.id).limit(10)
    page = query.all()
    tasks.extend(page)
    if len(page) < 10:
        break
    last_evaluated_id = page[-1].id
print(tasks)
As you can see, there's no major difference in code complexity between DynamoDB and Postgres paginated query.
DynamoDB also has other query limitations:
- We must always provide at least the HASH key
- We can query only by HASH and RANGE keys or by pre-defined indices
- Returned records are sorted by the RANGE key; you can only choose between ascending and descending order
Knowing that, with our current table design, we can support only two queries:
- list by owner (you've seen it above)
- get by owner and task ID (you can see it below)
import boto3
from boto3.dynamodb.conditions import Key
owner = "[email protected]"
task_id = "7b0f4bc8-4751-41f1-b3b1-23a935d81cd6"
dynamodb = boto3.resource("dynamodb", endpoint_url="http://localhost:9999")
table = dynamodb.Table("tasks-api")
query_kwargs = {
"KeyConditionExpression": Key("PK").eq(f"#TASK#{owner}") & Key("SK").eq(f"#TASK#{task_id}"),
}
task = table.query(**query_kwargs)["Items"][0]
print(task)
We can't query by status, for example. But we can do that with Postgres:
from models import Session, Task
tasks = []
last_evaluated_id = -1
session = Session()
while True:
    query = session.query(Task).filter(Task.status == "OPEN").filter(Task.id > last_evaluated_id).order_by(Task.id).limit(10)
    page = query.all()
    tasks.extend(page)
    if len(page) < 10:
        break
    last_evaluated_id = page[-1].id
print(tasks)
Don't worry, we can still support such a query in DynamoDB as well. We just need to use a global secondary index (GSI). We can create a table with a single GSI named GS1 like this:
import boto3
client = boto3.client("dynamodb", endpoint_url="http://localhost:9999")
client.create_table(
TableName="tasks-api",
KeySchema=[
{"AttributeName": "PK", "KeyType": "HASH"},
{"AttributeName": "SK", "KeyType": "RANGE"},
],
AttributeDefinitions=[
{"AttributeName": "PK", "AttributeType": "S"},
{"AttributeName": "SK", "AttributeType": "S"},
{"AttributeName": "GS1PK", "AttributeType": "S"},
{"AttributeName": "GS1SK", "AttributeType": "S"},
],
GlobalSecondaryIndexes=[
{
"IndexName": "GS1",
"KeySchema": [
{"AttributeName": "GS1PK", "KeyType": "HASH"},
{"AttributeName": "GS1SK", "KeyType": "RANGE"},
],
"Projection": {"ProjectionType": "ALL"},
},
],
BillingMode="PAY_PER_REQUEST",
)
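One thing to keep in mind: a GSI only contains items that actually have the index's key attributes. So for the status query below to return anything, records need GS1PK and GS1SK populated at write time. In our TaskStore.add, the put_item call would grow two attributes, roughly like this (keying the index by status is my assumption here):
table.put_item(
    Item={
        "PK": f"#TASK#{task.owner}",
        "SK": f"#TASK#{task.id}",
        # GSI keys: partition by status so the GS1 index can serve "list by status".
        "GS1PK": f"#TASK#{task.status.value}",
        "GS1SK": f"#TASK#{task.id}",
        "id": str(task.id),
        "title": task.title,
        "status": task.status.value,
        "owner": task.owner,
    }
)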
Like a table, a GSI has its own HASH and RANGE keys. We can then query by status like this:
import boto3
from boto3.dynamodb.conditions import Key
status = "OPEN"
dynamodb = boto3.resource("dynamodb", endpoint_url="http://localhost:9999")
table = dynamodb.Table("tasks-api")
last_key = None
query_kwargs = {
"KeyConditionExpression": Key("GS1PK").eq(f"#TASK#{status}"),
"IndexName": "GS1",
}
tasks = []
while True:
if last_key is not None:
query_kwargs["ExclusiveStartKey"] = last_key
response = table.query(**query_kwargs)
tasks.extend(response["Items"])
last_key = response.get("LastEvaluatedKey")
if last_key is None:
break
print(tasks)
As you can see, you need to explicitly name the index you want to use when querying. Postgres, on the other hand, picks a suitable index for you via its query planner. Once again, if you compare the queries, you'll see they aren't that different after all.
As long as you're using exact queries (=, !=, >, <, ...), there's no major difference between Postgres and DynamoDB. With DynamoDB, you can't do LIKE or IN queries or full-text search. To support things like that, you need to pair it with Elasticsearch or a similar service.
You can see a full list of supported conditions for HASH/RANGE keys here.
In short, both databases should support your needs when used as an application database. If you want to do more complex queries (e.g., reporting, analytics, ...), you'll either need to use Postgres or pair DynamoDB with some other service like Elasticsearch, Athena, or Redshift, to name a few.
Performance
Performance is always a tricky topic. It's very easy to drift into "opinions" territory, so I'll try to avoid that. If you search the web, you'll find people praising DynamoDB and trashing Postgres -- and vice versa. The reality is that, in most cases, one of the following mistakes was made:
- DynamoDB was used as a relational database -- multiple tables, no single table design
- Postgres was poorly indexed/partitioned
- Scan was used with DynamoDB
- Postgres was queried by non-indexed columns or with unlimited queries
I've used both of them in various production environments throughout the years. I can tell you the following: When used properly, they are both blazing fast. You might notice subtle differences between them when compared directly, but from your application's perspective, you won't notice much of a performance difference. Sure, at the Amazon/Google/Netflix/etc. scale, you will obviously see some difference; after all, there's a reason DynamoDB was created in the first place.
Long story short: For most projects out there, performance shouldn't be the main deciding factor. At least if you're not Amazon/Netflix/.... Instead, spend your time on developing an efficient data (access) model and you'll be fine.
Backups
One of the often-forgotten topics is backups. It's all good and fine without backups until it's not. Since we're already inside AWS with DynamoDB, let's compare Postgres on AWS RDS with DynamoDB.
They both support on-demand and continuous backups, and they both offer point-in-time restore. I strongly suggest that you use the built-in backups for both of them. With point-in-time restore, the flow is pretty much the same for both:
- Create a point-in-time restore from a given timestamp, which will create a new instance (RDS) or a new table (DynamoDB) -- see the sketch below for the DynamoDB case
- Copy the missing data from the restored instance/table to the one your app uses, OR reroute your app to the new instance/table
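For DynamoDB, the first step is a single API call. A sketch (the table names and the timestamp are examples):
from datetime import datetime, timezone
import boto3
client = boto3.client("dynamodb")
# Restore the table's state as of a given point in time into a new table.
client.restore_table_to_point_in_time(
    SourceTableName="tasks-api",
    TargetTableName="tasks-api-restored",
    RestoreDateTime=datetime(2021, 12, 5, 12, 0, tzinfo=timezone.utc),
)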
Tip: Backups on AWS are compliant with standards like SOC 2.
You can learn more from the official docs for DynamoDB and RDS.
Transactions
Before we conclude, let's talk a bit about transactions. A transaction is a mechanism that allows us to encapsulate multiple operations into a single database operation, ensuring that either all of them are executed or none of them are. For example, this way we can make sure that we decrease the number of items in stock and create a new order as one unit. Without transactions, we might end up with an order for an item that's no longer in stock -- e.g., because someone else placed an order just before you.
Transactions are "bread and butter" for SQL databases. On the other hand, they aren't that common in NoSQL databases. Both Postgres and DynamoDB support transactions -- all or nothing. They might feel more idiomatic with Postgres, but you can use them with DynamoDB as well. You'll just need to use the lower-level Boto3 APIs.
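For instance, the stock-and-order example from above would look roughly like this with the low-level client (the table name, key layout, and attribute names are all assumptions):
import boto3
client = boto3.client("dynamodb")
# Either both operations succeed or neither does.
client.transact_write_items(
    TransactItems=[
        {
            "Update": {
                "TableName": "shop",
                "Key": {"PK": {"S": "#ITEM#some_item"}, "SK": {"S": "#ITEM#some_item"}},
                # Only decrement the stock if there's still stock left.
                "UpdateExpression": "SET #stock = #stock - :one",
                "ConditionExpression": "#stock >= :one",
                "ExpressionAttributeNames": {"#stock": "stock"},
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
        {
            "Put": {
                "TableName": "shop",
                "Item": {
                    "PK": {"S": "#ORDER#some_order_id"},
                    "SK": {"S": "#ORDER#some_order_id"},
                    "item": {"S": "some_item"},
                },
            }
        },
    ]
)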
Therefore, you should be fine in this regard with either DynamoDB or Postgres.
You can learn more from the official docs for DynamoDB and Postgres.
Which One Should I Use?
All of this leaves you with the question: which one should you use? In most cases, you can use either of them. Nevertheless, there are some guidelines that can help you choose:
- If you're using Django (or any other batteries-included framework), go with Postgres. It's the easiest way to go. You'll just fight with the framework if you try to use DynamoDB.
- If you have an inexperienced team, go with Postgres. It's easier to wrap your head around it and fix your mistakes later on.
- If you need to easily replicate data across multiple AWS regions, go with DynamoDB. It's built for that.
- If you have very spiky traffic, go with DynamoDB. You can pay per request.
- If you want to be serverless, go with DynamoDB. But in that case, you'll want to run your application on Lambda as well.
- If you're NOT inside the AWS ecosystem, go with Postgres. DynamoDB is tightly coupled with AWS.
Whichever you choose, bear in mind that having an efficient data model matters much more than which database you pick. DynamoDB will force you into that; Postgres won't. But as you've seen, when you use them efficiently, you end up with pretty similar usage patterns.
Conclusion
While there's a lot covered in this article, there's still a lot to explore. So what to do next? You can build the same app with both of them to really get a good grasp. We -- surprise, surprise! -- have two courses that can help with that:
The former uses DynamoDB while the latter uses Postgres. Try to build the same thing with both and you'll see how it goes. If nothing else, I'm sure you'll be way better in data modeling once done.
You can also check the GitHub repository with examples from this article.
Happy Pythoning!