Deploying a Machine Learning Model to AWS Lambda

Last updated August 21st, 2023

In this tutorial, we'll take a look at how to deploy a machine learning (ML) model to AWS Lambda, via Serverless Framework, and execute it using Boto3. We'll also create a CI/CD pipeline with GitHub Actions to automate the deployment process and run end-to-end tests.

Contents

  1. Project Setup
  2. AWS Lambda Handler
  3. Serverless Framework
  4. Dockerfile
  5. Deployment with GitHub Actions
  6. AWS Credentials
  7. End-to-End Test
  8. Conclusion

Project Setup

Playing around with ML models on your local machine can be fun. At the end of the day, though, you need to deploy your model somewhere so it can serve its purpose in production. While there are a number of different deployment options available, AWS Lambda is a great choice when your traffic is spiky or the model is rarely used. Plus, AWS Lambda makes it easy to perform ML operations in parallel: you just call your Lambda function with Boto3 from within the code that needs ML functionality. With that, let's build an ML service for performing sentiment analysis.
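
For example, once the function is deployed, calling it from any Python code that needs ML functionality is just a Boto3 invoke call. Here's a minimal sketch of what that looks like; the function name is hypothetical at this point, and we'll write a complete, working version in the end-to-end test at the end of this tutorial:

import json

import boto3

lambda_client = boto3.client("lambda")

# Invoke the sentiment Lambda synchronously and read back the JSON payload
response = lambda_client.invoke(
    FunctionName="ml-model-development-ml_model",  # hypothetical name until we deploy
    InvocationType="RequestResponse",
    Payload=json.dumps({"text": "Markets closed higher today."}),
)
print(json.loads(response["Payload"].read()))  # e.g. {"sentiment": "positive"}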

Start by creating a new project:

$ mkdir ml_lambda && cd ml_lambda && git init
$ python3.11 -m venv venv
$ source venv/bin/activate
(venv)$

Create a new .gitignore file. Then, add the contents from the Python .gitignore template found here. Add the .gitignore to git and commit:

$ git add .gitignore
$ git commit -m 'Initial commit'

Second, add PyTorch and Transformers to a requirements.txt file:

torch==2.0.1
transformers==4.31.0

Install them:

(venv)$ pip install -r requirements.txt

Third, create a new module called sentiment_model.py:

from transformers import pipeline


class SentimentModel:
    def __init__(self):
        self._sentiment_analysis = pipeline("sentiment-analysis", model="ProsusAI/finbert")

    def predict(self, text):
        return self._sentiment_analysis(text)[0]["label"]

Here, we used the FinBERT pre-trained natural language processing (NLP) model, which is designed to predict sentiment on financial text.

Want to test it out locally? Add the following to the bottom of the module:

if __name__ == "__main__":
    sample_text = "The Dow Jones Industrial Average (^DJI) turned green."

    model = SentimentModel()
    sentiment = model.predict(text=sample_text)
    print(sentiment)

Then, execute the module. Try some different sample text. Make sure to remove the if block once done.
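
Running the module should look something like this (the first run downloads the model, so expect it to take a while); for the sample text above, you should see positive:

(venv)$ python sentiment_model.py
positive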

Finally, create a new project on GitHub and update the git remote.

AWS Lambda Handler

AWS Lambda is a serverless compute service where you can run your code. Unlike traditional deployment offerings, where you "rent" space on a server, you only pay for actual execution time. That makes it a great choice when you have little traffic or occasional spikes.

Running Python code on AWS Lambda is fairly simple. You need to specify a handler function and point Lambda to it. Here's what the expected signature needs to look like for the handler function:

def handle(event, context):
    ...

Create a new module called handler.py:

import logging

from sentiment_model import SentimentModel


LOGGER = logging.getLogger()
LOGGER.setLevel(logging.INFO)

model = SentimentModel()


def handle(event, context):
    if event.get("source") == "KEEP_LAMBDA_WARM":
        LOGGER.info("No ML work to do. Just staying warm...")
        return "Keeping Lambda warm"

    return {
        "sentiment": model.predict(text=event["text"])
    }

So what's going on here?

First, outside of the handle function, we configured the logger and initialized our model. Code outside the handle function runs only during a Lambda cold start. During a cold start, Lambda has to:

  1. Find space on an EC2 instance
  2. Initialize the execution environment
  3. Initialize the module

Cold starts happen after re-deployments and when the Lambda hasn't been invoked for a while. AWS hasn't disclosed exactly how long a Lambda stays warm, but if you search the web, you'll find estimates between five and fifteen minutes. Since cold starting a Lambda takes a few seconds, you'll often want to keep your Lambda warm. We'll use a separate Lambda function for this, which will periodically ping our Lambda. Since you pay for execution time, we don't want to run the full ML model when we're just keeping it warm, so we added the following if block:

if event.get("source") == "KEEP_LAMBDA_WARM":
    LOGGER.info("No ML work to do. Just staying warm...")
    return "Keeping Lambda warm"

To learn more about cold starts, review Lambda execution environments from the docs.

Serverless Framework

To deploy our Lambda function, we'll use Serverless Framework, which helps you develop and deploy serverless applications with ease. It's simple to use for our use case. We need to create a serverless.yml configuration file that tells Serverless Framework which serverless cloud resources to create and how to invoke our application running on them. The framework supports multiple cloud vendors, like AWS, Google Cloud, and Azure, to name a few.

It's highly recommended that you go through Your First Serverless Framework Project before continuing on with this one so you have some context on how Serverless Framework works.

Add a new serverless.yml configuration file:

service: ml-model

frameworkVersion: '3'
useDotenv: true

provider:
  name: aws
  region: ${opt:region, 'eu-west-1'}
  stage: ${opt:stage, 'development'}
  logRetentionInDays: 30
  ecr:
    images:
      appimage:
        path: ./

functions:
  ml_model:
    image:
      name: appimage
    timeout: 90
    memorySize: 4096
    environment:
      TORCH_HOME: /tmp/.ml_cache
      TRANSFORMERS_CACHE: /tmp/.ml_cache/huggingface

custom:
  warmup:
    MLModelWarmer:
      enabled: true
      events:
        - schedule: rate(4 minutes)
      concurrency: ${env:WARMER_CONCURRENCY, 2}
      verbose: false
      timeout: 100
      payload:
        source: KEEP_LAMBDA_WARM

plugins:
  - serverless-plugin-warmup

There are quite a few things going on inside this file.

First, we defined the name of our service:

service: ml-model

Second, we defined some global configuration:

frameworkVersion: '3'

frameworkVersion is used to pin a specific version of Serverless Framework.

Third, we configured a provider:

provider:
  name: aws
  region: ${opt:region, 'eu-west-1'}
  stage: ${opt:stage, 'development'}
  logRetentionInDays: 30
  ecr:
    images:
      appimage:
        path: ./

We set defaults for region and stage, Ireland (eu-west-1) and development respectively. opt:region and opt:stage can be read in from the command line. For example:

$ serverless deploy --region us-east-1 --stage production

Next, we set the log retention so that all of our application's logs in CloudWatch are deleted after thirty days. Finally, we defined that we'll use a Docker image to deploy our ML model. It will be built from the ./ path (the project root) and stored in ECR (Elastic Container Registry).

Fourth, we defined our Lambda function:

functions:
  ml_model:
    image:
      name: appimage
    timeout: 90
    memorySize: 4096
    environment:
      TORCH_HOME: /tmp/.ml_cache
      TRANSFORMERS_CACHE: /tmp/.ml_cache/huggingface

Since we're using the AWS provider, all defined functions will be AWS Lambda functions.

We specified which Docker image to use (it will be pushed to ECR and pulled from there during deployment), then set the timeout and memory limits. We finished the function's definition by setting writable cache folders for the PyTorch and Hugging Face libraries. These libraries download model files, so the destination must be writable. We used the "/tmp" folder for that, which is the only writable location on AWS Lambda.

ML libraries like PyTorch and Hugging Face's transformers let you use pre-trained models, which is exactly what we're doing. These models are downloaded from the Internet and cached on disk so they don't have to be downloaded every time the code runs: the libraries first check whether the model is in the local cache and only download it if it's not there. By default, the cache lives in a location that isn't writable on Lambda -- only the "/tmp" folder is. Therefore, we need to point the cache at a location inside "/tmp". To clearly indicate that it's the cache destination for the ML models, we named it ".ml_cache".

For more on customizing PyTorch and HuggingFace's cache, review the following resources:

  1. PyTorch: Where are my downloaded models saved?
  2. HuggingFace: Caching models

By setting environment variables TORCH_HOME and TRANSFORMERS_CACHE, we told our libraries to use this location for caching. Feel free to rename the destination to your liking.
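
If you want to sanity-check the cache behavior locally, here's a minimal sketch (assuming the same paths as above). Note that the environment variables need to be set before transformers is imported, which is why Lambda's function-level environment configuration works well here:

import os

# Must be set before importing transformers/torch so they pick up the cache paths
os.environ["TORCH_HOME"] = "/tmp/.ml_cache"
os.environ["TRANSFORMERS_CACHE"] = "/tmp/.ml_cache/huggingface"

from transformers import pipeline

pipeline("sentiment-analysis", model="ProsusAI/finbert")
print(os.listdir("/tmp/.ml_cache/huggingface"))  # downloaded model files end up here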

We're using a 90-second timeout because loading the model can take some time.

Fifth, we specified configuration for our Lambda's warmer, which is just another Lambda, to keep our core Lambda warm:

custom:
  warmup:
    MLModelWarmer:
      enabled: true
      events:
        - schedule: rate(4 minutes)
      concurrency: ${env:WARMER_CONCURRENCY, 2}
      verbose: false
      timeout: 100
      payload:
        source: KEEP_LAMBDA_WARM

plugins:
  - serverless-plugin-warmup

Here, we used the serverless-plugin-warmup plugin. Take note of the payload:

{ "source": "KEEP_LAMBDA_WARM" }

Add the following package.json file as well to install serverless-plugin-warmup:

{
  "name": "ml-model",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "dependencies": {},
  "devDependencies": {
    "serverless-plugin-warmup": "8.2.1"
  },
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC"
}

Dockerfile

The only thing left before deployment is our Dockerfile:

###########
# BUILDER #
###########
FROM public.ecr.aws/lambda/python:3.10 as builder

RUN pip3 install --upgrade pip

COPY requirements.txt  .
RUN  pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

#########
# FINAL #
#########
FROM public.ecr.aws/lambda/python:3.10
RUN pip3 install --upgrade pip

COPY --from=builder ${LAMBDA_TASK_ROOT} ${LAMBDA_TASK_ROOT}

COPY . ${LAMBDA_TASK_ROOT}

CMD [ "handler.handle" ]

So, we used the AWS-provided Lambda base image for Python. The image is built in two stages to keep the final image size minimal. At the end, we specified which handler function will be called when the Lambda is invoked.
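
If you'd like to smoke test the image locally before deploying, the AWS Lambda base images ship with the Runtime Interface Emulator, which exposes the function over HTTP. A minimal sketch, assuming you've built and started the container with docker build -t ml-lambda . and docker run -p 9000:8080 ml-lambda:

import json
import urllib.request

# The Runtime Interface Emulator listens on this fixed local endpoint
url = "http://localhost:9000/2015-03-31/functions/function/invocations"
payload = json.dumps({"text": "Shares rallied after strong quarterly earnings."}).encode()

request = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))  # e.g. {"sentiment": "positive"}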

Your project directory should now look like this:

├── Dockerfile
├── handler.py
├── package.json
├── requirements.txt
├── sentiment_model.py
└── serverless.yml

Deployment with GitHub Actions

Since we're true professionals, we'll use GitHub Actions to deploy our Lambda function. To do that we first need to add a CI/CD configuration.

Create the following file and folders:

└── .github
    └── workflows
        └── lambda.yml

Then, add the CI/CD configuration to .github/workflows/lambda.yml:

name: ML Lambda Deploy
on:
  push:

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: eu-west-1

jobs:
  deploy-development:
    strategy:
      fail-fast: false
      matrix:
        python-version: ['3.10']
        node-version: [18]
        os: [ubuntu-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
      - name: Install Serverless Framework
        run: npm install -g serverless
      - name: Install NPM dependencies
        run: npm install
      - name: Deploy
        run: sls deploy --stage development --verbose

Inside the newly added job, deploy-development, we:

  1. Prepared the basic environment by installing Python and Node.
  2. Installed Serverless Framework as a global dependency -- so we're able to run sls commands.
  3. Installed dependencies from package.json -- at this point, it's just the serverless-plugin-warmup plugin.
  4. Deployed our application by running the sls deploy --stage development --verbose command.

You can try deploying directly from your computer to AWS by running npm install and then serverless deploy --stage development from the project root. You'll need Serverless Framework, npm, and Node.js installed, and your AWS credentials configured.
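
For example, assuming Node.js, npm, and Serverless Framework are installed and your AWS credentials are set up:

$ npm install
$ serverless deploy --stage development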

AWS Credentials

Before you can commit and push, you'll need to add AWS credentials to GitHub.

If you don't have an AWS account yet, create one from the AWS console.

First, open the IAM console. Then, click "Add user". Enter github for the user name. Click "Next".

Add GitHub user

On the next step, select "Attach policies directly" and then check "AdministratorAccess".

Select IAM permissions

Click "Next" and then "Create user". Once your user is created (you'll see it in the list of users), click on it to open its details. Once you're there, switch to the "Security credentials" tab and click on "Create access key".

IAM user details

After that, select "Application running outside of AWS" and click "Next". On the next screen, click on "Create access key".

Create IAM access key

After you have credentials for your github IAM user, you need to add them to GitHub. To do that, go to your repository and click "Settings" -> "Secrets and variables" -> "Actions". Next, click "New repository secret". Create a secret for AWS_ACCESS_KEY_ID and another one for AWS_SECRET_ACCESS_KEY, setting their values to the credentials you just created.
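
Alternatively, if you have the GitHub CLI installed, you can set the secrets from your terminal; a sketch, with placeholder values you'd replace with your own credentials:

$ gh secret set AWS_ACCESS_KEY_ID --body "<your-access-key-id>"
$ gh secret set AWS_SECRET_ACCESS_KEY --body "<your-secret-access-key>"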

Commit and push your code.

Check the pipeline's output to make sure it was successful. Expand the Deploy step inside the deploy-development job's output to see the details.

It can take several minutes for the deployment to succeed.

End-to-End Test

Finally, let's add an end-to-end test for our Lambda. This way we'll make sure that our Lambda is working as expected. Also, it'll provide an example of its usage.

First, add Boto3 and pytest to requirements.txt:

boto3==1.28.25
pytest==7.4.0
torch==2.0.1
transformers==4.31.0

Second, create a new module called tests.py:

import json

import boto3


def test_sentiment_is_predicted():
    client = boto3.client('lambda')
    response = client.invoke(
        FunctionName='ml-model-development-ml_model',
        InvocationType='RequestResponse',
        Payload=json.dumps({
            "text": "I am so happy! I love this tutorial! You really did a great job!"
        })
    )

    assert json.loads(response['Payload'].read().decode('utf-8'))["sentiment"] == "positive"

Here, we used Boto3 to invoke our Lambda function, passing the text to predict the sentiment for (a very positive one) as the payload. Then, we asserted that the returned sentiment is positive.

Feel free to run your tests locally via python -m pytest tests.py. Just make sure to install the new dependencies first. You'll also need your AWS credentials set up.
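
For example, assuming your AWS credentials are configured (Boto3 also needs a region, which you can supply via AWS_DEFAULT_REGION if it isn't already set in your AWS config):

(venv)$ pip install -r requirements.txt
(venv)$ AWS_DEFAULT_REGION=eu-west-1 python -m pytest tests.py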

Add the end-to-end test job to .github/workflows/lambda.yml:

name: ML Lambda Deploy
on:
  push:

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: eu-west-1
  APP_ENVIRONMENT: development  # new

jobs:
  deploy-development:
    strategy:
      fail-fast: false
      matrix:
        python-version: ['3.10']
        node-version: [18]
        os: [ubuntu-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
      - name: Install Serverless Framework
        run: npm install -g serverless
      - name: Install NPM dependencies
        run: npm install
      - name: Deploy
        run: sls deploy --stage development --verbose
  e2e:  # new
    needs: [deploy-development]
    strategy:
      fail-fast: false
      matrix:
        python-version: [ '3.10' ]
        os: [ ubuntu-latest ]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Dependencies
        run: pip install -r requirements.txt
      - name: Run pytest
        run: pytest tests.py

That's it! Commit and push your code. You should see the pipeline running. It should succeed.

Congrats! You now have a fully automated CI/CD pipeline that deploys your Lambda function serving an ML model.

You can destroy AWS resources by replacing sls deploy --stage development --verbose with sls remove --stage development --verbose inside the deploy-development job.

Conclusion

In this tutorial, you learned how to deploy an ML model to AWS Lambda and how to consume it with the Boto3 library. You also learned how to create a CI/CD pipeline with GitHub Actions that automatically deploys your Lambda function, using Serverless Framework, and runs end-to-end tests against it.

Jan Giacomelli

Jan is a software engineer who lives in Ljubljana, Slovenia, Europe. He is a Staff Software Engineer at ren.co where he is leading backend engineering efforts. He loves Python, FastAPI, and Test-Driven Development. When he's not writing code, deploying to AWS, or speaking at a conference, he's probably skiing, windsurfing, or playing guitar. Currently, he's working on his new course Complete Python Testing Guide.

Featured Course

Serverless Apps with FastAPI, DynamoDB, and Vue

Build serverless applications with FastAPI and Vue that run on AWS.
