Building Serverless Machine Learning APIs with AWS Chalice


In my Previous Post, I discussed AWS Amplify, which allows us to develop and deploy Serverless Applications using AWS Resources without actually configuring them. On further exploration, I came across a number of services that allow for the development and deployment of Services at a high level of abstraction, pretty quickly and without needing to configure a lot of details.

One of the most popular AWS services that allows us to get started with Serverless Computing is AWS Lambda, which simply allows us to leverage Function as a Service (FaaS). As Developers, the advantages that come with this are huge. We DO have Servers up and running, but we don't need to provision and maintain them. Our duty is solely to provide the Function (Code) to the Servers, with the added advantage of paying only for the compute time used.

Under the hood, AWS Lambda makes use of a whole lot of services to make this happen, and with flexible scaling and high availability, it can be the go-to service for Serverless Deployment. AWS Chalice is a Micro-Web Framework (similar to Flask) that makes development and deployment of API-based Microservices pretty easy compared to anything else. Apart from writing the Code, the scaling, packaging and deployment are all done for us almost instantaneously.

In this article, we will take a look at how we can develop and deploy an API powered by Machine Learning Models using AWS Chalice, while exploring the various advantages (and disadvantages) of the framework and how easy it is to work with.


Let's walk through everything step by step while developing and deploying a simple Machine Learning Microservice using AWS Chalice.

What is AWS Chalice?

Have you ever made use of a Micro-Web Framework like Flask to develop and deploy APIs? Chalice is a similar Web Framework, with many features and syntax quite similar to those of Flask, except that it runs completely on AWS Resources and offers single-command deployment. The best part about working with Chalice is that we get all the functionality we can expect from an API Development Framework, and it integrates with most of the AWS Tooling like S3 Storage, Simple Queue Service, API Gateway and more.

Besides that, Chalice comes with a handy Command Line Interface (CLI) tool, installable via pip, that allows us to set up a Chalice Project automatically, test it locally and deploy it as well. With Chalice, we can focus only on writing the Code that matters, while leaving the deployment and management part for AWS Resources to take care of.

To set up a Chalice Project, we need to configure the AWS CLI, which will provide us with the Interface to communicate with the AWS Resources we are trying to work on. Once we install the AWS CLI, we can go over to the next step and configure our AWS Credentials.

$ aws configure
AWS Access Key ID [****************XIM2]:
AWS Secret Access Key [****************qsQR]:
Default region name [us-east-1]:
Default output format [None]:

We are now all set up to create our first Chalice Project and deploy it just via the Command Line Interface.


Socrates accepting a Poisoned Chalice with Hemlock

Setting up a Chalice Project

Now that we have configured the AWS CLI, let's move ahead and create a Chalice Project. Before we do that, let us first create a Project Directory and set up a Virtual Environment for it:

$ mkdir machine-learning-app
$ cd machine-learning-app

Now that we have created a directory, let's move ahead and initialize a Virtual Environment for the same:

$ virtualenv env
$ source env/bin/activate
(env) $

With a Virtual Environment in place, we can isolate Third-Party Packages from the main installation and prevent conflicting versions of Packages. This in turn makes Package Management easier. Let's move on and install Chalice via pip into our Virtual Environment:

(env) $ pip3 install chalice

Once the package has been installed, we are now all set to use the Chalice CLI to set up our New Project with ease. Let's push in this Command to get our Chalice App up and running:

(env) $ chalice new-project iris-app

This has now created a separate directory named iris-app inside our Project Directory. Let's take a look at the Project Structure:

    iris-app/
    |   .gitignore
    |   app.py
    |   requirements.txt
    |
    \---.chalice
            config.json

We have a .chalice directory which contains our Configuration, along with the app.py and requirements.txt files. If we open the requirements.txt file, we will not find anything there, because Chalice is not a part of the runtime environment. We will add our packages here as we install them, for ease of use.

Let's take a look at our app.py file:

from chalice import Chalice

app = Chalice(app_name='iris-app')

@app.route('/')
def index():
    return {'hello': 'world'}

It has only one Route, which maps the root URL of the Application to the index() function. The decorators here primarily "wrap" functions, which makes it easy to write Code Logic by breaking it down into separate routes. For now, our Application is serving only a JSON Message, which is {'hello': 'world'}.
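To get a feel for what a route decorator is doing, here is a minimal sketch (plain Python, not Chalice itself) of how such a decorator can register handler functions against URL paths:

```python
# A toy router: a dictionary mapping URL paths to handler functions.
routes = {}

def route(path):
    def wrapper(func):
        routes[path] = func   # record the handler under its URL path
        return func
    return wrapper

@route('/')
def index():
    return {'hello': 'world'}

# dispatching a request for '/' simply calls index()
print(routes['/']())  # -> {'hello': 'world'}
```

Chalice (like Flask) does something conceptually similar under the hood: the decorator records the mapping, and the framework dispatches incoming requests to the matching handler.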

Deploying a Hello World API

Now that we already have a Hello, World API pre-configured in our Chalice Project, let's move ahead and run it locally. We will do that by pushing in a very simple Command:

(env) $ chalice local
Serving on http://127.0.0.1:8000
Restarting local dev server.

This will automatically start the Local Development Server on Port 8000, with the decorated routes handling the dispatching for us. We can hit the Terminal and execute a curl request against it:

$ curl -X GET http://localhost:8000
{"hello": "world"}

So we have got our own Chalice Development Server up and running on the default Local Port 8000. Let us now deploy our Hello, World API to AWS Lambda by using the deploy command. This Command will simply invoke the creation of a Lambda Function that will be accessible via a REST API Endpoint.

Before we follow this step, make sure our Credentials are stored at ~/.aws/config for a successful execution. If you don't have the Credentials to deploy your Service, take a look at setting up the AWS CLI and configuring it so that your Credentials are in place.

Let us now push in the Command:

(env) $ chalice deploy
Creating deployment package.
Creating IAM role: iris-app-dev
Creating lambda function: iris-app-dev
Creating Rest API
Resources deployed:
  - Lambda ARN: arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:iris-app-dev
  - Rest API URL:

Yay! We now have our Chalice Application deployed to AWS Lambda, with an Amazon Resource Name (ARN) along with a REST API URL that can be invoked to see how our Application is working. Let us pass a curl request via the Terminal to the REST API URL and try it out:

$ curl -X GET
{"hello": "world"}

Yeah! Our Application is deployed now and works perfectly fine! We may experiment with this now, add more Routes to the Application or make use of some external APIs (Github, Twilio) to see how we can perform a variety of operations on Chalice and re-deploy them.

In the next section, we will cover creating a simple Regression Model and deploying it using Chalice.

Building a Regression Model

For building a Machine Learning API with Chalice, we will take the classic Dataset around Iris Flowers, build a simple Machine Learning Model and deploy it. The Dataset contains tabular data about various characteristics of the flower examples, like petal width and length, which can be used as input to a model. The classic task on this Dataset is to classify a flower as Iris Setosa, Iris Versicolour or Iris Virginica, but we will tackle an even simpler Regression task here.

Let us take a deeper look into the Dataset that we have. We have 150 Records covering three species of Flowers: Iris Setosa (Labelled as 0), Iris Versicolour (Labelled as 1) and Iris Virginica (Labelled as 2). The following attributes are present in the Dataset:

  • Petal width
  • Petal length
  • Sepal width
  • Sepal length


The primary purpose of our Machine Learning Model would be to predict the Petal Width of the flower, given the Petal Length. For this purpose, we will make use of the Linear Regression Algorithm.

Linear Regression in Machine Learning is one of the most basic yet powerful Algorithms, used to predict the outcome of a dependent variable from an independent variable by building a relationship between them. Linear Regression is commonly used to predict House Sale Prices or the outcome of events that follow a roughly linear relationship.
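As a quick illustration of the idea (not part of our Chalice app, and using hypothetical toy values), here is how the least-squares slope and intercept can be recovered with plain Numpy on data that follows y = 2x + 1 exactly:

```python
import numpy as np

# toy data that follows y = 2x + 1 exactly (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# least-squares estimates for a single feature:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)  # -> 2.0 1.0
```

LinearRegression in scikit-learn computes these same quantities for us (generalized to many features), which is why fitting it takes a single line.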

In this example, we will make use of the Petal Length to predict the possible Petal Width of the Flower. Let's get our hands dirty with the Dataset.

Let us first install the necessary Libraries and Modules:

(env) $ pip3 install scikit-learn
(env) $ pip3 install numpy

Let us now import the necessary packages for our purpose:

import pickle
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

We will be making use of the Scikit-Learn Machine Learning Library, which provides a high-level abstraction over a variety of Algorithms. The Iris Dataset is also consumed from the Datasets Sub-Module provided by Scikit-Learn. Let us now load the Data and create the Independent Variable (X) and Dependent Variable (y):

iris = load_iris()
data = iris["data"]
X = data[:,2].reshape((-1,1))
y = data[:,3]

We have used Array Slicing and the reshape function provided by Numpy to get our Dataset into the right shape before we use Linear Regression. We now have an Independent Variable (X) that consists of the Petal Lengths, while the Dependent Variable (y) consists of the Petal Widths.
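To see what this slicing and reshaping does, here is a small sketch using the first two Iris records as sample values:

```python
import numpy as np

# the first two Iris records, used here as sample values
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [4.9, 3.0, 1.4, 0.2]])

X = data[:, 2].reshape((-1, 1))  # petal lengths as an (n, 1) column vector
y = data[:, 3]                   # petal widths as a 1-D array

print(X.shape, y.shape)  # -> (2, 1) (2,)
```

The (-1, 1) reshape tells Numpy to infer the row count, giving one column per feature, which is the input shape scikit-learn estimators expect.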

Let us now use the LinearRegression class to fit a Model:

model = LinearRegression().fit(X,y)
score = model.score(X,y)

We now have a Model up and ready to be pickled and deployed for our purpose. If we log the Score on the Console, it is nearly 0.9271098389904927, which is good enough for experimental purposes, given the size of the Data that we have.
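The score() method of a regression model returns the R² value. As a sanity check, here is a sketch that repeats the fit from above and recomputes R² by hand, comparing it against model.score():

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

iris = load_iris()
X = iris["data"][:, 2].reshape((-1, 1))  # petal lengths
y = iris["data"][:, 3]                   # petal widths
model = LinearRegression().fit(X, y)

# R^2 = 1 - (residual sum of squares / total sum of squares)
residuals = y - model.predict(X)
r2 = 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()

assert abs(r2 - model.score(X, y)) < 1e-9  # matches score()
print(round(r2, 4))
```

An R² close to 1 means the linear fit explains most of the variance in Petal Width, which matches the score we logged above.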

Let us now pickle our Model and get ready for Deployment:

with open("model.pkl","wb") as f:
    pickle.dump(model, f)

This will generate a Pickled version of our Linear Regression Model, now trained on the Dataset that we desired with an appreciable amount of accuracy. Pickling basically allows us to serialize the object and save it for future use, so that we don't have to repeat the operations all over again: we can just load the Model, deserialize it and make use of it. We can see a model.pkl generated in our Directory, which we will use in our Chalice Application next.
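The round-trip is easy to verify with the standard library alone; here is a sketch using a small dictionary as a hypothetical stand-in for the trained model object:

```python
import pickle

# hypothetical stand-in for our trained model object
model_like = {"coef": 0.4158, "intercept": -0.3631}

with open("model.pkl", "wb") as f:
    pickle.dump(model_like, f)      # serialize to disk

with open("model.pkl", "rb") as f:
    restored = pickle.load(f)       # deserialize later; no retraining needed

assert restored == model_like       # the object survives the round-trip intact
```

The same dump/load pair works for a fitted scikit-learn estimator, which is exactly what our Chalice app will load at startup.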

In the next section, we will finally discuss how we can make a RESTful Endpoint in our Chalice Application and get ready to deploy it.


Testing and Deploying our Application

And here we finally reach the last Section of the Article, where we will integrate our Model with the Chalice Application and test it before we re-deploy it on AWS. Before we actually write some Code and create endpoints for getting the Input Parameters and retrieving the Output, let us first add some new imports to our app.py file:

from chalice import Chalice
import json
import pickle
import pathlib
import numpy as np

Let us now move on and create some Functions to load the Model and format the Input. This lets us modularize the code and ensure proper Input is passed after reshaping the Numpy Array:

def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)

def format_input(model_input):
    assert model_input, "Empty model input"
    data = np.array(model_input)
    assert len(data.shape) in (1,2), "Data should be either 1D or 2D"
    return data.reshape((-1,1))

In the load_model() function, we basically load the Model for our purpose so that we can make use of it to generate predictions. In the format_input() function, we take the Input Data and reshape it using Numpy's reshape() function, so that the Data passed to the Model is a single-feature column vector, the shape scikit-learn expects.

Let us now write functions to predict the outcome, given the Model and the Data, and to generate a response accordingly:

def predict(model,data):
    return model.predict(data).tolist()

def fmt_response(rtype,k,v):
    return {"type": rtype, k: v}

In the predict() function, we use the Model that we loaded earlier to predict the outcome for the Data that we passed, and convert that into a List before returning it. In the fmt_response() function, we pass the type of response (Success/Failure) and the Prediction that we have generated.

Let us now add a couple more functions to generate a "Successful" Message and an "Error" Message accordingly:

def fmt_prediction(prediction):
    return fmt_response("success","prediction",prediction)

def fmt_error(err):
    return fmt_response("error","message",err)

We add a fmt_prediction() function to pass a Success Message along with the Prediction that we have generated. In the fmt_error() function, we pass an Error Message along with the particular Error that we have encountered.

Let us now load the Model using the load_model() function that we developed earlier:

model_path = pathlib.Path(__file__).parent / "model.pkl"
model = load_model(model_path)

Now that we have everything set up, let's create another Route for the Application to accept a POST Request and return the Prediction generated by our Model:

@app.route("/iris", methods=['POST'])
def iris_predict():
    try:
        request_body = app.current_request.json_body
        data = request_body['data']
        data = json.loads(data)
        data = format_input(data)
        prediction = predict(model,data)
    except Exception:
        return fmt_error("Error making prediction")

    return fmt_prediction(prediction)

Here we make use of app.current_request.json_body to retrieve the JSON Input passed by the User in the Request. We take the Data and pass it to json.loads() to convert the String into a List, then format it to get a Numpy Array. If we don't encounter any issues, we pass it to the predict() function and return the desired output as the prediction. Otherwise, an error message is returned, and we would need to scratch our heads to find the bug.
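Note that the data field arrives double-encoded: the request body is JSON, and its data value is itself a JSON list inside a string. A quick sketch of the two-step decode, using the same sample payload we will send with curl:

```python
import json

# the request body as a raw string: a JSON object whose "data"
# field is itself a JSON-encoded list inside a string
raw = '{"data": "[2, 1, 3, 5]"}'

body = json.loads(raw)           # -> {'data': '[2, 1, 3, 5]'}
data = json.loads(body['data'])  # -> [2, 1, 3, 5]

print(data)
```

In the route itself, Chalice's json_body performs the first decode for us, and our call to json.loads() performs the second.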

Let us now save our File and take a look at how our Application works locally. Push in the Command and make a curl request thereafter:

(env) $ chalice local

Let's make a curl request with a sample Data:

$ curl -H "Content-Type: application/json" -X POST -d '{"data": "[2, 1, 3, 5]"}' http://localhost:8000/iris

We will get the following output:

{
  "type": "success",
  "prediction": [
    ...
  ]
}

Here the list comprises the predicted Petal Widths, generated from the Petal Lengths that we passed in. Our Machine Learning Application seems to work perfectly, so let's go ahead and deploy our Application again.

Let us now add a requirements.txt for our Project. Push in this Command to auto-generate a requirements file for your Project:

$ pip3 freeze > requirements.txt

Make sure to remove any dependencies that are related to Chalice in your Deployment. You can open the requirements.txt file and remove AWS-related dependencies to prevent any mishap in your Deployment Package.

Push in the Command, and our Deployment will be updated automatically:

$ chalice deploy
Creating deployment package.
Updating policy for IAM role: iris-app-dev
Updating lambda function: iris-app-dev
Updating rest API
Resources deployed:
  - Lambda ARN: arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:iris-app-dev
  - Rest API URL:

Let us pass a curl request via Terminal to the REST API URL and try it out:

$ curl -H "Content-Type: application/json" -X POST -d '{"data": "[2, 1, 3, 5]"}'

We will get the following output:

{
  "type": "success",
  "prediction": [
    ...
  ]
}

Our Machine Learning Application seems to work perfectly fine now on Deployment. We now have our first RESTful Service up and running on Chalice!

However, Chalice has one disadvantage: it is not the best-optimized Framework for deploying heavy Machine Learning and Deep Learning Applications that make use of frameworks like Tensorflow and PyTorch. This is because AWS Lambda lays down a 50 MB limit on the Deployment Package; our simple Application stays under that limit only by a small margin, and heavier frameworks push well past it.

To deploy Large-Scale Applications via Chalice, you can look at making use of the chalice package workflow, which generates a SAM template that can then be deployed through CloudFormation. The aws cloudformation package command will upload the archive to S3, from where the Application can be deployed via CloudFormation.
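A sketch of that workflow might look like the following, where the S3 bucket name and stack name are hypothetical placeholders:

```shell
# generate a SAM template plus deployment package into out/
$ chalice package out/

# upload the package to S3 and rewrite the template to reference it
$ aws cloudformation package --template-file out/sam.json \
    --s3-bucket my-deployment-bucket --output-template-file packaged.yaml

# deploy the packaged template as a CloudFormation stack
$ aws cloudformation deploy --template-file packaged.yaml \
    --stack-name iris-app --capabilities CAPABILITY_IAM
```

This route trades chalice deploy's one-step convenience for full control over the CloudFormation stack, which also makes it easier to slot into CI/CD pipelines.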


In this article, we covered the basics of the AWS Chalice Micro Web-Framework, how we can integrate it with our Machine Learning Model, and how to deploy it using the Chalice CLI on AWS Lambda to get Serverless Code up and running, accessible via a RESTful endpoint. We have also explored how Chalice makes it quite easy to develop API Services and deploy them with a single Command.

In future Articles, I will be covering how you can leverage Chalice to create a Full-Stack Application Service, including setting up a Web Application with Cognito, testing it via PyTest, and setting up a CI/CD Pipeline using CodePipeline and CodeBuild.