Multi-Model Endpoint - CatBoost

This example notebook shows how to use a custom container to host multiple CatBoost models on a SageMaker Multi-Model Endpoint. The model this notebook deploys is taken from this CatBoost tutorial.

We use CatBoost as an example to demonstrate deploying and serving models with a Multi-Model Endpoint. The same approach can be extended to other frameworks.

CatBoost is gaining in popularity but is not yet supported as a framework for SageMaker Multi-Model Endpoints. This example therefore also demonstrates how to bring your own container to a Multi-Model Endpoint.

In this notebook we use identical copies of one model to simulate multiple models for loading and inference.

Prerequisites

Packages and Permissions

The SageMaker SDK uses the SageMaker default S3 bucket when needed. If get_execution_role does not return a role with the appropriate permissions, you’ll need to specify an IAM role ARN that does. Please make sure the AmazonSageMakerFullAccess policy is attached to the execution role you are using.

Load model and test local inference

Here we install catboost to check that we can load the model locally and run inference.

We load the model locally using CatBoostClassifier(). test_data.csv contains a single row of test inference data.

[ ]:

!pip install catboost

[ ]:

from catboost import CatBoostClassifier
import pandas

# load_model returns the classifier itself, so the result can be reassigned
model_file = CatBoostClassifier()
model_file = model_file.load_model("catboost_model.bin")

# A single row of features, reused later as the inference payload
df = pandas.read_csv("test_data.csv")

[ ]:

import pandas as pd
import io
import json

out = model_file.predict_proba(df)
print(out)

Upload the tarball to S3

Create a model tarball

SageMaker requires our model to be packaged in a tar.gz file.

[ ]:

! tar -czvf catboost-model.tar.gz catboost_model.bin
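The same archive can also be built from Python with the standard-library tarfile module, which can be handy when the packaging step has to run outside a shell. This is a minimal sketch; the helper names `make_model_tarball` and `list_tarball` are hypothetical and not part of this notebook's container code.

```python
import os
import tarfile


def make_model_tarball(model_path, archive_path):
    """Write model_path into a gzip-compressed tar archive at archive_path.

    The file is stored at the archive root (no directory prefix), mirroring
    `tar -czvf catboost-model.tar.gz catboost_model.bin`.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(model_path, arcname=os.path.basename(model_path))
    return archive_path


def list_tarball(archive_path):
    """Return the member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Calling `make_model_tarball("catboost_model.bin", "catboost-model.tar.gz")` should produce an archive equivalent to the one created by the tar command, and `list_tarball` gives a quick way to confirm what ended up inside it.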

Upload 100 copies of the model to S3

Multi-Model Endpoints require all of our models to be stored under a common S3 prefix. Here we upload 100 copies to our default bucket.

[ ]:

import sagemaker

sess = sagemaker.Session()
s3_bucket = sess.default_bucket()  # Replace with your own bucket name if needed
print(s3_bucket)

Upload the model tarballs using boto3, giving each copy a unique name

[ ]:

import boto3

s3 = boto3.client("s3")
for i in range(0, 100):
    with open("catboost-model.tar.gz", "rb") as f:
        s3.upload_fileobj(f, s3_bucket, "catboost/catboost-model-{}.tar.gz".format(i))

List all models under the S3 prefix we will use for our Multi-Model Endpoint

[ ]:

!aws s3 ls s3://$s3_bucket/catboost/
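If you would rather check the upload from Python, the listing can be sketched with the boto3 `list_objects_v2` paginator. A single `list_objects_v2` call returns at most 1000 keys, so paginating keeps the check correct if you later upload more models. The helper name `list_model_keys` is hypothetical.

```python
def list_model_keys(s3_client, bucket, prefix):
    """Return every object key under prefix, following pagination."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Usage in this notebook (requires AWS credentials):
# keys = list_model_keys(s3, s3_bucket, "catboost/")
# print(len(keys))
```

Assuming nothing else is stored under the `catboost/` prefix, the count printed should match the number of copies uploaded above.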

Building the custom container

The container folder in this example contains 3 files:

├── container
│   ├── dockerd-entrypoint.py
│   ├── Dockerfile
│   └── model_handler.py

dockerd-entrypoint.py is the entry point script that will start the multi model server.
Dockerfile contains the container definition that will be used to assemble the image. This includes the packages that need to be installed.
model_handler.py is the script that will contain the logic to load up the model and make inference.

Take a look through the files to see whether you would like to customize anything. The cells below highlight the main parts of each file.

Install catboost in the `Dockerfile`

[ ]:

! sed -n '26,30p' container/Dockerfile

Update `initialize` function in `model_handler.py` with logic to load up the model

In this case we are using CatBoostClassifier(). Feel free to adapt the loading logic in this function to suit your needs.

[ ]:

! sed -n '22,40p' container/model_handler.py

Update `handle` function in `model_handler.py` with logic to run inference

[ ]:

! sed -n '70,85p' container/model_handler.py
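To make the split between `initialize` and `handle` concrete, here is a rough, self-contained sketch of how such a handler could be organized. It is not the contents of `container/model_handler.py`. It assumes that the server passes a context object whose `system_properties` include a `model_dir`, and that each request in the batch is a dict with the CSV payload under `body`. The `load_model` callable is a hypothetical stand-in for the CatBoost loading code.

```python
import csv
import io
import os


class ModelHandler:
    """Illustrative handler: initialize() loads once, handle() predicts."""

    def __init__(self, load_model):
        # load_model is a hypothetical callable: path -> object with predict_proba().
        self._load_model = load_model
        self.model = None
        self.initialized = False

    def initialize(self, context):
        # Assumption: the server exposes the extracted model directory here.
        model_dir = context.system_properties.get("model_dir")
        self.model = self._load_model(os.path.join(model_dir, "catboost_model.bin"))
        self.initialized = True

    def handle(self, data, context):
        # Load lazily on the first request, then reuse the in-memory model.
        if not self.initialized:
            self.initialize(context)
        results = []
        for request in data:
            # Assumption: the CSV payload (with a header row) is under "body".
            body = request.get("body")
            if isinstance(body, (bytes, bytearray)):
                body = body.decode("utf-8")
            # Naive parsing into a list of dicts; with pandas available,
            # pandas.read_csv(io.StringIO(body)) would build a DataFrame instead.
            rows = list(csv.DictReader(io.StringIO(body)))
            results.append(self.model.predict_proba(rows))
        # One result per request in the batch.
        return results
```

The point of the design is that the expensive load happens once per model, while `handle` only parses the payload and predicts. A real handler would also need to turn the prediction into something JSON-serializable, since the client code in this notebook parses the response with `json.loads`.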

Build and Push the custom image to ECR

[ ]:

%%sh

# The name of our algorithm
algorithm_name=catboost-sagemaker-multimodel

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

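# Log Docker in to ECR. `aws ecr get-login` is not available in AWS CLI v2;
# `get-login-password` works with AWS CLI v2 and recent v1 releases.
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com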

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Deploy Multi Model Endpoint

[ ]:

from sagemaker import get_execution_role

sm_client = boto3.client(service_name="sagemaker")
runtime_sm_client = boto3.client(service_name="sagemaker-runtime")

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.Session().region_name

role = get_execution_role()

Create the SageMaker model (MultiModel mode)

[ ]:

from time import gmtime, strftime

model_name = "catboost-multimodel-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = "s3://{}/catboost/".format(s3_bucket)  # S3 prefix that holds all model tarballs
container = "{}.dkr.ecr.{}.amazonaws.com/catboost-sagemaker-multimodel:latest".format(
    account_id, region
)
instance_type = "ml.m5.xlarge"

print("Model name: " + model_name)
print("Model data Url: " + model_url)
print("Container image: " + container)

container = {"Image": container, "ModelDataUrl": model_url, "Mode": "MultiModel"}

create_model_response = sm_client.create_model(
    ModelName=model_name, ExecutionRoleArn=role, Containers=[container]
)

print("Model ARN: " + create_model_response["ModelArn"])

Create the SageMaker Endpoint Configuration

[ ]:

endpoint_config_name = "catboost-multimodel-config-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Endpoint config name: " + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

print("Endpoint config ARN: " + create_endpoint_config_response["EndpointConfigArn"])

Create the SageMaker Multi-Model Endpoint

[ ]:

%%time

import time

endpoint_name = "catboost-multimodel-endpoint-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Endpoint name: " + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Endpoint Status: " + status)

print("Waiting for {} endpoint to be in service...".format(endpoint_name))
waiter = sm_client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)
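The waiter blocks silently until the endpoint is ready. If you want to see the status or surface the failure reason yourself, the wait can be sketched as a manual polling loop around `describe_endpoint`. The function name `wait_for_endpoint` and its defaults are assumptions chosen for illustration.

```python
import time


def wait_for_endpoint(describe_endpoint, endpoint_name, delay=30, max_attempts=60,
                      sleep=time.sleep):
    """Poll until the endpoint is InService; raise if it fails or times out.

    describe_endpoint is any callable shaped like sm_client.describe_endpoint
    (keyword argument EndpointName, returns a dict with "EndpointStatus").
    """
    for _ in range(max_attempts):
        resp = describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return resp
        if status == "Failed":
            raise RuntimeError(
                "Endpoint {} failed: {}".format(
                    endpoint_name, resp.get("FailureReason", "unknown")
                )
            )
        sleep(delay)
    raise TimeoutError(
        "Endpoint {} not in service after {} checks".format(endpoint_name, max_attempts)
    )


# Usage in this notebook (requires AWS credentials):
# wait_for_endpoint(sm_client.describe_endpoint, endpoint_name)
```

Passing the describe call in as a parameter keeps the loop easy to try out with a stub before pointing it at a real endpoint.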

Invoke each of the 100 models

The models are identical here; they simulate multiple models that belong to the same framework.

[ ]:

for i in range(0, 100):
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel="catboost-model-{}.tar.gz".format(i),
        Body=df.to_csv(index=False),
    )
    print(json.loads(response["Body"].read().decode("utf-8")))
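Note how `TargetModel` relates to the model definition: it is the path of the artifact relative to the S3 prefix given as `ModelDataUrl`, not a full S3 URI. The small helper below sketches that mapping; the name `target_model_for` is hypothetical.

```python
def target_model_for(model_data_url, artifact_s3_uri):
    """Return the TargetModel value for an artifact stored under model_data_url."""
    # Normalize the prefix so that "s3://bucket/catboost" and
    # "s3://bucket/catboost/" behave the same way.
    prefix = model_data_url if model_data_url.endswith("/") else model_data_url + "/"
    if not artifact_s3_uri.startswith(prefix):
        raise ValueError(
            "{} is not located under {}".format(artifact_s3_uri, prefix)
        )
    return artifact_s3_uri[len(prefix):]


# Example with a placeholder bucket name:
# target_model_for("s3://my-bucket/catboost/",
#                  "s3://my-bucket/catboost/catboost-model-7.tar.gz")
```

For the objects uploaded earlier under `catboost/`, this yields names such as `catboost-model-7.tar.gz`, which is the form used in the loop above.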

Invoke just one of the models 1000 times

Because this model was already loaded into memory by the previous cell, these invocations should not incur any additional model-loading latency.

[ ]:

import numpy as np

results = []
for i in range(0, 1000):
    start = time.time()
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel="catboost-model-1.tar.gz",
        Body=df.to_csv(index=False),
    )
    results.append((time.time() - start) * 1000)
print("\nPredictions for model latency: \n")
print("\nP95: " + str(np.percentile(results, 95)) + " ms\n")
print("P90: " + str(np.percentile(results, 90)) + " ms\n")
print("Average: " + str(np.average(results)) + " ms\n")
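The measurement above can be packaged into two small reusable helpers. This sketch uses only the standard library and makes two choices that are assumptions rather than part of the notebook: `time.perf_counter` for interval timing, and `statistics.quantiles` with the inclusive method, which interpolates linearly between ranks. The helper names are hypothetical.

```python
import statistics
import time


def time_calls(fn, n):
    """Call fn() n times and return each call's wall-clock latency in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def summarize(latencies_ms):
    """Return the average, P90 and P95 of the latencies, in ms."""
    # quantiles(..., n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "average": statistics.fmean(latencies_ms),
        "p90": cuts[89],
        "p95": cuts[94],
    }


# Usage in this notebook (requires a deployed endpoint):
# invoke = lambda: runtime_sm_client.invoke_endpoint(
#     EndpointName=endpoint_name,
#     TargetModel="catboost-model-1.tar.gz",
#     Body=df.to_csv(index=False),
# )
# print(summarize(time_calls(invoke, 1000)))
```

Keep in mind that these numbers are measured from the client, so they include the network round trip from the notebook to the endpoint as well as the model's own prediction time.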

Optional cleanup

Clean up and delete the endpoint

[ ]:

# Delete the endpoint

sm_client.delete_endpoint(EndpointName=endpoint_name)
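Deleting the endpoint stops the hosting instance, but the endpoint configuration and the model definition created earlier remain in the account. A fuller cleanup could be sketched as follows; the helper name `delete_endpoint_resources` is hypothetical, while the three client calls are standard boto3 SageMaker operations.

```python
def delete_endpoint_resources(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, then its endpoint configuration and model definition."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)


# Usage in this notebook, instead of calling delete_endpoint on its own:
# delete_endpoint_resources(sm_client, endpoint_name, endpoint_config_name, model_name)
```

The model tarballs in S3 and the image in ECR are not touched by these calls, so remove them separately if you no longer need them.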