[14]:
# install dependencies
!pip install opencv-python
!pip install tensorflow-hub
!apt-get update
!apt-get install -y ffmpeg libsm6 libxext6

Using the SageMaker TensorFlow Serving Container

The SageMaker TensorFlow Serving Container makes it easy to deploy trained TensorFlow models to a SageMaker Endpoint without the need for any custom model loading or inference code.

In this example, we will show how to deploy one or more pre-trained models from TensorFlow Hub to a SageMaker Endpoint using the SageMaker Python SDK, and then use the model(s) to perform inference requests.

Next, we’ll get the IAM execution role from the notebook environment so that SageMaker can access resources in your AWS account later in the example.

[1]:
from sagemaker import get_execution_role

sagemaker_role = get_execution_role()

Download and prepare a model from TensorFlow Hub

The TensorFlow Serving Container works with any model stored in TensorFlow’s SavedModel format. This could be the output of your own training job or a model trained elsewhere. For this example, we will use a pre-trained version of the MobileNet V2 image classification model from TensorFlow Hub.

The TensorFlow Hub models are pre-trained, but do not include a serving signature_def, so we’ll need to load the model into a TensorFlow session, define the input and output layers, and export it as a SavedModel. There is a helper function in this notebook’s sample_utils.py module that will do that for us.

[2]:
import sample_utils

model_name = "mobilenet_v2_140_224"
export_path = "mobilenet"
model_path = sample_utils.tfhub_to_savedmodel(model_name, export_path)

print("SavedModel exported to {}".format(model_path))

After exporting the model, we can inspect it using TensorFlow’s saved_model_cli command. In the command output, you should see:

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
...

The command output should also show details of the model inputs and outputs.

[3]:
!saved_model_cli show --all --dir {model_path}

Optional: add a second model

The TensorFlow Serving container can host multiple models if they are packaged in the same model archive file. Let’s prepare a second version of the MobileNet model to demonstrate this. The mobilenet_v2_035_224 model is a smaller version of MobileNet V2 (depth multiplier 0.35 instead of 1.40) that trades accuracy for smaller model size and faster computation, but has the same inputs and outputs.

[4]:
second_model_name = "mobilenet_v2_035_224"
second_model_path = sample_utils.tfhub_to_savedmodel(second_model_name, export_path)

print("SavedModel exported to {}".format(second_model_path))

Next, we need to create a model archive file containing the exported model(s).

Create a model archive file

SageMaker models need to be packaged in .tar.gz files. When your endpoint is provisioned, the files in the archive will be extracted and put in /opt/ml/model/ on the endpoint.

[5]:
!tar -C "$PWD" -czf mobilenet.tar.gz mobilenet/
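
If you prefer to stay in Python, the same archive can be built and checked with the standard library. This is a minimal sketch, not part of the original notebook: the helper names make_model_archive and list_saved_models are hypothetical, and the check simply looks for saved_model.pb files inside the archive.

```python
import posixpath
import tarfile
from pathlib import Path


def make_model_archive(source_dir, archive_path):
    """Pack source_dir into a gzipped tar, keeping the top-level folder name."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps member paths relative, like `tar -C "$PWD" ... mobilenet/`
        tar.add(str(source), arcname=source.name)
    return archive_path


def list_saved_models(archive_path):
    """Return the archive directories that contain a saved_model.pb file."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
    return sorted(
        posixpath.dirname(name)
        for name in names
        if posixpath.basename(name) == "saved_model.pb"
    )
```

In this notebook, make_model_archive("mobilenet", "mobilenet.tar.gz") would mirror the tar command above, and list_saved_models("mobilenet.tar.gz") should report one directory per exported model.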

Upload the model archive file to S3

We now have a suitable model archive ready in our notebook. We need to upload it to S3 before we can create a SageMaker Model that uses it. We’ll use the SageMaker Python SDK to handle the upload.

[6]:
from sagemaker.session import Session

model_data = Session().upload_data(path="mobilenet.tar.gz", key_prefix="model")
print("model uploaded to: {}".format(model_data))
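
The value printed above is an S3 URI. If you later need the bucket and key separately (for example, to look up the object in the S3 console), a small helper like the one below can split it. This is a sketch using only the standard library: split_s3_uri is a hypothetical helper, and the example URI is made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI for illustration; the real bucket name comes from your session.
bucket, key = split_s3_uri("s3://example-bucket/model/mobilenet.tar.gz")
print(bucket, key)
```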

Create a SageMaker Model and Endpoint

Now that the model archive is in S3, we can create a Model and deploy it to an Endpoint with a few lines of Python code:

[8]:
from sagemaker.tensorflow.model import TensorFlowModel

# Use an env argument to set the name of the default model.
# This is optional, but recommended when you deploy multiple models
# so that requests that don't include a model name are sent to a
# predictable model.
env = {"SAGEMAKER_TFS_DEFAULT_MODEL_NAME": "mobilenet_v2_140_224"}

model = TensorFlowModel(model_data=model_data, role=sagemaker_role, framework_version="1.15.2", env=env)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")

Make predictions using the endpoint

The endpoint is now up and running, and ready to handle inference requests. The deploy call above returned a predictor object. The predict method of this object handles sending requests to the endpoint. It also automatically handles JSON serialization of our input arguments, and JSON deserialization of the prediction results.

We’ll use two sample images, kitten.jpg and bee.jpg.

[9]:
# read the image file into a tensor (numpy array)
kitten_image = sample_utils.image_file_to_tensor("kitten.jpg")

# get a prediction from the endpoint
# the image input is automatically converted to a JSON request.
# the JSON response from the endpoint is returned as a python dict
result = predictor.predict(kitten_image)

# show the raw result
print(result)
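
To make the JSON handling described above more concrete, here is a rough sketch of the TensorFlow Serving REST "row" format, in which inputs are sent under an "instances" key and results come back under "predictions". The helper names are hypothetical, the response body is made up, and the keys inside each prediction depend on the model's serving signature. Exactly where the wrapping happens (in the SDK or in the container) is not something this notebook specifies.

```python
import json


def build_predict_request(instances):
    """Wrap a batch of inputs in the TensorFlow Serving row format."""
    return json.dumps({"instances": instances})


def parse_predict_response(body):
    """Return the list of predictions from a TensorFlow Serving JSON response."""
    return json.loads(body)["predictions"]


# A tiny 1x2x2x3 stand-in for an image batch (the real images are much larger).
tiny_batch = [[[[0.0, 0.5, 1.0], [0.1, 0.2, 0.3]],
               [[0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]]]
request_body = build_predict_request(tiny_batch)

# A made-up response body, only to show the shape of the data.
fake_response = '{"predictions": [{"probabilities": [0.1, 0.9]}]}'
print(parse_predict_response(fake_response))
```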

Add class labels and show formatted results

The sample_utils module includes functions that can add ImageNet class labels to our results and print formatted output. Let’s use them to get a better sense of how well our model worked on the input image.

[10]:
# add class labels to the predicted result
sample_utils.add_imagenet_labels(result)

# show the probabilities and labels for the top predictions
sample_utils.print_probabilities_and_labels(result)
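
The idea behind showing the top predictions can be sketched in a few lines: pair each probability with its label, sort by probability, and keep the first few. This is only an illustration with made-up scores and labels. The top_k helper is hypothetical, and the actual sample_utils functions may work differently.

```python
def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores and labels, for illustration only.
scores = [0.05, 0.70, 0.25]
names = ["bee", "tabby cat", "tiger cat"]
for label, probability in top_k(scores, names, k=2):
    print("{:.2f}  {}".format(probability, label))
```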

Optional: make predictions using the second model

If you added the second model (mobilenet_v2_035_224) in the previous optional step, then you can also send prediction requests to that model. To do that, we’ll need to create a new predictor object.

Note: if you are using local mode (by changing the instance type to local or local_gpu), you’ll need to create the new predictor this way instead:

predictor2 = TensorFlowPredictor(predictor.endpoint_name, model_name="mobilenet_v2_035_224",
                                 sagemaker_session=predictor.sagemaker_session)

[11]:
from sagemaker.tensorflow.model import TensorFlowPredictor

# use values from the default predictor to set up the new one
predictor2 = TensorFlowPredictor(predictor.endpoint_name, model_name="mobilenet_v2_035_224")

# make a new prediction
bee_image = sample_utils.image_file_to_tensor("bee.jpg")
result = predictor2.predict(bee_image)

# show the formatted result
sample_utils.add_imagenet_labels(result)
sample_utils.print_probabilities_and_labels(result)
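
For readers curious how a request ends up at the second model, the sketch below shows one plausible mechanism. It assumes that the target model is selected with a tfs-model-name entry in the request's CustomAttributes string. Treat both the attribute name and the model_attributes helper as assumptions to verify against the container documentation, not as the SDK's confirmed behavior.

```python
def model_attributes(model_name=None):
    """Build a CustomAttributes string that names the target model.

    Assumption: the container reads the model name from "tfs-model-name".
    Returns None when no model is named, so the default model would be used.
    """
    if not model_name:
        return None
    return "tfs-model-name={}".format(model_name)


print(model_attributes("mobilenet_v2_035_224"))
```

Requests that carry no model name would then fall through to the default model set with SAGEMAKER_TFS_DEFAULT_MODEL_NAME when the endpoint was created.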

Additional Information

The TensorFlow Serving Container supports additional features not covered in this notebook, including support for:

  • TensorFlow Serving REST API requests, including classify and regress requests

  • CSV input

  • Other JSON formats

For information on how to use these features, refer to the documentation in the SageMaker Python SDK.
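
As a rough illustration of two of the alternative input formats listed above, the sketch below builds a TensorFlow Serving columnar JSON body (inputs under a single "inputs" key) and a CSV body. The helper names are hypothetical, and the assumption that each CSV line is treated as one instance should be checked against the SDK documentation.

```python
import csv
import io
import json


def to_columnar_json(inputs):
    """TensorFlow Serving columnar format: all inputs under one "inputs" key."""
    return json.dumps({"inputs": inputs})


def to_csv(rows):
    """Serialize rows of numbers as CSV text (assumed: one instance per line)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(to_columnar_json([[1.0, 2.0, 5.0]]))
print(to_csv([[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]))
```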

Cleaning up

To avoid incurring charges to your AWS account for the resources used in this tutorial, you need to delete the SageMaker Endpoint.

[12]:
predictor.delete_endpoint()