Introduction to JumpStart - Text Generation

Set Up
Select a model
Retrieve JumpStart Artifacts & Deploy an Endpoint
Query endpoint and parse response
Clean up the endpoint

Note: This notebook was tested on ml.t3.medium instance in Amazon SageMaker Studio with Python 3 (Data Science) kernel and in Amazon SageMaker Notebook instance with conda_python3 kernel.

1. Set Up

[ ]:

!pip install sagemaker ipywidgets --upgrade --quiet

Permissions and environment variables

[ ]:

import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = get_execution_role()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

2. Select a model

Here, we download jumpstart model_manifest file from the jumpstart s3 bucket, filter-out all the Text Generation models and select a model for inference. ***

[ ]:

from ipywidgets import Dropdown

# download JumpStart model_manifest file.
boto3.client("s3").download_file(
    f"jumpstart-cache-prod-{aws_region}", "models_manifest.json", "models_manifest.json"
)
with open("models_manifest.json", "rb") as json_file:
    model_list = json.load(json_file)

# filter-out all the Text Generation models from the manifest list.
text_generation_models = []
for model in model_list:
    model_id = model["model_id"]
    if "-textgeneration-" in model_id and model_id not in text_generation_models:
        text_generation_models.append(model_id)

# display the model-ids in a dropdown to select a model for inference.
model_dropdown = Dropdown(
    options=text_generation_models,
    value="huggingface-textgeneration-gpt2",
    description="Select a model",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)

Chose a model for Inference

[ ]:

display(model_dropdown)

[ ]:

# model_version="*" fetches the latest version of the model
model_id, model_version = model_dropdown.value, "*"

3. Retrieve JumpStart Artifacts & Deploy an Endpoint

Using JumpStart, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the deploy_image_uri, deploy_source_uri, and model_uri for the pre-trained model. To host the pre-trained model, we create an instance of `sagemaker.model.Model <https://sagemaker.readthedocs.io/en/stable/api/inference/model.html>`__ and deploy it. This may take a few minutes.

[ ]:

from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


endpoint_name = name_from_base(f"jumpstart-example-infer-{model_id}")

inference_instance_type = "ml.p2.xlarge"

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)


# Retrieve the model uri. This includes the pre-trained nvidia-ssd model and parameters.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)


# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# deploy the Model. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

4. Query endpoint and parse response

[ ]:

def query(model_predictor, text):
    """Query the model predictor."""

    encoded_text = json.dumps(text).encode("utf-8")

    query_response = model_predictor.predict(
        encoded_text,
        {
            "ContentType": "application/x-text",
            "Accept": "application/json",
        },
    )
    return query_response


def parse_response(query_response):
    """Parse response and return the generated text."""

    model_predictions = json.loads(query_response)
    generated_text = model_predictions["generated_text"]
    return generated_text

[ ]:

newline, bold, unbold = "\n", "\033[1m", "\033[0m"

text1 = "As far as I am concerned, I will"
text2 = "The movie is"

for text in [text1, text2]:
    query_response = query(model_predictor, text)
    generated_text = parse_response(query_response)
    print(f"Input text: {text}{newline}" f"Generated text: {bold}{generated_text}{unbold}{newline}")

5. Clean up the endpoint

[ ]:

# Delete the SageMaker endpoint
model_predictor.delete_model()
model_predictor.delete_endpoint()