NLP Online Explainability with SageMaker Clarify

Introduction

Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions. The fairness and explainability functionality provided by SageMaker Clarify takes a step towards enabling AWS customers to build trustworthy and understandable machine learning models.

SageMaker Clarify currently supports explainability for SageMaker models as an offline processing job. This example notebook showcases a new feature for explainability on a SageMaker real-time inference endpoint, a.k.a. online explainability.

This example notebook walks you through:

1. Key terms and concepts needed to understand SageMaker Clarify.
2. Training a model on the Women’s E-Commerce Clothing Reviews dataset.
3. Creating a model from the trained model artifacts, creating an endpoint configuration with the new SageMaker Clarify explainer configuration, and creating an endpoint from that endpoint configuration.
4. Invoking the endpoint with single and batch requests, using different EnableExplanations queries.
5. Explaining the importance of the various input features to the model’s decision.

General Setup

We recommend that you use the Python 3 (Data Science) kernel on SageMaker Studio or the conda_python3 kernel on a SageMaker Notebook Instance.

Install dependencies

The following packages are required for data preparation and training.

[ ]:

!pip install "datasets[s3]==1.6.2" "transformers==4.6.1" --upgrade

Upgrade the SageMaker Python SDK, boto3, and botocore, then install captum, which is used to visualize the feature attributions.

[ ]:

!pip install sagemaker --upgrade
!pip install boto3 --upgrade
!pip install botocore --upgrade

[ ]:

!pip install captum --upgrade

Import libraries

[ ]:

import boto3
import csv
import pandas as pd
import numpy as np
import pprint
import tarfile

from sagemaker.huggingface import HuggingFace
from datasets import Dataset
from datasets.filesystems import S3FileSystem
from captum.attr import visualization
from sklearn.model_selection import train_test_split
from sagemaker import get_execution_role, Session
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer
from sagemaker.utils import unique_name_from_base

Set configurations

[ ]:

boto3_session = boto3.session.Session()
sagemaker_client = boto3.client("sagemaker")
sagemaker_runtime_client = boto3.client("sagemaker-runtime")

# Initialize sagemaker session
sagemaker_session = Session(
    boto_session=boto3_session,
    sagemaker_client=sagemaker_client,
    sagemaker_runtime_client=sagemaker_runtime_client,
)

region = sagemaker_session.boto_region_name
print(f"Region: {region}")

role = get_execution_role()
print(f"Role: {role}")

prefix = unique_name_from_base("DEMO-NLP-Women-Clothing")

s3_bucket = sagemaker_session.default_bucket()
s3_prefix = f"sagemaker/{prefix}"
s3_key = f"s3://{s3_bucket}/{s3_prefix}"
print(f"Demo S3 key: {s3_key}")

model_name = f"{prefix}-model"
print(f"Demo model name: {model_name}")
endpoint_config_name = f"{prefix}-endpoint-config"
print(f"Demo endpoint config name: {endpoint_config_name}")
endpoint_name = f"{prefix}-endpoint"
print(f"Demo endpoint name: {endpoint_name}")

# SageMaker Clarify model directory name
model_path = "model/"

# Instance type for training and hosting
instance_type = "ml.m5.xlarge"

Create serializer and deserializer

CSV serializer to serialize test data to string

[ ]:

csv_serializer = CSVSerializer()

JSON deserializer to deserialize invoke endpoint response

[ ]:

json_deserializer = JSONDeserializer()

For visualization

[ ]:

# This function is a wrapper around captum that helps produce visualizations for local explanations. It
# colors tokens red for negative attributions and green for positive attributions.
def visualization_record(
    attributions,  # list of attributions for the tokens
    text,  # list of tokens
    pred,  # the prediction value obtained from the endpoint
    delta,
    true_label,  # the true label from the dataset
    normalize=True,  # normalizes the attributions so that the max absolute value is 1. Yields stronger colors.
    max_frac_to_show=0.05,  # what fraction of tokens to highlight, set to 1 for all.
    match_to_pred=False,  # whether to limit highlights to red for negative predictions and green for positive ones.
    # By enabling `match_to_pred` you show what tokens contribute to a high/low prediction not those that oppose it.
):
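    if normalize:
        # Scale so the largest absolute attribution is 1; skip scaling if every attribution is zero
        max_abs = np.abs(attributions).max()
        if max_abs > 0:
            attributions = attributions / max_abs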
    if max_frac_to_show is not None and max_frac_to_show < 1:
        num_show = int(max_frac_to_show * attributions.shape[0])
        sal = attributions
        if pred < 0.5:
            sal = -sal
        if not match_to_pred:
            sal = np.abs(sal)
        top_idxs = np.argsort(-sal)[:num_show]
        mask = np.zeros_like(attributions)
        mask[top_idxs] = 1
        attributions = attributions * mask
    return visualization.VisualizationDataRecord(
        attributions,
        pred,
        int(pred > 0.5),
        true_label,
        attributions.sum() > 0,
        attributions.sum(),
        text,
        delta,
    )


def visualize_result(result, all_labels):
    if not result["explanations"]:
        print("No Clarify explanations for the record(s)")
        return
    all_explanations = result["explanations"]["kernel_shap"]
    all_predictions = list(csv.reader(result["predictions"]["data"].splitlines()))

    labels = []
    predictions = []
    explanations = []

    for i, expl in enumerate(all_explanations):
        if expl:
            labels.append(all_labels[i])
            predictions.append(all_predictions[i])
            explanations.append(all_explanations[i])

    attributions_dataset = [
        np.array([attr["attribution"][0] for attr in expl[0]["attributions"]])
        for expl in explanations
    ]
    tokens_dataset = [
        np.array([attr["description"]["partial_text"] for attr in expl[0]["attributions"]])
        for expl in explanations
    ]

    # You can customize the following display settings
    normalize = True
    max_frac_to_show = 1
    match_to_pred = False
    vis = []
    for attr, token, pred, label in zip(attributions_dataset, tokens_dataset, predictions, labels):
        vis.append(
            visualization_record(
                attr, token, float(pred[0]), 0.0, label, normalize, max_frac_to_show, match_to_pred
            )
        )
    _ = visualization.visualize_text(vis)
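
The normalization and top-fraction masking inside `visualization_record` can be hard to picture from the NumPy code alone. Below is a minimal, dependency-free sketch of the same idea for the `match_to_pred=False` case. The helper name `mask_top_attributions` and the sample values are hypothetical and are not part of the notebook.

```python
def mask_top_attributions(attributions, max_frac_to_show):
    """Normalize to a max absolute value of 1, then zero out all but the strongest tokens."""
    max_abs = max(abs(value) for value in attributions)
    normalized = [value / max_abs for value in attributions] if max_abs else list(attributions)

    # Number of tokens that keep their color, mirroring int(max_frac_to_show * n)
    num_show = int(max_frac_to_show * len(normalized))
    # Token indices ordered by absolute attribution, largest first
    ranked = sorted(range(len(normalized)), key=lambda i: -abs(normalized[i]))
    keep = set(ranked[:num_show])
    return [value if i in keep else 0.0 for i, value in enumerate(normalized)]


# Made-up attributions for four tokens; half of them stay highlighted
masked = mask_top_attributions([1.0, -4.0, 0.5, 2.0], max_frac_to_show=0.5)
print(masked)
```

With `max_frac_to_show=1`, as used in `visualize_result`, every token keeps its normalized attribution.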

Prepare data

Download data

Data Source: https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/

The Women’s E-Commerce Clothing Reviews dataset has been made available under a Creative Commons Public Domain license. A copy of the dataset has been saved in a sample data Amazon S3 bucket. In this section, we download that copy and load it into a pandas DataFrame.

[ ]:

! curl https://sagemaker-sample-files.s3.amazonaws.com/datasets/tabular/womens_clothing_ecommerce/Womens_Clothing_E-Commerce_Reviews.csv > womens_clothing_reviews_dataset.csv

Load the dataset

[ ]:

df = pd.read_csv("womens_clothing_reviews_dataset.csv", index_col=[0])
df.head()

Context

The Women’s Clothing E-Commerce dataset contains reviews written by customers. Because the dataset contains real commercial data, it has been anonymized, and any references to the company in the review text and body have been replaced with “retailer”.

Content

The dataset contains 23486 rows and 10 columns. Each row corresponds to a customer review.

The columns include:

Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.
Age: Positive Integer variable of the reviewer’s age.
Title: String variable for the title of the review.
Review Text: String variable for the review body.
Rating: Positive Ordinal Integer variable for the product score granted by the customer, from 1 (worst) to 5 (best).
Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.
Division Name: Categorical name of the product high level division.
Department Name: Categorical name of the product department name.
Class Name: Categorical name of the product class name.

Goal

To predict the sentiment of a review based on the text, and then explain the predictions using SageMaker Clarify.

Data preparation for model training

Target Variable Creation

Since the dataset does not contain a column that indicates the sentiment of the customer reviews, let’s create one. To do this, let’s assume that reviews with a Rating of 4 or higher indicate positive sentiment and reviews with a Rating of 2 or lower indicate negative sentiment. Let’s also assume that a Rating of 3 indicates neutral sentiment and exclude these rows from the dataset. Additionally, because we predict the sentiment of a review from the Review Text column, let’s remove the rows where Review Text is empty.

[ ]:

def create_target_column(df, min_positive_score, max_negative_score):
    # Ratings strictly between the two thresholds are treated as neutral and dropped
    neutral_values = list(range(max_negative_score + 1, min_positive_score))
    # Copy the filtered frame so the new column is not written to a view of the original
    df = df[~df["Rating"].isin(neutral_values)].copy()
    # 1 = positive sentiment, 0 = negative sentiment
    df["Sentiment"] = (df["Rating"] >= min_positive_score).astype(int)
    return df


df = create_target_column(df, 4, 2)
df = df[~df["Review Text"].isna()]
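
To make the labeling rule concrete, here is a small sketch that applies the same thresholds to each possible rating without pandas. The function `rating_to_sentiment` is a hypothetical helper for illustration only; `None` stands for the neutral ratings that `create_target_column` drops.

```python
def rating_to_sentiment(rating, min_positive_score=4, max_negative_score=2):
    """Return 1 for positive, 0 for negative, or None for neutral ratings that get dropped."""
    if rating >= min_positive_score:
        return 1
    if rating <= max_negative_score:
        return 0
    return None


# Map every rating on the 1-5 scale to its sentiment label
labels = {rating: rating_to_sentiment(rating) for rating in range(1, 6)}
print(labels)
```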

Train-Validation-Test splits

The most common approach for model evaluation is the train/validation/test split. Although this approach is effective in general, it can produce misleading results, and can even fail, on classification problems with a severe class imbalance. For such problems, the sampling must be stratified by the class label, as shown below. Stratification ensures that all classes are well represented across the train, validation, and test datasets.

[ ]:

target = "Sentiment"
cols = "Review Text"

X = df[cols]
y = df[target]

# Data split: hold out 10% for test, then take 11% of the remaining 90% (~10% of the total)
# for validation, giving roughly an 80:10:10 train:val:test split
test_dataset_size = 0.10
val_dataset_size = 0.11
RANDOM_STATE = 42

# Stratified train-val-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_dataset_size, stratify=y, random_state=RANDOM_STATE
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=val_dataset_size, stratify=y_train, random_state=RANDOM_STATE
)

print(
    "Dataset: train ",
    X_train.shape,
    y_train.shape,
    y_train.value_counts(dropna=False, normalize=True).to_dict(),
)
print(
    "Dataset: validation ",
    X_val.shape,
    y_val.shape,
    y_val.value_counts(dropna=False, normalize=True).to_dict(),
)
print(
    "Dataset: test ",
    X_test.shape,
    y_test.shape,
    y_test.value_counts(dropna=False, normalize=True).to_dict(),
)

# Combine the independent columns with the label
df_train = pd.concat([X_train, y_train], axis=1).reset_index(drop=True)
df_test = pd.concat([X_test, y_test], axis=1).reset_index(drop=True)
df_val = pd.concat([X_val, y_val], axis=1).reset_index(drop=True)
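
Because the validation set is carved out of what remains after the test hold-out, `val_dataset_size = 0.11` does not mean 11% of the full dataset. The arithmetic can be checked with a short sketch; `effective_split` is a hypothetical helper, not something the notebook defines.

```python
def effective_split(test_size, val_size):
    """Return (train, validation, test) fractions of the full dataset for a two-stage split."""
    remaining = 1.0 - test_size            # share left after the test hold-out
    val_fraction = remaining * val_size    # validation is taken from that remainder
    train_fraction = remaining - val_fraction
    return train_fraction, val_fraction, test_size


train_frac, val_frac, test_frac = effective_split(test_size=0.10, val_size=0.11)
print(f"train={train_frac:.3f} val={val_frac:.3f} test={test_frac:.3f}")
```

The result is about 80.1% train, 9.9% validation, and 10% test, which is the approximate 80:10:10 split mentioned in the code comment.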

[ ]:

headers = df_test.columns.to_list()
feature_headers = headers[0]
label_header = headers[1]
print(f"Feature names: {feature_headers}")
print(f"Label name: {label_header}")
print("Test data (without label column):")
test_data = df_test.iloc[:, :1]
test_data

We have split the dataset into train, validation, and test datasets. We use the train and validation datasets during the training process, and run Clarify on the test dataset.

In the cell below, we convert the pandas DataFrames into Hugging Face Datasets for downstream modeling.

[ ]:

train_dataset = Dataset.from_pandas(df_train)
val_dataset = Dataset.from_pandas(df_val)

Upload the dataset

Here, we upload the prepared datasets to Amazon S3 so that we can train the model with the Hugging Face Estimator.

[ ]:

# File system handle that lets the datasets library write directly to S3
s3 = S3FileSystem()

# save train_dataset to s3
training_input_path = f"{s3_key}/train"
print(f"training input path: {training_input_path}")
train_dataset.save_to_disk(training_input_path, fs=s3)

# save val_dataset to s3 (stored under the "test" prefix and passed to the "test" channel at training time)
val_input_path = f"{s3_key}/test"
print(f"validation input path: {val_input_path}")
val_dataset.save_to_disk(val_input_path, fs=s3)

Train and Deploy Hugging Face Model

In this step of the workflow, we use the Hugging Face Estimator to load the pre-trained distilbert-base-uncased model and fine-tune the model on our dataset.

Train model with Hugging Face estimator

The hyperparameters defined below are passed to the custom PyTorch code in scripts/train.py. The only required parameter is model_name. The other parameters, such as epochs and train_batch_size, have default values that can be overridden by setting them here.

The training job requires a GPU instance type. Here, we use ml.g4dn.xlarge.

[ ]:

# Hyperparameters passed into the training job
hyperparameters = {"epochs": 1, "model_name": "distilbert-base-uncased"}

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="scripts",
    instance_type="ml.g4dn.xlarge",
    instance_count=1,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    py_version="py36",
    role=role,
    hyperparameters=hyperparameters,
    disable_profiler=True,
    debugger_hook_config=False,
)

# Start the training job with the uploaded datasets as input channels
huggingface_estimator.fit({"train": training_input_path, "test": val_input_path}, logs=True)
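
In SageMaker script mode, the hyperparameters dictionary reaches the entry point as command-line arguments. The sketch below shows how a training script might parse them with `argparse`. It is not the contents of scripts/train.py: the function name `parse_hyperparameters` and the default values are assumptions made for illustration.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse hyperparameters the way a script-mode entry point might receive them."""
    parser = argparse.ArgumentParser()
    # model_name is the only required hyperparameter
    parser.add_argument("--model_name", type=str, required=True)
    # The defaults below are illustrative assumptions, not those of the real script
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train_batch_size", type=int, default=32)
    return parser.parse_args(argv)


# Equivalent of hyperparameters = {"epochs": 1, "model_name": "distilbert-base-uncased"}
args = parse_hyperparameters(["--model_name", "distilbert-base-uncased", "--epochs", "1"])
print(args.model_name, args.epochs, args.train_batch_size)
```

Any hyperparameter left out of the dictionary, such as train_batch_size here, falls back to the script's default.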

Download the trained model files

[ ]:

! aws s3 cp {huggingface_estimator.model_data} model.tar.gz
! mkdir -p {model_path}
! tar -xvf model.tar.gz -C {model_path}

Prepare model container definition

We are going to use the trained model files along with the HuggingFace Inference container to deploy the model to a SageMaker endpoint.

[ ]:

with tarfile.open("hf_model.tar.gz", mode="w:gz") as archive:
    archive.add(model_path, recursive=True)
    archive.add("code/")
directory_name = s3_prefix.split("/")[-1]
zipped_model_path = sagemaker_session.upload_data(
    path="hf_model.tar.gz", key_prefix=directory_name + "/hf-model-sm"
)
zipped_model_path
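
Before uploading, it can help to confirm that the archive contains both the model directory and the inference code. The following sketch builds a tiny archive from placeholder files in a temporary directory and lists its members. The file names `config.json` and `demo_model.tar.gz` and the helper `list_archive_members` are hypothetical stand-ins; the real archive holds the trained model files.

```python
import os
import tarfile
import tempfile


def list_archive_members(archive_path):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(archive.getnames())


with tempfile.TemporaryDirectory() as workdir:
    # Create placeholder files that mimic the model/ and code/ directories
    for relative_path in ("model/config.json", "code/inference.py"):
        full_path = os.path.join(workdir, relative_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w") as handle:
            handle.write("placeholder")

    archive_path = os.path.join(workdir, "demo_model.tar.gz")
    with tarfile.open(archive_path, mode="w:gz") as archive:
        # arcname keeps the paths inside the archive relative
        archive.add(os.path.join(workdir, "model"), arcname="model")
        archive.add(os.path.join(workdir, "code"), arcname="code")

    members = list_archive_members(archive_path)

print(members)
```

The same `list_archive_members` call can be pointed at hf_model.tar.gz to inspect the archive created above.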

Create a new model object and then update its model artifact and inference script. The model object will be used to create the SageMaker model.

[ ]:

model = huggingface_estimator.create_model(name=model_name)
container_def = model.prepare_container_def(instance_type=instance_type)
container_def["ModelDataUrl"] = zipped_model_path
container_def["Environment"]["SAGEMAKER_PROGRAM"] = "inference.py"
pprint.pprint(container_def)

Create endpoint

Create model

The following parameters are required to create a SageMaker model:

ExecutionRoleArn: The ARN of the IAM role that Amazon SageMaker can assume to access the model artifacts and Docker images for deployment.
ModelName: The name of the SageMaker model.
PrimaryContainer: The location of the primary Docker image containing inference code, associated artifacts, and the custom environment map that the inference code uses when the model is deployed for predictions.

[ ]:

sagemaker_client.create_model(
    ExecutionRoleArn=role,
    ModelName=model_name,
    PrimaryContainer=container_def,
)
print(f"Model created: {model_name}")

Create endpoint config

Create an endpoint configuration by calling the create_endpoint_config API, supplying the same model_name used in the create_model call. create_endpoint_config now accepts an additional ExplainerConfig parameter, whose ClarifyExplainerConfig enables the Clarify explainer. The SHAP baseline is mandatory; it can be provided either as inline baseline data (the ShapBaseline parameter) or as an S3 baseline file (the ShapBaselineUri parameter). See the developer guide for the optional parameters.

Here we use a special token as the baseline.

[ ]:

baseline = [["<UNK>"]]
print(f"SHAP baseline: {baseline}")
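
The inline baseline is sent as CSV text in the same shape as a request record, which here is a single text feature. The sketch below uses the standard library `csv` module as a stand-in to show what such a serialized row looks like. `rows_to_csv` is a hypothetical helper, the review text is made up, and the output of the SDK's CSVSerializer may differ in details.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows to a CSV string, one line per row (stdlib stand-in)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue().rstrip("\n")


baseline_csv = rows_to_csv([["<UNK>"]])
review_csv = rows_to_csv([["Great fit, lovely color."]])
print(baseline_csv)
print(review_csv)
```

The review is quoted because it contains a comma, so it still parses as one column, matching the single-column baseline.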

The TextConfig is configured with sentence-level granularity and English as the language. When the granularity is sentence, each sentence is a feature, so a review needs a few sentences for the visualization to be informative.

[ ]:

sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "TestVariant",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }
    ],
    ExplainerConfig={
        "ClarifyExplainerConfig": {
            "InferenceConfig": {"FeatureTypes": ["text"]},
            "ShapConfig": {
                "ShapBaselineConfig": {"ShapBaseline": csv_serializer.serialize(baseline)},
                "TextConfig": {"Granularity": "sentence", "Language": "en"},
            },
        }
    },
)
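
With sentence granularity, each sentence in a review becomes one feature that receives its own attribution. The sketch below gives a rough sense of how a review breaks into features. It uses a naive regular expression, which is an assumption for illustration only; the notebook does not show how Clarify segments text, and its results will differ on harder cases such as abbreviations.

```python
import re


def naive_sentence_split(text):
    """Split on whitespace that follows '.', '!' or '?'. Illustration only."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Made-up review text
review = "Love this dress. The fabric is soft! Runs a bit large though."
features = naive_sentence_split(review)
print(len(features), features)
```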

Create endpoint

Once your model and endpoint configuration are ready, use the create_endpoint API to create your endpoint. The endpoint_name must be unique within an AWS Region in your AWS account. The create_endpoint call returns immediately, with the endpoint in the Creating state, while provisioning continues in the background.

[ ]:

sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

Wait for the endpoint to be in “InService” state

[ ]:

sagemaker_session.wait_for_endpoint(endpoint_name)
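
For readers who want to see what waiting involves, here is a sketch of a polling loop built on the describe_endpoint response field EndpointStatus. The function `wait_until_in_service`, its parameters, and the fake client are hypothetical; the fake stands in for `sagemaker_client.describe_endpoint` so that the sketch runs without AWS access.

```python
import time


def wait_until_in_service(describe_endpoint, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll an endpoint's status until it is InService, has failed, or polls run out."""
    for _ in range(max_polls):
        status = describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")


# Stand-in for sagemaker_client.describe_endpoint, returning a scripted status sequence
fake_statuses = iter(["Creating", "Creating", "InService"])


def fake_describe_endpoint(EndpointName):
    return {"EndpointStatus": next(fake_statuses)}


final_status = wait_until_in_service(fake_describe_endpoint, "demo-endpoint", poll_seconds=0)
print(final_status)
```

Against a real endpoint, passing `sagemaker_client.describe_endpoint` as the first argument would be the natural substitution.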

Invoke endpoint

There are expanding business needs and legislative regulations that require explanations of why a model made the decision it did. SageMaker Clarify uses SHAP to explain the contribution that each input feature makes to the final decision.

The Kernel SHAP algorithm requires a baseline (also known as a background dataset). The baseline is either an S3 URI pointing to a baseline dataset file or an inline list of records. The baseline must have the same data type as the request data, and baseline records must include only features.

Below are several different endpoint invocations. Run them one by one, and visualize the explanations by running the cell that follows each invocation.

Single record request

Put only one record in the request body, and then send the request to the endpoint to get its predictions and explanations.

[ ]:

num_records = 1

[ ]:

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
)
pprint.pprint(response)

[ ]:

result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

[ ]:

visualize_result(result, df_test[label_header][:num_records])
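
The fields that `visualize_result` reads can also be pulled out directly, for example to rank sentences by the strength of their effect. The sketch below works on a hand-written dictionary that contains only those fields; all values in it are invented, and a real response may carry additional keys. `sentence_attributions` is a hypothetical helper.

```python
import csv

# Hand-written response containing only the fields visualize_result reads; values are invented
sample_result = {
    "predictions": {"data": "0.98\n"},
    "explanations": {
        "kernel_shap": [
            [
                {
                    "attributions": [
                        {"attribution": [0.41], "description": {"partial_text": "Love this dress."}},
                        {"attribution": [-0.07], "description": {"partial_text": "Runs a bit large."}},
                    ]
                }
            ]
        ]
    },
}


def sentence_attributions(result, record_index=0):
    """Return (sentence, attribution) pairs for one record, strongest effect first."""
    record = result["explanations"]["kernel_shap"][record_index]
    pairs = [
        (item["description"]["partial_text"], item["attribution"][0])
        for item in record[0]["attributions"]
    ]
    return sorted(pairs, key=lambda pair: -abs(pair[1]))


# Predictions arrive as CSV text, one row per record
rows = csv.reader(sample_result["predictions"]["data"].splitlines())
predictions = [float(row[0]) for row in rows]
pairs = sentence_attributions(sample_result)
print(predictions, pairs)
```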

Single record request, no explanation

Use the EnableExplanations parameter to disable the explanations for this request.

[ ]:

num_records = 1

[ ]:

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
    EnableExplanations="`false`",  # Do not provide explanations
)
pprint.pprint(response)

[ ]:

result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

[ ]:

visualize_result(result, df_test[label_header][:num_records])

Batch request, explain both

Put two records in the request body, and then send the request to the endpoint to get their predictions and explanations.

[ ]:

num_records = 2

[ ]:

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
)
pprint.pprint(response)

[ ]:

result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

[ ]:

visualize_result(result, df_test[label_header][:num_records])

Batch request with more records, explain some of the records

Put a few more records in the request body, and then use the EnableExplanations expression to filter the records to be explained according to their predictions.

[ ]:

num_records = 4

[ ]:

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
    EnableExplanations="[0]>`0.99`",  # Explain a record only when its prediction meets the condition
)
pprint.pprint(response)

[ ]:

result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

[ ]:

visualize_result(result, df_test[label_header][:num_records])
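
To reason about which records an expression such as ``[0]>`0.99` `` selects, it can help to emulate the condition in plain Python. The sketch below assumes the expression means "the first prediction value is greater than 0.99", following the comment in the invocation cell. It does not evaluate the expression the way the endpoint does. `records_to_explain` and the prediction values are hypothetical.

```python
def records_to_explain(predictions, threshold=0.99):
    """Return indices of records whose first prediction value exceeds the threshold."""
    return [index for index, row in enumerate(predictions) if row[0] > threshold]


# Invented predictions for a four-record batch, one row per record
batch_predictions = [[0.999], [0.42], [0.995], [0.97]]
explained = records_to_explain(batch_predictions)
print(explained)
```

Records that do not meet the condition still get predictions; `visualize_result` skips them because their explanation entries are empty.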

Cleanup

Finally, don’t forget to clean up the resources we set up and used for this demo!

[ ]:

sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

[ ]:

sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

[ ]:

sagemaker_client.delete_model(ModelName=model_name)