NLP Online Explainability with SageMaker Clarify
Introduction
Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions. The fairness and explainability functionality provided by SageMaker Clarify takes a step towards enabling AWS customers to build trustworthy and understandable machine learning models.
SageMaker Clarify currently supports explainability for SageMaker models as an offline processing job. This example notebook showcases a new feature for explainability on a SageMaker real-time inference endpoint, a.k.a. online explainability.
The notebook covers explaining the importance of the various input features on the model’s decision, and using the EnableExplanations query to control which requests are explained.
General Setup
We recommend using the Python 3 (Data Science) kernel on SageMaker Studio or the conda_python3 kernel on a SageMaker Notebook Instance.
Install dependencies
The following packages are required by data preparation and training.
[ ]:
!pip install "datasets[s3]==1.6.2" "transformers==4.6.1" --upgrade
Upgrade the SageMaker Python SDK, and install captum, which is used to visualize the feature attributions.
[ ]:
!pip install sagemaker --upgrade
!pip install boto3 --upgrade
!pip install botocore --upgrade
[ ]:
!pip install captum --upgrade
Import libraries
[ ]:
import boto3
import csv
import pandas as pd
import numpy as np
import pprint
import tarfile
from sagemaker.huggingface import HuggingFace
from datasets import Dataset
from datasets.filesystems import S3FileSystem
from captum.attr import visualization
from sklearn.model_selection import train_test_split
from sagemaker import get_execution_role, Session
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer
from sagemaker.utils import unique_name_from_base
Set configurations
[ ]:
boto3_session = boto3.session.Session()
sagemaker_client = boto3.client("sagemaker")
sagemaker_runtime_client = boto3.client("sagemaker-runtime")
# Initialize sagemaker session
sagemaker_session = Session(
    boto_session=boto3_session,
    sagemaker_client=sagemaker_client,
    sagemaker_runtime_client=sagemaker_runtime_client,
)
region = sagemaker_session.boto_region_name
print(f"Region: {region}")
role = get_execution_role()
print(f"Role: {role}")
prefix = unique_name_from_base("DEMO-NLP-Women-Clothing")
s3_bucket = sagemaker_session.default_bucket()
s3_prefix = f"sagemaker/{prefix}"
s3_key = f"s3://{s3_bucket}/{s3_prefix}"
print(f"Demo S3 key: {s3_key}")
model_name = f"{prefix}-model"
print(f"Demo model name: {model_name}")
endpoint_config_name = f"{prefix}-endpoint-config"
print(f"Demo endpoint config name: {endpoint_config_name}")
endpoint_name = f"{prefix}-endpoint"
print(f"Demo endpoint name: {endpoint_name}")
# SageMaker Clarify model directory name
model_path = "model/"
# Instance type for training and hosting
instance_type = "ml.m5.xlarge"
Create serializer and deserializer
CSV serializer to serialize test data to string
[ ]:
csv_serializer = CSVSerializer()
JSON deserializer to deserialize invoke endpoint response
[ ]:
json_deserializer = JSONDeserializer()
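To see why CSV serialization matters for free-form review text, the following sketch uses Python's standard csv module to round-trip a review that contains a comma and embedded quotes. It does not call the SageMaker CSVSerializer, whose exact output may differ, and the sample review and helper names are made up for illustration.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record to a CSV string using the standard library."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


def from_csv(payload):
    """Parse a CSV payload back into a list of records."""
    return list(csv.reader(io.StringIO(payload)))


# Hypothetical review text containing a comma and embedded quotes.
review = 'Great dress, but the "small" runs large.'
payload = to_csv_line([review])
records = from_csv(payload)
print(payload)
print(records[0][0] == review)
```

Because the field is quoted on the way out, the comma inside the review is not mistaken for a column separator when the payload is parsed again.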
For visualization
[ ]:
# This method is a wrapper around the captum that helps produce visualizations for local explanations. It will
# visualize the attributions for the tokens with red or green colors for negative and positive attributions.
def visualization_record(
    attributions,  # list of attributions for the tokens
    text,  # list of tokens
    pred,  # the prediction value obtained from the endpoint
    delta,
    true_label,  # the true label from the dataset
    normalize=True,  # normalizes the attributions so that the max absolute value is 1. Yields stronger colors.
    max_frac_to_show=0.05,  # what fraction of tokens to highlight, set to 1 for all.
    match_to_pred=False,  # whether to limit highlights to red for negative predictions and green for positive ones.
    # By enabling `match_to_pred` you show what tokens contribute to a high/low prediction not those that oppose it.
):
    if normalize:
        attributions = attributions / max(max(attributions), max(-attributions))
    if max_frac_to_show is not None and max_frac_to_show < 1:
        num_show = int(max_frac_to_show * attributions.shape[0])
        sal = attributions
        if pred < 0.5:
            sal = -sal
        if not match_to_pred:
            sal = np.abs(sal)
        top_idxs = np.argsort(-sal)[:num_show]
        mask = np.zeros_like(attributions)
        mask[top_idxs] = 1
        attributions = attributions * mask
    return visualization.VisualizationDataRecord(
        attributions,
        pred,
        int(pred > 0.5),
        true_label,
        attributions.sum() > 0,
        attributions.sum(),
        text,
        delta,
    )


def visualize_result(result, all_labels):
    if not result["explanations"]:
        print("No Clarify explanations for the record(s)")
        return
    all_explanations = result["explanations"]["kernel_shap"]
    all_predictions = list(csv.reader(result["predictions"]["data"].splitlines()))
    labels = []
    predictions = []
    explanations = []
    # Keep only the records that actually received an explanation
    for i, expl in enumerate(all_explanations):
        if expl:
            labels.append(all_labels[i])
            predictions.append(all_predictions[i])
            explanations.append(all_explanations[i])
    attributions_dataset = [
        np.array([attr["attribution"][0] for attr in expl[0]["attributions"]])
        for expl in explanations
    ]
    tokens_dataset = [
        np.array([attr["description"]["partial_text"] for attr in expl[0]["attributions"]])
        for expl in explanations
    ]
    # You can customize the following display settings
    normalize = True
    max_frac_to_show = 1
    match_to_pred = False
    vis = []
    for attr, token, pred, label in zip(attributions_dataset, tokens_dataset, predictions, labels):
        vis.append(
            visualization_record(
                attr, token, float(pred[0]), 0.0, label, normalize, max_frac_to_show, match_to_pred
            )
        )
    _ = visualization.visualize_text(vis)
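The normalize-then-mask step inside visualization_record can be illustrated without NumPy or captum. The sketch below is a dependency-free approximation of the match_to_pred=False path only; the function name and the sample attributions are made up.

```python
def normalize_and_mask(attributions, max_frac_to_show=0.5):
    """Scale so the largest magnitude is 1, then zero out all but the top fraction."""
    peak = max(abs(a) for a in attributions)
    scaled = [a / peak for a in attributions] if peak else list(attributions)
    num_show = int(max_frac_to_show * len(scaled))
    # Indices of the entries with the largest absolute attribution.
    top = sorted(range(len(scaled)), key=lambda i: -abs(scaled[i]))[:num_show]
    return [value if i in top else 0.0 for i, value in enumerate(scaled)]


# Made-up attributions for four tokens; half of them are kept.
masked = normalize_and_mask([0.2, -0.8, 0.1, 0.4], max_frac_to_show=0.5)
print(masked)
```

Only the two tokens with the largest magnitudes keep a non-zero value, which is what limits the highlighting in the rendered text.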
Prepare data
Download data
Data Source: https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/
The Women’s E-Commerce Clothing Reviews dataset has been made available under a Creative Commons Public Domain license. A copy of the dataset has been saved in a sample data Amazon S3 bucket. The next cell downloads the data from that bucket.
[ ]:
! curl https://sagemaker-sample-files.s3.amazonaws.com/datasets/tabular/womens_clothing_ecommerce/Womens_Clothing_E-Commerce_Reviews.csv > womens_clothing_reviews_dataset.csv
Load the dataset
[ ]:
df = pd.read_csv("womens_clothing_reviews_dataset.csv", index_col=[0])
df.head()
Context
The Women’s Clothing E-Commerce dataset contains reviews written by customers. Because the dataset contains real commercial data, it has been anonymized, and any references to the company in the review text and body have been replaced with “retailer”.
Content
The dataset contains 23486 rows and 10 columns. Each row corresponds to a customer review.
The columns include:
Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.
Age: Positive Integer variable of the reviewer’s age.
Title: String variable for the title of the review.
Review Text: String variable for the review body.
Rating: Positive Ordinal Integer variable for the product score granted by the customer from 1 Worst, to 5 Best.
Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.
Division Name: Categorical name of the product high level division.
Department Name: Categorical name of the product department name.
Class Name: Categorical name of the product class name.
Goal
To predict the sentiment of a review based on the text, and then explain the predictions using SageMaker Clarify.
Data preparation for model training
Target Variable Creation
Since the dataset does not contain a column that indicates the sentiment of the customer reviews, let’s create one. To do this, let’s assume that reviews with a Rating of 4 or higher indicate positive sentiment and reviews with a Rating of 2 or lower indicate negative sentiment. Let’s also assume that a Rating of 3 indicates neutral sentiment and exclude these rows from the dataset. Additionally, to predict the sentiment of a review, we are going to use the Review Text column; therefore, let’s remove rows that are empty in the Review Text column of the dataset.
[ ]:
def create_target_column(df, min_positive_score, max_negative_score):
    # Ratings strictly between the two thresholds are treated as neutral and dropped
    neutral_values = list(range(max_negative_score + 1, min_positive_score))
    # Copy the filtered frame so that adding a column does not write to a slice of the original
    df = df[~df["Rating"].isin(neutral_values)].copy()
    # 1 = positive sentiment, 0 = negative sentiment
    df["Sentiment"] = (df["Rating"] >= min_positive_score).astype(int)
    return df


df = create_target_column(df, 4, 2)
# Drop reviews with no text
df = df[~df["Review Text"].isna()]
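The labeling rule described above can be checked on its own, without pandas. This is a minimal sketch; the function name rating_to_sentiment is introduced here for illustration and is not part of the notebook.

```python
def rating_to_sentiment(rating, min_positive_score=4, max_negative_score=2):
    """Return 1 for positive, 0 for negative, None for neutral ratings that are dropped."""
    if rating >= min_positive_score:
        return 1
    if rating <= max_negative_score:
        return 0
    return None


# Map every possible rating from 1 to 5 to its label.
labels = {rating: rating_to_sentiment(rating) for rating in range(1, 6)}
print(labels)
```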
Train-Validation-Test splits
The most common approach for model evaluation is a train/validation/test split. Although this approach is effective in general, it can give misleading results on classification problems with a severe class imbalance. To avoid this, the sampling is stratified by the class label, as below. Stratification ensures that all classes are represented in similar proportions across the train, validation, and test datasets.
[ ]:
target = "Sentiment"
cols = "Review Text"
X = df[cols]
y = df[target]
# Data split: 11% (val) of the remaining 90% (train and val) of the dataset ~ 10%, resulting in an 80:10:10 split
test_dataset_size = 0.10
val_dataset_size = 0.11
RANDOM_STATE = 42
# Stratified train-val-test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=test_dataset_size, stratify=y, random_state=RANDOM_STATE
)
X_train, X_val, y_train, y_val = train_test_split(
X_train, y_train, test_size=val_dataset_size, stratify=y_train, random_state=RANDOM_STATE
)
print(
"Dataset: train ",
X_train.shape,
y_train.shape,
y_train.value_counts(dropna=False, normalize=True).to_dict(),
)
print(
"Dataset: validation ",
X_val.shape,
y_val.shape,
y_val.value_counts(dropna=False, normalize=True).to_dict(),
)
print(
"Dataset: test ",
X_test.shape,
y_test.shape,
y_test.value_counts(dropna=False, normalize=True).to_dict(),
)
# Combine the independent columns with the label
df_train = pd.concat([X_train, y_train], axis=1).reset_index(drop=True)
df_test = pd.concat([X_test, y_test], axis=1).reset_index(drop=True)
df_val = pd.concat([X_val, y_val], axis=1).reset_index(drop=True)
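The two-stage split sizes used above can be verified with a few lines of arithmetic. The variable names below are chosen for this sketch only.

```python
test_fraction = 0.10
val_fraction_of_remainder = 0.11

# The second split is applied to what is left after the test set is removed.
remainder = 1.0 - test_fraction
val_fraction = val_fraction_of_remainder * remainder
train_fraction = remainder - val_fraction

print(f"train={train_fraction:.3f} val={val_fraction:.3f} test={test_fraction:.3f}")
```

The validation share comes out just under 10% of the full dataset, which is why the split is described as roughly 80:10:10.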
[ ]:
headers = df_test.columns.to_list()
feature_headers = headers[0]
label_header = headers[1]
print(f"Feature names: {feature_headers}")
print(f"Label name: {label_header}")
print(f"Test data (without label column):")
test_data = df_test.iloc[:, :1]
test_data
We have split the dataset into train, validation, and test datasets. We use the train and validation datasets during the training process, and run Clarify on the test dataset.
In the cell below, we convert the Pandas DataFrames into Hugging Face Datasets for downstream modeling.
[ ]:
train_dataset = Dataset.from_pandas(df_train)
val_dataset = Dataset.from_pandas(df_val)
Upload the dataset
Here, we upload the prepared datasets to S3 buckets so that we can train the model with the Hugging Face Estimator.
[ ]:
# S3 key prefix for the datasets
s3 = S3FileSystem()
# save train_dataset to s3
training_input_path = f"{s3_key}/train"
print(f"training input path: {training_input_path}")
train_dataset.save_to_disk(training_input_path, fs=s3)
# save val_dataset to s3
val_input_path = f"{s3_key}/test"
print(f"validation input path: {val_input_path}")
val_dataset.save_to_disk(val_input_path, fs=s3)
Train and Deploy Hugging Face Model
In this step of the workflow, we use the Hugging Face Estimator to load the pre-trained distilbert-base-uncased model and fine-tune the model on our dataset.
Train model with Hugging Face estimator
The hyperparameters defined below are passed to the custom PyTorch code in scripts/train.py. The only required parameter is model_name. The other parameters, such as epochs and train_batch_size, have default values that can be overridden by setting them here.
The training job requires a GPU instance type. Here, we use ml.g4dn.xlarge.
[ ]:
# Hyperparameters passed into the training job
hyperparameters = {"epochs": 1, "model_name": "distilbert-base-uncased"}
huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="scripts",
    instance_type="ml.g4dn.xlarge",
    instance_count=1,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    py_version="py36",
    role=role,
    hyperparameters=hyperparameters,
    disable_profiler=True,
    debugger_hook_config=False,
)
# starting the train job with our uploaded datasets as input
huggingface_estimator.fit({"train": training_input_path, "test": val_input_path}, logs=True)
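The training script itself is not shown here. As a rough sketch of how an entry point might read these hyperparameters, assuming they arrive as command-line arguments, the following uses argparse. The default values for epochs and train_batch_size are assumptions, not the values in scripts/train.py.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse hyperparameters the way a training entry point might receive them."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", type=str, required=True)
    parser.add_argument("--epochs", type=int, default=3)  # assumed default
    parser.add_argument("--train_batch_size", type=int, default=32)  # assumed default
    # Ignore any extra arguments the platform may pass along.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_hyperparameters(["--epochs", "1", "--model_name", "distilbert-base-uncased"])
print(args.epochs, args.model_name, args.train_batch_size)
```

Values supplied in the hyperparameters dictionary override the defaults, and anything left out falls back to the script's own default.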
Download the trained model files
[ ]:
! aws s3 cp {huggingface_estimator.model_data} model.tar.gz
! mkdir -p {model_path}
! tar -xvf model.tar.gz -C {model_path}/
Prepare model container definition
We are going to use the trained model files along with the HuggingFace Inference container to deploy the model to a SageMaker endpoint.
[ ]:
with tarfile.open("hf_model.tar.gz", mode="w:gz") as archive:
    archive.add(model_path, recursive=True)
    archive.add("code/")
directory_name = s3_prefix.split("/")[-1]
zipped_model_path = sagemaker_session.upload_data(
    path="hf_model.tar.gz", key_prefix=directory_name + "/hf-model-sm"
)
zipped_model_path
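To check what ends up inside such an archive before uploading it, the packaging step can be tried locally. The sketch below builds a tarball from placeholder model/ and code/ directories in a temporary folder and lists its members. The file names config.json and inference.py are stand-ins, and package_model is a helper introduced only for this example.

```python
import os
import tarfile
import tempfile


def package_model(work_dir, archive_name="hf_model.tar.gz"):
    """Bundle the model/ and code/ directories under work_dir into one gzip tarball."""
    archive_path = os.path.join(work_dir, archive_name)
    with tarfile.open(archive_path, mode="w:gz") as archive:
        for folder in ("model", "code"):
            # arcname keeps paths relative so the archive does not embed work_dir.
            archive.add(os.path.join(work_dir, folder), arcname=folder)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder files standing in for real model weights and an inference script.
    for folder, filename in (("model", "config.json"), ("code", "inference.py")):
        os.makedirs(os.path.join(tmp, folder))
        with open(os.path.join(tmp, folder, filename), "w") as handle:
            handle.write("placeholder")
    with tarfile.open(package_model(tmp)) as archive:
        members = sorted(archive.getnames())

print(members)
```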
Create a new model object and then update its model artifact and inference script. The model object will be used to create the SageMaker model.
[ ]:
model = huggingface_estimator.create_model(name=model_name)
container_def = model.prepare_container_def(instance_type=instance_type)
container_def["ModelDataUrl"] = zipped_model_path
container_def["Environment"]["SAGEMAKER_PROGRAM"] = "inference.py"
pprint.pprint(container_def)
Create endpoint
Create model
The following parameters are required to create a SageMaker model:
ExecutionRoleArn: The ARN of the IAM role that Amazon SageMaker can assume to access the model artifacts and Docker images for deployment.
ModelName: The name of the SageMaker model.
PrimaryContainer: The location of the primary Docker image containing inference code, associated artifacts, and the custom environment map that the inference code uses when the model is deployed for predictions.
[ ]:
sagemaker_client.create_model(
ExecutionRoleArn=role,
ModelName=model_name,
PrimaryContainer=container_def,
)
print(f"Model created: {model_name}")
Create endpoint config
Create an endpoint configuration by calling the create_endpoint_config API. Here, supply the same model_name used in the create_model API call. The create_endpoint_config API now supports the additional parameter ClarifyExplainerConfig to enable the Clarify explainer. The SHAP baseline is mandatory; it can be provided either as inline baseline data (the ShapBaseline parameter) or as an S3 baseline file (the ShapBaselineUri parameter). Please see the developer guide for the optional parameters.
Here we use a special token as the baseline.
[ ]:
baseline = [["<UNK>"]]
print(f"SHAP baseline: {baseline}")
The TextConfig is configured with sentence-level granularity and English as the language. When the granularity is sentence, each sentence is a feature, so a review needs a few sentences for a good visualization.
[ ]:
sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "TestVariant",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }
    ],
    ExplainerConfig={
        "ClarifyExplainerConfig": {
            "InferenceConfig": {"FeatureTypes": ["text"]},
            "ShapConfig": {
                "ShapBaselineConfig": {"ShapBaseline": csv_serializer.serialize(baseline)},
                "TextConfig": {"Granularity": "sentence", "Language": "en"},
            },
        }
    },
)
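To get a feel for what "each sentence is a feature" means, the sketch below splits a made-up review into sentences with a naive regular expression. This is only an approximation for illustration; how Clarify itself segments text is not described here and will likely differ.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Hypothetical review with three sentences, so three features at sentence granularity.
review = "I love this dress. The fabric is soft! Sadly it runs small."
features = split_sentences(review)
print(len(features), features)
```

A single-sentence review would yield just one feature, which is why reviews with a few sentences make for more informative visualizations.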
Create endpoint
Once you have your model and endpoint configuration ready, use the create_endpoint API to create your endpoint. The endpoint_name must be unique within an AWS Region in your AWS account. The create_endpoint call returns immediately with the endpoint in the Creating state; provisioning continues in the background.
[ ]:
sagemaker_client.create_endpoint(
EndpointName=endpoint_name,
EndpointConfigName=endpoint_config_name,
)
Wait for the endpoint to be in “InService” state
[ ]:
sagemaker_session.wait_for_endpoint(endpoint_name)
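Waiting for the endpoint amounts to polling its status until it leaves the Creating state. The following is a minimal sketch of such a loop, not the implementation of wait_for_endpoint. The status source is passed in as a function so that the example runs without AWS access; the helper name and polling limits are assumptions.

```python
import time


def wait_until_in_service(get_status, poll_seconds=30, max_polls=60):
    """Poll get_status() until it leaves the Creating state or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status != "Creating":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("endpoint did not leave the Creating state in time")


# Stand-in for a real status lookup such as
# sagemaker_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"].
fake_statuses = iter(["Creating", "Creating", "InService"])
final_status = wait_until_in_service(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```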
Invoke endpoint
There are expanding business needs and legislative regulations that require explanations of why a model made the decision it did. SageMaker Clarify uses SHAP to explain the contribution that each input feature makes to the final decision.
The Kernel SHAP algorithm requires a baseline (also known as a background dataset). The baseline is either an S3 URI pointing to a baseline dataset file or an inline list of records. The baseline dataset must have the same type as the request data, and baseline records must include only features.
Below are several different endpoint invocations. Run them one by one and visualize the explanations by running the subsequent cell.
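The role of the baseline is easiest to see on a toy problem. The sketch below computes exact Shapley values by enumerating every coalition of sentences and replacing the sentences outside a coalition with the baseline token. This is plain Shapley enumeration, not Kernel SHAP, which approximates the same quantities from sampled coalitions. The scoring function toy_model and the sample sentences are hypothetical.

```python
from itertools import combinations
from math import factorial

BASELINE = "<UNK>"


def toy_model(sentences):
    """Hypothetical scorer: 0.5, plus 0.2 per 'love' sentence, minus 0.3 per 'small' sentence."""
    score = 0.5
    for sentence in sentences:
        if "love" in sentence:
            score += 0.2
        if "small" in sentence:
            score -= 0.3
    return score


def shapley_values(sentences, model):
    """Exact Shapley values; sentences outside a coalition are replaced by the baseline."""
    n = len(sentences)

    def value(coalition):
        return model([s if i in coalition else BASELINE for i, s in enumerate(sentences)])

    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi += weight * (value(set(subset) | {i}) - value(set(subset)))
        result.append(phi)
    return result


sentences = ["I love this dress.", "It arrived on time.", "Sadly it runs small."]
phis = shapley_values(sentences, toy_model)
print([round(phi, 3) for phi in phis])
```

The attributions sum to the difference between the score of the full review and the score of the all-baseline input, which is the property that makes the baseline choice matter.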
Single record request
Put only one record in the request body, and then send the request to the endpoint to get its predictions and explanations.
[ ]:
num_records = 1
[ ]:
response = sagemaker_runtime_client.invoke_endpoint(
EndpointName=endpoint_name,
ContentType="text/csv",
Accept="text/csv",
Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
)
pprint.pprint(response)
[ ]:
result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)
[ ]:
visualize_result(result, df_test[label_header][:num_records])
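If you want the attributions as plain data rather than a rendered visualization, the same fields that visualize_result reads can be extracted directly. The sketch below runs on a hand-written stand-in for the deserialized response; its shape is inferred from the keys used in visualize_result, and all values are made up.

```python
import csv

# Hand-written stand-in for a deserialized response; values are made up.
result = {
    "predictions": {"data": "0.93\n"},
    "explanations": {
        "kernel_shap": [
            [
                {
                    "attributions": [
                        {"attribution": [-0.05], "description": {"partial_text": "It runs small."}},
                        {"attribution": [0.21], "description": {"partial_text": "I love this dress."}},
                    ]
                }
            ]
        ]
    },
}


def ranked_attributions(result, record_index=0):
    """Return (text, attribution) pairs for one record, largest magnitude first."""
    entries = result["explanations"]["kernel_shap"][record_index][0]["attributions"]
    pairs = [(e["description"]["partial_text"], e["attribution"][0]) for e in entries]
    return sorted(pairs, key=lambda pair: -abs(pair[1]))


predictions = [float(row[0]) for row in csv.reader(result["predictions"]["data"].splitlines())]
ranked = ranked_attributions(result)
print(predictions[0], ranked)
```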
Single record request, no explanation
Use the EnableExplanations parameter to disable the explanations for this request.
[ ]:
num_records = 1
[ ]:
response = sagemaker_runtime_client.invoke_endpoint(
EndpointName=endpoint_name,
ContentType="text/csv",
Accept="text/csv",
Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
EnableExplanations="`false`", # Do not provide explanations
)
pprint.pprint(response)
[ ]:
result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)
[ ]:
visualize_result(result, df_test[label_header][:num_records])
Batch request, explain both
Put two records in the request body, and then send the request to the endpoint to get their predictions and explanations.
[ ]:
num_records = 2
[ ]:
response = sagemaker_runtime_client.invoke_endpoint(
EndpointName=endpoint_name,
ContentType="text/csv",
Accept="text/csv",
Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
)
pprint.pprint(response)
[ ]:
result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)
[ ]:
visualize_result(result, df_test[label_header][:num_records])
Batch request with more records, explain some of the records
Put a few more records in the request body, and then use the EnableExplanations expression to filter the records to be explained according to their predictions.
[ ]:
num_records = 4
[ ]:
response = sagemaker_runtime_client.invoke_endpoint(
EndpointName=endpoint_name,
ContentType="text/csv",
Accept="text/csv",
Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
EnableExplanations="[0]>`0.99`", # Explain a record only when its prediction meets the condition
)
pprint.pprint(response)
[ ]:
result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)
[ ]:
visualize_result(result, df_test[label_header][:num_records])
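The effect of the filter expression can be mimicked locally. The sketch below assumes the expression is evaluated against each record's prediction, so a record is explained only when its first score exceeds 0.99. The helper name and the prediction values are made up, and this does not reproduce how the endpoint evaluates the expression.

```python
def select_records_to_explain(predictions, threshold=0.99):
    """Mimic the filter: explain a record when its first score exceeds the threshold."""
    return [row[0] > threshold for row in predictions]


# Made-up model outputs, one row per record.
predictions = [[0.999], [0.42], [0.995], [0.10]]
flags = select_records_to_explain(predictions)
print(flags)
```

Records whose flag is False would still receive a prediction, just without an accompanying explanation, which is why visualize_result skips empty explanation entries.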
Cleanup
Finally, don’t forget to clean up the resources we set up and used for this demo!
[ ]:
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
[ ]:
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
[ ]:
sagemaker_client.delete_model(ModelName=model_name)