This example notebook illustrates how to visualize uplift trees for interpretation and diagnosis.
These visualization functions work only for tree-based algorithms:
Currently, they do NOT support Meta-learner algorithms.
This notebook will show how to use visualization for:
Uplift Tree and Uplift Random Forest
Training and Validation Data
One Treatment Group and Multiple Treatment Groups
from causalml.dataset import make_uplift_classification
from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
from causalml.inference.tree import uplift_tree_string, uplift_tree_plot
import numpy as np
import pandas as pd
from IPython.display import Image
from sklearn.model_selection import train_test_split
# Data generation
# make_uplift_classification returns a synthetic classification dataset
# (df) plus the list of feature column names (x_names)
df, x_names = make_uplift_classification()
df.head()
# Keep only the control group and a single treatment group for the
# one-treatment example
df = df[df['treatment_group_key'].isin(['control','treatment1'])]
# Look at the conversion rate and sample size in each group.
# NOTE: use string aggregator names instead of NumPy callables —
# passing np.mean / np.size to aggfunc is deprecated in recent pandas.
df.pivot_table(values='conversion',
               index='treatment_group_key',
               aggfunc=['mean', 'size'],
               margins=True)
| mean | size | |
|---|---|---|
| conversion | conversion | |
| treatment_group_key | ||
| control | 0.5110 | 1000 |
| treatment1 | 0.5140 | 1000 |
| All | 0.5125 | 2000 |
# Hold out 20% of the data as a test set for model validation (next section)
df_train, df_test = train_test_split(df, test_size=0.2, random_state=111)

# Fit a single uplift tree using KL-divergence as the split criterion
uplift_model = UpliftTreeClassifier(
    max_depth=5,
    min_samples_leaf=200,
    min_samples_treatment=50,
    n_reg=100,
    evaluationFunction='KL',
    control_name='control',
)
uplift_model.fit(
    df_train[x_names].values,
    treatment=df_train['treatment_group_key'].values,
    y=df_train['conversion'].values,
)
<causalml.inference.tree.models.UpliftTreeClassifier at 0x7f14217cf2e8>
# Print uplift tree as a string
# NOTE(review): judging by the printed tree below, uplift_tree_string
# renders the fitted tree's splits and leaf class distributions as text
result = uplift_tree_string(uplift_model.fitted_uplift_tree, x_names)
x18_uplift_increase >= -1.3039248112755226?
yes -> x1_informative >= -0.8277030695336565?
yes -> x7_irrelevant >= -0.5541554186274725?
yes -> x19_increase_mix >= -0.782381242022013?
yes -> {'treatment1': 0.526316, 'control': 0.541667}
no -> {'treatment1': 0.618644, 'control': 0.43609}
no -> {'treatment1': 0.503067, 'control': 0.5}
no -> {'treatment1': 0.438017, 'control': 0.548148}
no -> {'treatment1': 0.413174, 'control': 0.529412}
# Plot uplift tree
# graph is presumably a pydot graph (it exposes create_png/write_pdf);
# Image() renders the PNG bytes inline in the notebook
graph = uplift_tree_plot(uplift_model.fitted_uplift_tree,x_names)
Image(graph.create_png())
Note that the validation uplift score shown in the tree nodes will update once the tree is filled with the test data below.
### Fill the trained tree with testing data set
# The uplift score based on testing dataset is shown as validation uplift score in the tree nodes
# fill() re-populates the node statistics with held-out data — presumably
# without changing the learned splits; TODO confirm against causalml docs
uplift_model.fill(X=df_test[x_names].values, treatment=df_test['treatment_group_key'].values, y=df_test['conversion'].values)
# Plot uplift tree
graph = uplift_tree_plot(uplift_model.fitted_uplift_tree,x_names)
Image(graph.create_png())
# Hold out 20% of the data as a test set for model validation (next section)
df_train, df_test = train_test_split(df, test_size=0.2, random_state=111)

# Fit an uplift random forest (an ensemble of uplift trees), again with
# the KL-divergence split criterion
uplift_model = UpliftRandomForestClassifier(
    n_estimators=5,
    max_depth=5,
    min_samples_leaf=200,
    min_samples_treatment=50,
    n_reg=100,
    evaluationFunction='KL',
    control_name='control',
)
uplift_model.fit(
    df_train[x_names].values,
    treatment=df_train['treatment_group_key'].values,
    y=df_train['conversion'].values,
)
# Specify a tree in the random forest (the index can be any integer from 0 to n_estimators-1)
# uplift_forest appears to be the list of fitted member trees
uplift_tree = uplift_model.uplift_forest[0]
# Print uplift tree as a string
result = uplift_tree_string(uplift_tree.fitted_uplift_tree, x_names)
x16_increase_mix >= 1.0842809969006826?
yes -> x7_irrelevant >= 0.06177540756557202?
yes -> {'control': 0.357143, 'treatment1': 0.596899}
no -> {'treatment1': 0.546875, 'control': 0.387387}
no -> x12_uplift_increase >= -0.02034088060341288?
yes -> {'control': 0.439252, 'treatment1': 0.697479}
no -> x18_uplift_increase >= -0.478451770923653?
yes -> {'control': 0.440559, 'treatment1': 0.507937}
no -> x16_increase_mix >= 0.5291311366785828?
yes -> {'control': 0.626582, 'treatment1': 0.333333}
no -> {'treatment1': 0.437037, 'control': 0.494318}
# Plot uplift tree
# Visualize the single forest member selected above
graph = uplift_tree_plot(uplift_tree.fitted_uplift_tree,x_names)
Image(graph.create_png())
### Fill the trained tree with testing data set
# The uplift score based on testing dataset is shown as validation uplift score in the tree nodes
# Fill this individual forest tree with held-out data, then re-plot it so
# the node annotations reflect the validation statistics
uplift_tree.fill(X=df_test[x_names].values, treatment=df_test['treatment_group_key'].values, y=df_test['conversion'].values)
# Plot uplift tree
graph = uplift_tree_plot(uplift_tree.fitted_uplift_tree,x_names)
Image(graph.create_png())
# Data generation
# Regenerate the full dataset, this time keeping all treatment groups
# (control + treatment1/2/3) for the multiple-treatment example
df, x_names = make_uplift_classification()
# Look at the conversion rate and sample size in each group.
# NOTE: use string aggregator names instead of NumPy callables —
# passing np.mean / np.size to aggfunc is deprecated in recent pandas.
df.pivot_table(values='conversion',
               index='treatment_group_key',
               aggfunc=['mean', 'size'],
               margins=True)
| mean | size | |
|---|---|---|
| conversion | conversion | |
| treatment_group_key | ||
| control | 0.511 | 1000 |
| treatment1 | 0.514 | 1000 |
| treatment2 | 0.559 | 1000 |
| treatment3 | 0.600 | 1000 |
| All | 0.546 | 4000 |
# Hold out 20% of the data as a test set for model validation (next section)
df_train, df_test = train_test_split(df, test_size=0.2, random_state=111)

# Fit a shallow uplift tree on the multi-treatment data so the plot
# stays readable
uplift_model = UpliftTreeClassifier(
    max_depth=3,
    min_samples_leaf=200,
    min_samples_treatment=50,
    n_reg=100,
    evaluationFunction='KL',
    control_name='control',
)
uplift_model.fit(
    df_train[x_names].values,
    treatment=df_train['treatment_group_key'].values,
    y=df_train['conversion'].values,
)
<causalml.inference.tree.models.UpliftTreeClassifier at 0x7f1421555978>
# Plot uplift tree
# The uplift score represents the best uplift score among all treatment effects
graph = uplift_tree_plot(uplift_model.fitted_uplift_tree,x_names)
Image(graph.create_png())
# Save the graph as pdf
graph.write_pdf("tbc.pdf")
# Save the graph as png
# (both writes go to the notebook's working directory)
graph.write_png("tbc.png")
True