This notebook includes two sections:
- Feature selection: rank features by importance for uplift modeling with filter methods (F statistic, likelihood ratio, and KL divergence).
- Performance evaluation: compare the AUUC (Area Under the Uplift Curve) of several uplift models trained on all features versus the top-ranked features.
(Paper reference: Zhao, Zhenyu, et al. "Feature Selection Methods for Uplift Modeling." arXiv preprint arXiv:2005.03447 (2020).)
import numpy as np
import pandas as pd
from causalml.dataset import make_uplift_classification
from causalml.feature_selection.filters import FilterSelect
from causalml.inference.tree import UpliftRandomForestClassifier
from causalml.inference.meta import BaseXRegressor, BaseRRegressor, BaseSRegressor, BaseTRegressor
from causalml.metrics import plot_gain, auuc_score
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import logging
logger = logging.getLogger('causalml')
logging.basicConfig(level=logging.INFO)
Generate synthetic data using the built-in function.
# define parameters for simulation
y_name = 'conversion'
treatment_group_keys = ['control', 'treatment1']
n = 100000
n_classification_features = 50
n_classification_informative = 10
n_classification_repeated = 0
n_uplift_increase_dict = {'treatment1': 8}
n_uplift_decrease_dict = {'treatment1': 4}
delta_uplift_increase_dict = {'treatment1': 0.1}
delta_uplift_decrease_dict = {'treatment1': -0.1}
random_seed = 20200808
df, X_names = make_uplift_classification(
treatment_name=treatment_group_keys,
y_name=y_name,
n_samples=n,
n_classification_features=n_classification_features,
n_classification_informative=n_classification_informative,
n_classification_repeated=n_classification_repeated,
n_uplift_increase_dict=n_uplift_increase_dict,
n_uplift_decrease_dict=n_uplift_decrease_dict,
delta_uplift_increase_dict=delta_uplift_increase_dict,
delta_uplift_decrease_dict=delta_uplift_decrease_dict,
random_seed=random_seed
)
df.head()
| | treatment_group_key | x1_informative | x2_informative | x3_informative | x4_informative | x5_informative | x6_informative | x7_informative | x8_informative | x9_informative | ... | x56_uplift_increase | x57_uplift_increase | x58_uplift_increase | x59_increase_mix | x60_uplift_decrease | x61_uplift_decrease | x62_uplift_decrease | x63_uplift_decrease | conversion | treatment_effect |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | control | 0.653960 | -0.217603 | 1.856916 | -0.075662 | 0.080971 | -0.338374 | -1.011470 | 0.528000 | 0.115418 | ... | 1.533832 | -2.183001 | 1.839608 | 0.755302 | 1.835047 | -0.458431 | -1.927525 | 2.765331 | 0 | 0 |
| 1 | control | 3.439658 | 0.477855 | -0.377658 | -1.317121 | 0.861815 | -0.393180 | 0.503727 | 2.323846 | 1.229948 | ... | -1.192333 | -1.581815 | 2.423700 | 2.396904 | 0.296043 | -1.961940 | -1.444725 | 1.469213 | 1 | 0 |
| 2 | treatment1 | 0.130907 | -0.333536 | 0.474847 | -0.352067 | -0.024502 | 1.437105 | 0.566178 | -0.232508 | 0.866236 | ... | -0.301982 | -0.933816 | 0.475274 | 1.540994 | 0.698066 | 0.545091 | -0.084405 | -2.337347 | 1 | 0 |
| 3 | treatment1 | -2.156683 | 1.120198 | 0.174293 | -1.741426 | 0.488993 | 0.638340 | -0.721928 | 1.802134 | 1.097178 | ... | -2.129098 | -1.183581 | 0.000318 | 1.105735 | -0.629281 | -0.737041 | -1.525081 | 1.416042 | 0 | 0 |
| 4 | control | -2.708572 | -0.799698 | -2.199595 | 0.574077 | 0.083142 | -0.389140 | 1.492101 | 1.725202 | 1.194315 | ... | 1.582041 | -1.176077 | 1.686322 | 0.480035 | 1.780710 | 0.862094 | 0.128872 | -2.851344 | 0 | 0 |
5 rows × 66 columns
# Look at the conversion rate and sample size in each group
df.pivot_table(values='conversion',
index='treatment_group_key',
aggfunc=[np.mean, np.size],
margins=True)
| treatment_group_key | mean (conversion) | size (conversion) |
|---|---|---|
| control | 0.499050 | 100000 |
| treatment1 | 0.599680 | 100000 |
| All | 0.549365 | 200000 |
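The difference between the two group conversion rates should be close to the simulated treatment effect. A quick standalone check using the rates from the table above (a sketch, not a notebook cell):

```python
import pandas as pd

# Conversion rates taken from the pivot table above
rates = pd.Series({'control': 0.499050, 'treatment1': 0.599680})

# The observed average uplift should be close to the simulated delta of 0.1
observed_uplift = rates['treatment1'] - rates['control']
print(round(observed_uplift, 4))
```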
X_names
filter_f = FilterSelect()
method = 'F'
f_imp = filter_f.get_importance(df, X_names, y_name, method,
treatment_group = 'treatment1')
print(f_imp)
method feature rank score p_value \
0 F filter x57_uplift_increase 1.0 1973.380496 0.000000e+00
0 F filter x51_uplift_increase 2.0 1885.342364 0.000000e+00
0 F filter x54_uplift_increase 3.0 1496.254091 0.000000e+00
0 F filter x58_uplift_increase 4.0 1269.167710 4.224019e-277
0 F filter x9_informative 5.0 677.066204 5.151887e-149
0 F filter x63_uplift_decrease 6.0 9.108409 2.544691e-03
0 F filter x61_uplift_decrease 7.0 5.978189 1.448472e-02
0 F filter x19_irrelevant 8.0 5.295584 2.138059e-02
0 F filter x46_irrelevant 9.0 5.237353 2.210792e-02
0 F filter x27_irrelevant 10.0 4.573196 3.247713e-02
0 F filter x11_irrelevant 11.0 4.297030 3.818027e-02
0 F filter x39_irrelevant 12.0 4.009421 4.524803e-02
0 F filter x42_irrelevant 13.0 3.788770 5.159896e-02
0 F filter x60_uplift_decrease 14.0 3.089516 7.879975e-02
0 F filter x53_uplift_increase 15.0 2.884902 8.941499e-02
0 F filter x18_irrelevant 16.0 2.863763 9.059688e-02
0 F filter x22_irrelevant 17.0 2.402012 1.211809e-01
0 F filter x62_uplift_decrease 18.0 2.310073 1.285396e-01
0 F filter x40_irrelevant 19.0 2.262581 1.325346e-01
0 F filter x14_irrelevant 20.0 2.152103 1.423763e-01
0 F filter x8_informative 21.0 1.947212 1.628892e-01
0 F filter x33_irrelevant 22.0 1.691045 1.934648e-01
0 F filter x47_irrelevant 23.0 1.622995 2.026761e-01
0 F filter x28_irrelevant 24.0 1.525337 2.168150e-01
0 F filter x7_informative 25.0 1.206987 2.719311e-01
0 F filter x59_increase_mix 26.0 1.199216 2.734798e-01
0 F filter x20_irrelevant 27.0 1.176234 2.781252e-01
0 F filter x41_irrelevant 28.0 1.119234 2.900848e-01
0 F filter x3_informative 29.0 1.011457 3.145553e-01
0 F filter x16_irrelevant 30.0 0.999273 3.174877e-01
.. ... ... ... ... ...
0 F filter x2_informative 34.0 0.775075 3.786527e-01
0 F filter x45_irrelevant 35.0 0.746410 3.876164e-01
0 F filter x31_irrelevant 36.0 0.670080 4.130248e-01
0 F filter x55_uplift_increase 37.0 0.609454 4.349944e-01
0 F filter x34_irrelevant 38.0 0.606343 4.361689e-01
0 F filter x44_irrelevant 39.0 0.563659 4.527906e-01
0 F filter x12_irrelevant 40.0 0.531649 4.659151e-01
0 F filter x4_informative 41.0 0.412528 5.206899e-01
0 F filter x26_irrelevant 42.0 0.348929 5.547207e-01
0 F filter x48_irrelevant 43.0 0.348312 5.550711e-01
0 F filter x25_irrelevant 44.0 0.333696 5.634916e-01
0 F filter x24_irrelevant 45.0 0.330729 5.652307e-01
0 F filter x23_irrelevant 46.0 0.327771 5.669751e-01
0 F filter x52_uplift_increase 47.0 0.316966 5.734374e-01
0 F filter x37_irrelevant 48.0 0.246766 6.193618e-01
0 F filter x15_irrelevant 49.0 0.225643 6.347740e-01
0 F filter x29_irrelevant 50.0 0.196632 6.574534e-01
0 F filter x38_irrelevant 51.0 0.109701 7.404853e-01
0 F filter x35_irrelevant 52.0 0.101365 7.501982e-01
0 F filter x10_informative 53.0 0.094686 7.583024e-01
0 F filter x21_irrelevant 54.0 0.056172 8.126528e-01
0 F filter x43_irrelevant 55.0 0.043168 8.354093e-01
0 F filter x13_irrelevant 56.0 0.013480 9.075699e-01
0 F filter x49_irrelevant 57.0 0.008037 9.285639e-01
0 F filter x17_irrelevant 58.0 0.005137 9.428651e-01
0 F filter x30_irrelevant 59.0 0.004151 9.486301e-01
0 F filter x50_irrelevant 60.0 0.001379 9.703808e-01
0 F filter x36_irrelevant 61.0 0.001062 9.740069e-01
0 F filter x6_informative 62.0 0.000428 9.834997e-01
0 F filter x5_informative 63.0 0.000076 9.930457e-01
misc
0 df_num: 1.0, df_denom: 199996.0
(... the misc column is identical for all 63 rows ...)

[63 rows x 6 columns]
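A common next step is to turn the ranked scores into a feature subset by thresholding the p-values. A minimal sketch on a hypothetical slice of the output above (the Bonferroni-corrected threshold is an illustrative choice, not part of the notebook):

```python
import pandas as pd

# Hypothetical slice of the F-filter importance table above
f_imp = pd.DataFrame({
    'feature': ['x57_uplift_increase', 'x51_uplift_increase', 'x19_irrelevant'],
    'score':   [1973.380496, 1885.342364, 5.295584],
    'p_value': [0.0, 0.0, 2.138059e-02],
})

# Bonferroni-corrected significance threshold for 63 tested features
alpha = 0.05 / 63
selected = f_imp.loc[f_imp['p_value'] < alpha, 'feature'].tolist()
print(selected)  # only the two uplift_increase features survive
```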
method = 'LR'
lr_imp = filter_f.get_importance(df, X_names, y_name, method,
treatment_group = 'treatment1')
print(lr_imp)
Optimization terminated successfully.
         Current function value: 0.683141
         Iterations 4
(... the same logistic-regression convergence message repeats for each of the remaining features; omitted ...)
method feature rank score p_value misc
0 LR filter x57_uplift_increase 1.0 2070.582853 0.000000 df: 1
0 LR filter x51_uplift_increase 2.0 1969.081668 0.000000 df: 1
0 LR filter x54_uplift_increase 3.0 1596.059562 0.000000 df: 1
0 LR filter x58_uplift_increase 4.0 1339.970602 0.000000 df: 1
0 LR filter x9_informative 5.0 830.925812 0.000000 df: 1
0 LR filter x63_uplift_decrease 6.0 9.149363 0.002488 df: 1
0 LR filter x61_uplift_decrease 7.0 6.013194 0.014199 df: 1
0 LR filter x46_irrelevant 8.0 5.484790 0.019183 df: 1
0 LR filter x19_irrelevant 9.0 5.345715 0.020773 df: 1
0 LR filter x27_irrelevant 10.0 4.433104 0.035248 df: 1
0 LR filter x11_irrelevant 11.0 4.422047 0.035477 df: 1
0 LR filter x39_irrelevant 12.0 3.874147 0.049035 df: 1
0 LR filter x42_irrelevant 13.0 3.735743 0.053260 df: 1
0 LR filter x60_uplift_decrease 14.0 3.138529 0.076463 df: 1
0 LR filter x53_uplift_increase 15.0 2.865203 0.090514 df: 1
0 LR filter x18_irrelevant 16.0 2.840988 0.091888 df: 1
0 LR filter x22_irrelevant 17.0 2.504072 0.113552 df: 1
0 LR filter x62_uplift_decrease 18.0 2.317535 0.127923 df: 1
0 LR filter x14_irrelevant 19.0 2.226707 0.135643 df: 1
0 LR filter x40_irrelevant 20.0 2.197060 0.138274 df: 1
0 LR filter x8_informative 21.0 1.884233 0.169854 df: 1
0 LR filter x47_irrelevant 22.0 1.698939 0.192427 df: 1
0 LR filter x33_irrelevant 23.0 1.676522 0.195387 df: 1
0 LR filter x28_irrelevant 24.0 1.493135 0.221731 df: 1
0 LR filter x7_informative 25.0 1.249156 0.263714 df: 1
0 LR filter x59_increase_mix 26.0 1.197540 0.273814 df: 1
0 LR filter x41_irrelevant 27.0 1.139287 0.285803 df: 1
0 LR filter x20_irrelevant 28.0 1.109899 0.292104 df: 1
0 LR filter x1_informative 29.0 1.014855 0.313743 df: 1
0 LR filter x3_informative 30.0 0.992359 0.319166 df: 1
.. ... ... ... ... ... ...
0 LR filter x32_irrelevant 34.0 0.764228 0.382009 df: 1
0 LR filter x2_informative 35.0 0.732987 0.391917 df: 1
0 LR filter x31_irrelevant 36.0 0.705755 0.400857 df: 1
0 LR filter x55_uplift_increase 37.0 0.598018 0.439335 df: 1
0 LR filter x44_irrelevant 38.0 0.590518 0.442219 df: 1
0 LR filter x34_irrelevant 39.0 0.556178 0.455804 df: 1
0 LR filter x12_irrelevant 40.0 0.549555 0.458499 df: 1
0 LR filter x4_informative 41.0 0.392499 0.530989 df: 1
0 LR filter x48_irrelevant 42.0 0.357407 0.549950 df: 1
0 LR filter x26_irrelevant 43.0 0.346474 0.556116 df: 1
0 LR filter x24_irrelevant 44.0 0.321736 0.570566 df: 1
0 LR filter x23_irrelevant 45.0 0.316385 0.573789 df: 1
0 LR filter x25_irrelevant 46.0 0.310967 0.577087 df: 1
0 LR filter x52_uplift_increase 47.0 0.289571 0.590495 df: 1
0 LR filter x37_irrelevant 48.0 0.267554 0.604977 df: 1
0 LR filter x15_irrelevant 49.0 0.215249 0.642684 df: 1
0 LR filter x29_irrelevant 50.0 0.192926 0.660492 df: 1
0 LR filter x35_irrelevant 51.0 0.120501 0.728492 df: 1
0 LR filter x38_irrelevant 52.0 0.116905 0.732416 df: 1
0 LR filter x10_informative 53.0 0.107143 0.743421 df: 1
0 LR filter x21_irrelevant 54.0 0.042848 0.836011 df: 1
0 LR filter x43_irrelevant 55.0 0.036466 0.848557 df: 1
0 LR filter x13_irrelevant 56.0 0.012543 0.910826 df: 1
0 LR filter x17_irrelevant 57.0 0.011252 0.915523 df: 1
0 LR filter x49_irrelevant 58.0 0.005559 0.940565 df: 1
0 LR filter x36_irrelevant 59.0 0.002396 0.960964 df: 1
0 LR filter x50_irrelevant 60.0 0.001808 0.966086 df: 1
0 LR filter x30_irrelevant 61.0 0.000956 0.975339 df: 1
0 LR filter x6_informative 62.0 0.000732 0.978420 df: 1
0 LR filter x5_informative 63.0 0.000029 0.995726 df: 1
[63 rows x 6 columns]
method = 'KL'
kl_imp = filter_f.get_importance(df, X_names, y_name, method,
treatment_group = 'treatment1',
n_bins=10)
print(kl_imp)
    method              feature  rank     score p_value                misc
0  KL filter  x51_uplift_increase   1.0  0.026008    None  number_of_bins: 10
0  KL filter  x57_uplift_increase   2.0  0.023749    None  number_of_bins: 10
0  KL filter       x9_informative   3.0  0.020550    None  number_of_bins: 10
0  KL filter  x54_uplift_increase   4.0  0.018411    None  number_of_bins: 10
0  KL filter  x58_uplift_increase   5.0  0.014443    None  number_of_bins: 10
0  KL filter  x52_uplift_increase   6.0  0.002416    None  number_of_bins: 10
0  KL filter  x55_uplift_increase   7.0  0.000283    None  number_of_bins: 10
0  KL filter       x23_irrelevant   8.0  0.000221    None  number_of_bins: 10
0  KL filter     x59_increase_mix   9.0  0.000218    None  number_of_bins: 10
0  KL filter       x21_irrelevant  10.0  0.000206    None  number_of_bins: 10
0  KL filter       x15_irrelevant  11.0  0.000157    None  number_of_bins: 10
0  KL filter       x11_irrelevant  12.0  0.000155    None  number_of_bins: 10
0  KL filter       x46_irrelevant  13.0  0.000150    None  number_of_bins: 10
0  KL filter       x39_irrelevant  14.0  0.000146    None  number_of_bins: 10
0  KL filter  x53_uplift_increase  15.0  0.000142    None  number_of_bins: 10
0  KL filter      x10_informative  16.0  0.000137    None  number_of_bins: 10
0  KL filter       x2_informative  17.0  0.000135    None  number_of_bins: 10
0  KL filter       x31_irrelevant  18.0  0.000132    None  number_of_bins: 10
0  KL filter       x19_irrelevant  19.0  0.000130    None  number_of_bins: 10
0  KL filter       x40_irrelevant  20.0  0.000125    None  number_of_bins: 10
0  KL filter       x44_irrelevant  21.0  0.000124    None  number_of_bins: 10
0  KL filter  x61_uplift_decrease  22.0  0.000118    None  number_of_bins: 10
0  KL filter  x60_uplift_decrease  23.0  0.000118    None  number_of_bins: 10
0  KL filter  x63_uplift_decrease  24.0  0.000112    None  number_of_bins: 10
0  KL filter       x32_irrelevant  25.0  0.000109    None  number_of_bins: 10
0  KL filter       x35_irrelevant  26.0  0.000104    None  number_of_bins: 10
0  KL filter       x14_irrelevant  27.0  0.000102    None  number_of_bins: 10
0  KL filter       x38_irrelevant  28.0  0.000094    None  number_of_bins: 10
0  KL filter       x27_irrelevant  29.0  0.000091    None  number_of_bins: 10
0  KL filter       x33_irrelevant  30.0  0.000090    None  number_of_bins: 10
..       ...                  ...   ...       ...     ...                 ...
0  KL filter       x16_irrelevant  34.0  0.000083    None  number_of_bins: 10
0  KL filter       x34_irrelevant  35.0  0.000082    None  number_of_bins: 10
0  KL filter       x18_irrelevant  36.0  0.000076    None  number_of_bins: 10
0  KL filter       x36_irrelevant  37.0  0.000075    None  number_of_bins: 10
0  KL filter       x20_irrelevant  38.0  0.000074    None  number_of_bins: 10
0  KL filter       x4_informative  39.0  0.000073    None  number_of_bins: 10
0  KL filter       x26_irrelevant  40.0  0.000072    None  number_of_bins: 10
0  KL filter       x42_irrelevant  41.0  0.000071    None  number_of_bins: 10
0  KL filter       x8_informative  42.0  0.000071    None  number_of_bins: 10
0  KL filter       x6_informative  43.0  0.000071    None  number_of_bins: 10
0  KL filter  x62_uplift_decrease  44.0  0.000065    None  number_of_bins: 10
0  KL filter       x12_irrelevant  45.0  0.000063    None  number_of_bins: 10
0  KL filter       x5_informative  46.0  0.000062    None  number_of_bins: 10
0  KL filter       x1_informative  47.0  0.000060    None  number_of_bins: 10
0  KL filter       x49_irrelevant  48.0  0.000059    None  number_of_bins: 10
0  KL filter       x47_irrelevant  49.0  0.000058    None  number_of_bins: 10
0  KL filter       x48_irrelevant  50.0  0.000057    None  number_of_bins: 10
0  KL filter       x25_irrelevant  51.0  0.000057    None  number_of_bins: 10
0  KL filter       x22_irrelevant  52.0  0.000056    None  number_of_bins: 10
0  KL filter       x41_irrelevant  53.0  0.000049    None  number_of_bins: 10
0  KL filter       x37_irrelevant  54.0  0.000049    None  number_of_bins: 10
0  KL filter  x56_uplift_increase  55.0  0.000043    None  number_of_bins: 10
0  KL filter       x13_irrelevant  56.0  0.000039    None  number_of_bins: 10
0  KL filter       x50_irrelevant  57.0  0.000038    None  number_of_bins: 10
0  KL filter       x24_irrelevant  58.0  0.000036    None  number_of_bins: 10
0  KL filter       x29_irrelevant  59.0  0.000021    None  number_of_bins: 10
0  KL filter       x30_irrelevant  60.0  0.000020    None  number_of_bins: 10
0  KL filter       x17_irrelevant  61.0  0.000017    None  number_of_bins: 10
0  KL filter       x45_irrelevant  62.0  0.000013    None  number_of_bins: 10
0  KL filter       x7_informative  63.0  0.000011    None  number_of_bins: 10

[63 rows x 6 columns]
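The three methods can also be compared by how similarly they rank features. A sketch using hypothetical rank columns (in the notebook these would come from the `rank` columns of `f_imp`, `lr_imp`, and `kl_imp`):

```python
import pandas as pd

# Hypothetical per-method ranks for three features
f_ranks = pd.DataFrame({'feature': ['x57', 'x51', 'x9'], 'rank_F': [1, 2, 3]})
kl_ranks = pd.DataFrame({'feature': ['x57', 'x51', 'x9'], 'rank_KL': [2, 1, 3]})

# Align the two rankings on the feature name and measure agreement
ranks = f_ranks.merge(kl_ranks, on='feature')
agreement = ranks['rank_F'].corr(ranks['rank_KL'], method='spearman')
print(agreement)
```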
All three filter methods ranked most of the informative and uplift-increase features near the top.

Next, we evaluate the AUUC (Area Under the Uplift Curve) of several uplift models trained on the top-ranked features.
# train test split
df_train, df_test = train_test_split(df, test_size=0.2, random_state=111)
# convert treatment column to 1 (treatment1) and 0 (control)
treatments = np.where((df_test['treatment_group_key']=='treatment1'), 1, 0)
print(treatments[:10])
print(df_test['treatment_group_key'][:10])
[1 0 0 0 0 1 0 1 1 0]
79114     treatment1
76043        control
47617        control
53169        control
175702       control
111635    treatment1
129212       control
19247     treatment1
49272     treatment1
199314       control
Name: treatment_group_key, dtype: object
uplift_model = UpliftRandomForestClassifier(control_name='control', max_depth=8)
# using all features
features = X_names
uplift_model.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds = uplift_model.predict(df_test[features].values)
top_n = 10
top_10_features = kl_imp['feature'][:top_n]
print(top_10_features)
0    x51_uplift_increase
0    x57_uplift_increase
0         x9_informative
0    x54_uplift_increase
0    x58_uplift_increase
0    x52_uplift_increase
0    x55_uplift_increase
0         x23_irrelevant
0       x59_increase_mix
0         x21_irrelevant
Name: feature, dtype: object
top_n = 15
top_15_features = kl_imp['feature'][:top_n]
print(top_15_features)
0    x51_uplift_increase
0    x57_uplift_increase
0         x9_informative
0    x54_uplift_increase
0    x58_uplift_increase
0    x52_uplift_increase
0    x55_uplift_increase
0         x23_irrelevant
0       x59_increase_mix
0         x21_irrelevant
0         x15_irrelevant
0         x11_irrelevant
0         x46_irrelevant
0         x39_irrelevant
0    x53_uplift_increase
Name: feature, dtype: object
top_n = 20
top_20_features = kl_imp['feature'][:top_n]
print(top_20_features)
0    x51_uplift_increase
0    x57_uplift_increase
0         x9_informative
0    x54_uplift_increase
0    x58_uplift_increase
0    x52_uplift_increase
0    x55_uplift_increase
0         x23_irrelevant
0       x59_increase_mix
0         x21_irrelevant
0         x15_irrelevant
0         x11_irrelevant
0         x46_irrelevant
0         x39_irrelevant
0    x53_uplift_increase
0        x10_informative
0         x2_informative
0         x31_irrelevant
0         x19_irrelevant
0         x40_irrelevant
Name: feature, dtype: object
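The three Top-N cells above follow the same pattern and can be condensed into one comprehension; a standalone sketch with a stand-in for `kl_imp`:

```python
import pandas as pd

# Stand-in for the KL-filter importance table; in the notebook this is kl_imp
kl_imp = pd.DataFrame({'feature': [f'x{i}' for i in range(1, 64)]})

# One slice per Top-N setting instead of three separate cells
top_feature_sets = {n: kl_imp['feature'][:n].tolist() for n in (10, 15, 20)}
print([len(v) for v in top_feature_sets.values()])
```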
# using top 10 features
features = top_10_features
uplift_model.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t10 = uplift_model.predict(df_test[features].values)
# using top 15 features
features = top_15_features
uplift_model.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t15 = uplift_model.predict(df_test[features].values)
# using top 20 features
features = top_20_features
uplift_model.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t20 = uplift_model.predict(df_test[features].values)
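Likewise, the repeated fit/predict cells can be driven by one loop over the feature sets. A self-contained sketch with a dummy model standing in for `UpliftRandomForestClassifier` (the fit/predict call shapes mirror those used above; the data here is synthetic filler):

```python
import numpy as np
import pandas as pd

class DummyUpliftModel:
    """Stand-in with the same fit/predict shape as the uplift model above."""
    def fit(self, X, treatment, y):
        self.n_features_ = X.shape[1]
    def predict(self, X):
        return np.zeros((X.shape[0], 1))

rng = np.random.default_rng(0)
cols = ['x1', 'x2', 'x3']
df_train = pd.DataFrame(rng.normal(size=(20, 3)), columns=cols)
df_test = pd.DataFrame(rng.normal(size=(10, 3)), columns=cols)
treatment = np.repeat(['control', 'treatment1'], 10)
y = rng.integers(0, 2, size=20)

# Fit once per feature set and collect the test predictions
model = DummyUpliftModel()
preds = {}
for name, feats in {'Top 2': ['x1', 'x2'], 'All': cols}.items():
    model.fit(df_train[feats].values, treatment=treatment, y=y)
    preds[name] = model.predict(df_test[feats].values).ravel()
df_preds = pd.DataFrame(preds)
print(df_preds.shape)
```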
df_preds = pd.DataFrame([y_preds.ravel(),
y_preds_t10.ravel(),
y_preds_t15.ravel(),
y_preds_t20.ravel(),
treatments,
df_test[y_name].ravel()],
index=['All', 'Top 10', 'Top 15', 'Top 20', 'is_treated', y_name]).T
plot_gain(df_preds, outcome_col=y_name, treatment_col='is_treated')
auuc_score(df_preds, outcome_col=y_name, treatment_col='is_treated')
All       0.874182
Top 10    0.874931
Top 15    0.889868
Top 20    0.894400
Random    0.493027
dtype: float64
r_rf_learner = BaseRRegressor(
RandomForestRegressor(
n_estimators = 100,
max_depth = 8,
min_samples_leaf = 100
),
control_name='control')
# using all features
features = X_names
r_rf_learner.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds = r_rf_learner.predict(df_test[features].values)
INFO:causalml:Generating propensity score
INFO:causalml:Calibrating propensity scores.
INFO:causalml:generating out-of-fold CV outcome estimates
INFO:causalml:training the treatment effect model for treatment1 with R-loss
# using top 10 features
features = top_10_features
r_rf_learner.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t10 = r_rf_learner.predict(df_test[features].values)
INFO:causalml:Generating propensity score
INFO:causalml:Calibrating propensity scores.
INFO:causalml:generating out-of-fold CV outcome estimates
INFO:causalml:training the treatment effect model for treatment1 with R-loss
# using top 15 features
features = top_15_features
r_rf_learner.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t15 = r_rf_learner.predict(df_test[features].values)
INFO:causalml:Generating propensity score
INFO:causalml:Calibrating propensity scores.
INFO:causalml:generating out-of-fold CV outcome estimates
INFO:causalml:training the treatment effect model for treatment1 with R-loss
# using top 20 features
features = top_20_features
r_rf_learner.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t20 = r_rf_learner.predict(df_test[features].values)
INFO:causalml:Generating propensity score
INFO:causalml:Calibrating propensity scores.
INFO:causalml:generating out-of-fold CV outcome estimates
INFO:causalml:training the treatment effect model for treatment1 with R-loss
df_preds = pd.DataFrame([y_preds.ravel(),
y_preds_t10.ravel(),
y_preds_t15.ravel(),
y_preds_t20.ravel(),
treatments,
df_test[y_name].ravel()],
index=['All', 'Top 10', 'Top 15', 'Top 20', 'is_treated', y_name]).T
plot_gain(df_preds, outcome_col=y_name, treatment_col='is_treated')
# print out AUUC score
auuc_score(df_preds, outcome_col=y_name, treatment_col='is_treated')
All       0.890747
Top 10    0.906526
Top 15    0.899187
Top 20    0.901242
Random    0.493027
dtype: float64

(The AUUC improvement from feature selection is relatively smaller in this R-learner case.)
slearner_rf = BaseSRegressor(
RandomForestRegressor(
n_estimators = 100,
max_depth = 8,
min_samples_leaf = 100
),
control_name='control')
# using all features
features = X_names
slearner_rf.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds = slearner_rf.predict(df_test[features].values)
# using top 10 features
features = top_10_features
slearner_rf.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t10 = slearner_rf.predict(df_test[features].values)
# using top 15 features
features = top_15_features
slearner_rf.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t15 = slearner_rf.predict(df_test[features].values)
# using top 20 features
features = top_20_features
slearner_rf.fit(X = df_train[features].values,
treatment = df_train['treatment_group_key'].values,
y = df_train[y_name].values)
y_preds_t20 = slearner_rf.predict(df_test[features].values)
df_preds = pd.DataFrame([y_preds.ravel(),
y_preds_t10.ravel(),
y_preds_t15.ravel(),
y_preds_t20.ravel(),
treatments,
df_test[y_name].ravel()],
index=['All', 'Top 10', 'Top 15', 'Top 20', 'is_treated', y_name]).T
plot_gain(df_preds, outcome_col=y_name, treatment_col='is_treated')
# print out AUUC score
auuc_score(df_preds, outcome_col=y_name, treatment_col='is_treated')
All       0.864579
Top 10    0.885354
Top 15    0.878970
Top 20    0.877791
Random    0.493027
dtype: float64
In this notebook, we demonstrated how the filter methods can select important features and improve AUUC performance, though the results may vary across datasets, models, and hyper-parameters.
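Collecting the AUUC scores reported above into one table makes the comparison across learners easier (the numbers are copied from the outputs in this notebook):

```python
import pandas as pd

# AUUC scores reported above, per learner and feature set
scores = pd.DataFrame({
    'UpliftRF':  [0.874182, 0.874931, 0.889868, 0.894400],
    'R-learner': [0.890747, 0.906526, 0.899187, 0.901242],
    'S-learner': [0.864579, 0.885354, 0.878970, 0.877791],
}, index=['All', 'Top 10', 'Top 15', 'Top 20'])

# Best feature set per learner
print(scores.idxmax())
```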