Here we study the twins dataset as studied by Louizos et al. We focus on twins which are the same sex and weigh less than 2kgs. The treatment t = 1 is being born the heavier twin and the outcome is mortality of each of the twins in their first year of life.The confounding variable taken is 'gestat10', the number of gestational weeks prior to birth, as it is highly correlated with the outcome. The results using the methods below are in coherence with those obtained in the paper.
import pandas as pd
import numpy as np
import dowhy
from dowhy import CausalModel
from dowhy import causal_estimators
# Config dict to set the logging level
import logging.config
DEFAULT_LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'loggers': {
'': {
'level': 'WARN',
},
}
}
logging.config.dictConfig(DEFAULT_LOGGING)
# Disabling warnings output
import warnings
from sklearn.exceptions import DataConversionWarning
warnings.filterwarnings(action='ignore', category=DataConversionWarning)
Load the Data
The data loading process involves combining the covariates, treatment and outcome, and resolving the pair property in the data. Since there are entries for both the twins, their mortalities can be treated as two potential outcomes. The treatment is given in terms of weights of the twins.Therefore, to get a binary treatment, each child's information is added in a separate row instead of both's information being condensed in a single row as in the original data source.
#The covariates data has 46 features
x = pd.read_csv("https://raw.githubusercontent.com/AMLab-Amsterdam/CEVAE/master/datasets/TWINS/twin_pairs_X_3years_samesex.csv")
#The outcome data contains mortality of the lighter and heavier twin
y = pd.read_csv("https://raw.githubusercontent.com/AMLab-Amsterdam/CEVAE/master/datasets/TWINS/twin_pairs_Y_3years_samesex.csv")
#The treatment data contains weight in grams of both the twins
t = pd.read_csv("https://raw.githubusercontent.com/AMLab-Amsterdam/CEVAE/master/datasets/TWINS/twin_pairs_T_3years_samesex.csv")
#_0 denotes features specific to the lighter twin and _1 denotes features specific to the heavier twin
lighter_columns = ['pldel', 'birattnd', 'brstate', 'stoccfipb', 'mager8',
'ormoth', 'mrace', 'meduc6', 'dmar', 'mplbir', 'mpre5', 'adequacy',
'orfath', 'frace', 'birmon', 'gestat10', 'csex', 'anemia', 'cardiac',
'lung', 'diabetes', 'herpes', 'hydra', 'hemo', 'chyper', 'phyper',
'eclamp', 'incervix', 'pre4000', 'preterm', 'renal', 'rh', 'uterine',
'othermr', 'tobacco', 'alcohol', 'cigar6', 'drink5', 'crace',
'data_year', 'nprevistq', 'dfageq', 'feduc6', 'infant_id_0',
'dlivord_min', 'dtotord_min', 'bord_0',
'brstate_reg', 'stoccfipb_reg', 'mplbir_reg']
heavier_columns = [ 'pldel', 'birattnd', 'brstate', 'stoccfipb', 'mager8',
'ormoth', 'mrace', 'meduc6', 'dmar', 'mplbir', 'mpre5', 'adequacy',
'orfath', 'frace', 'birmon', 'gestat10', 'csex', 'anemia', 'cardiac',
'lung', 'diabetes', 'herpes', 'hydra', 'hemo', 'chyper', 'phyper',
'eclamp', 'incervix', 'pre4000', 'preterm', 'renal', 'rh', 'uterine',
'othermr', 'tobacco', 'alcohol', 'cigar6', 'drink5', 'crace',
'data_year', 'nprevistq', 'dfageq', 'feduc6',
'infant_id_1', 'dlivord_min', 'dtotord_min', 'bord_1',
'brstate_reg', 'stoccfipb_reg', 'mplbir_reg']
#Since data has pair property,processing the data to get separate row for each twin so that each child can be treated as an instance
data = []
for i in range(len(t.values)):
#select only if both <=2kg
if t.iloc[i].values[1]>=2000 or t.iloc[i].values[2]>=2000:
continue
this_instance_lighter = list(x.iloc[i][lighter_columns].values)
this_instance_heavier = list(x.iloc[i][heavier_columns].values)
#adding weight
this_instance_lighter.append(t.iloc[i].values[1])
this_instance_heavier.append(t.iloc[i].values[2])
#adding treatment, is_heavier
this_instance_lighter.append(0)
this_instance_heavier.append(1)
#adding the outcome
this_instance_lighter.append(y.iloc[i].values[1])
this_instance_heavier.append(y.iloc[i].values[2])
data.append(this_instance_lighter)
data.append(this_instance_heavier)
cols = [ 'pldel', 'birattnd', 'brstate', 'stoccfipb', 'mager8',
'ormoth', 'mrace', 'meduc6', 'dmar', 'mplbir', 'mpre5', 'adequacy',
'orfath', 'frace', 'birmon', 'gestat10', 'csex', 'anemia', 'cardiac',
'lung', 'diabetes', 'herpes', 'hydra', 'hemo', 'chyper', 'phyper',
'eclamp', 'incervix', 'pre4000', 'preterm', 'renal', 'rh', 'uterine',
'othermr', 'tobacco', 'alcohol', 'cigar6', 'drink5', 'crace',
'data_year', 'nprevistq', 'dfageq', 'feduc6',
'infant_id', 'dlivord_min', 'dtotord_min', 'bord',
'brstate_reg', 'stoccfipb_reg', 'mplbir_reg','wt','treatment','outcome']
df = pd.DataFrame(columns=cols,data=data)
df.head()
| pldel | birattnd | brstate | stoccfipb | mager8 | ormoth | mrace | meduc6 | dmar | mplbir | ... | infant_id | dlivord_min | dtotord_min | bord | brstate_reg | stoccfipb_reg | mplbir_reg | wt | treatment | outcome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 0.0 | 1.0 | 3.0 | 1.0 | 1.0 | ... | 35.0 | 3.0 | 3.0 | 2.0 | 5.0 | 5.0 | 5.0 | 936.0 | 0 | 0.0 |
| 1 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 0.0 | 1.0 | 3.0 | 1.0 | 1.0 | ... | 34.0 | 3.0 | 3.0 | 1.0 | 5.0 | 5.0 | 5.0 | 1006.0 | 1 | 0.0 |
| 2 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 0.0 | 1.0 | 2.0 | 0.0 | 1.0 | ... | 47.0 | NaN | NaN | NaN | 5.0 | 5.0 | 5.0 | 737.0 | 0 | 0.0 |
| 3 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 0.0 | 1.0 | 2.0 | 0.0 | 1.0 | ... | 46.0 | NaN | NaN | NaN | 5.0 | 5.0 | 5.0 | 850.0 | 1 | 1.0 |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 0.0 | 1.0 | 3.0 | 1.0 | 1.0 | ... | 52.0 | 1.0 | 1.0 | 1.0 | 5.0 | 5.0 | 5.0 | 1830.0 | 0 | 0.0 |
5 rows × 53 columns
df = df.astype({"treatment":'bool'}, copy=False) #explicitly assigning treatment column as boolean
df.fillna(value=df.mean(),inplace=True) #filling the missing values
df.fillna(value=df.mode().loc[0],inplace=True)
data_1 = df[df["treatment"]==1]
data_0 = df[df["treatment"]==0]
print(np.mean(data_1["outcome"]))
print(np.mean(data_0["outcome"]))
print("ATE", np.mean(data_1["outcome"])- np.mean(data_0["outcome"]))
0.16421895861148197 0.1894192256341789 ATE -0.025200267022696926
1. Model
#The causal model has "treatment = is_heavier", "outcome = mortality" and "gestat10 = gestational weeks before birth"
model=CausalModel(
data = df,
treatment='treatment',
outcome='outcome',
common_causes='gestat10'
)
2. Identify
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
────────────(Expectation(outcome|gestat10))
d[treatment]
Estimand assumption 1, Unconfoundedness: If U→{treatment} and U→outcome then P(outcome|treatment,gestat10,U) = P(outcome|treatment,gestat10)
### Estimand : 2
Estimand name: iv
No such variable found!
### Estimand : 3
Estimand name: frontdoor
No such variable found!
3. Estimate Using Various Methods
3.1 Using Linear Regression
estimate = model.estimate_effect(identified_estimand,
method_name="backdoor.linear_regression", test_significance=True
)
print(estimate)
print("ATE", np.mean(data_1["outcome"])- np.mean(data_0["outcome"]))
print("Causal Estimate is " + str(estimate.value))
*** Causal Estimate ***
## Identified estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
────────────(Expectation(outcome|gestat10))
d[treatment]
Estimand assumption 1, Unconfoundedness: If U→{treatment} and U→outcome then P(outcome|treatment,gestat10,U) = P(outcome|treatment,gestat10)
## Realized estimand
b: outcome~treatment+gestat10
Target units: ate
## Estimate
Mean value: -0.025200267022700423
p-value: [7.18902894e-08]
ATE -0.025200267022696926
Causal Estimate is -0.025200267022700423
3.2 Using Propensity Score Matching
estimate = model.estimate_effect(identified_estimand,
method_name="backdoor.propensity_score_matching"
)
print("Causal Estimate is " + str(estimate.value))
print("ATE", np.mean(data_1["outcome"])- np.mean(data_0["outcome"]))
/home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs)
Causal Estimate is -0.012600133511348465 ATE -0.025200267022696926
4. Refute
4.1 Adding a random cause
refute_results=model.refute_estimate(identified_estimand, estimate,
method_name="random_common_cause")
print(refute_results)
/home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs)
Refute: Add a Random Common Cause Estimated effect:-0.012600133511348465 New effect:-0.02728638184245661
4.2 Using a placebo treatment
res_placebo=model.refute_estimate(identified_estimand, estimate,
method_name="placebo_treatment_refuter", placebo_type="permute",
num_simulations=20)
print(res_placebo)
/home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs)
Refute: Use a Placebo Treatment Estimated effect:-0.012600133511348465 New effect:-0.01735856141522029 p value:0.43718357128183105
4.3 Using a data subset refuter
res_subset=model.refute_estimate(identified_estimand, estimate,
method_name="data_subset_refuter", subset_fraction=0.9,
num_simulations=20)
print(res_subset)
/home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs) /home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). return f(**kwargs)
Refute: Use a subset of data Estimated effect:-0.012600133511348465 New effect:-0.04436048398312549 p value:0.3529814152616