Suppose you are given some data with a treatment and an outcome. Can you determine whether the treatment causes the outcome, or whether the correlation is purely due to a common cause?
import os, sys
sys.path.append(os.path.abspath("../../../"))
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import dowhy
from dowhy import CausalModel
import dowhy.datasets, dowhy.plotter
import logging
logging.getLogger("dowhy").setLevel(logging.WARNING)
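Creating the dataset. It is generated from one of two models, chosen at random: either the treatment has an effect on the outcome (effect=1), or it has none (effect=0) and any observed correlation is due to the common cause w0: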
rvar = 1 if np.random.uniform() > 0.5 else 0
data_dict = dowhy.datasets.xy_dataset(10000, effect=rvar,
                                      num_common_causes=1,
                                      sd_error=0.2)
df = data_dict['df']
print(df[["Action", "Outcome", "w0"]].head())
     Action    Outcome        w0
0  3.113147   6.266040 -2.991157
1  9.870317  20.009844  3.995410
2  2.466091   6.030570 -3.130654
3  8.648457  18.564791  3.077105
4  5.903761  11.102452 -0.325468
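To see why the raw data alone cannot settle the question, here is a minimal sketch in plain Python. The functions `simulate` and `pearson` and all coefficients are hypothetical, chosen for illustration; they are not the ones used by `xy_dataset`. Under these assumptions, the treatment and outcome are strongly correlated whether or not the treatment has any effect.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(effect, n=5000, seed=0):
    """Draw (treatment, outcome) pairs that share a common cause w.

    Hypothetical model: w drives both variables; the treatment enters
    the outcome only through the `effect` coefficient.
    """
    rng = random.Random(seed)
    treatment, outcome = [], []
    for _ in range(n):
        w = rng.uniform(-4, 4)                      # common cause
        t = 6 + w + rng.gauss(0, 0.2)               # treatment depends on w
        y = 12 + 2 * w + effect * t + rng.gauss(0, 0.2)
        treatment.append(t)
        outcome.append(y)
    return treatment, outcome

corr_causal = pearson(*simulate(effect=1))      # treatment affects outcome
corr_confounded = pearson(*simulate(effect=0))  # common cause only
print("Correlation with a true effect:", corr_causal)
print("Correlation with no effect:", corr_confounded)
```

Both correlations come out close to 1 in this setup, so a scatter plot of treatment against outcome looks much the same under either model.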
dowhy.plotter.plot_treatment_outcome(df[data_dict["treatment_name"]], df[data_dict["outcome_name"]],
                                     df[data_dict["time_val"]])
model = CausalModel(
    data=df,
    treatment=data_dict["treatment_name"],
    outcome=data_dict["outcome_name"],
    common_causes=data_dict["common_causes_names"],
    instruments=data_dict["instrument_names"])
model.view_model(layout="dot")
WARNING:dowhy.causal_model:Causal Graph not provided. DoWhy will construct a graph based on data inputs.
Showing the causal model stored in the local file "causal_model.png"
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
Identify the causal effect using properties of the causal graph.
identified_estimand = model.identify_effect()
print(identified_estimand)
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
WARN: Do you want to continue by ignoring any unobserved confounders? (use proceed_when_unidentifiable=True to disable this prompt) [y/n] y
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor1 (Default)
Estimand expression:
    d
─────────(Expectation(Outcome|w0))
d[Action]
Estimand assumption 1, Unconfoundedness: If U→{Action} and U→Outcome then P(Outcome|Action,w0,U) = P(Outcome|Action,w0)
### Estimand : 2
Estimand name: iv
No such variable found!
### Estimand : 3
Estimand name: frontdoor
No such variable found!
Once we have identified the estimand, we can use any statistical method to estimate the causal effect.
Let's use Linear Regression for simplicity.
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.linear_regression")
print("Causal Estimate is " + str(estimate.value))
# Plot the slope of the line between treatment and outcome, which is the causal effect
dowhy.plotter.plot_causal_effect(estimate, df[data_dict["treatment_name"]], df[data_dict["outcome_name"]])
Causal Estimate is -0.010748140888287239
print("DoWhy estimate is " + str(estimate.value))
print("Actual true causal effect was {0}".format(rvar))
DoWhy estimate is -0.010748140888287239
Actual true causal effect was 0
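The idea behind a backdoor linear-regression estimate can be sketched without DoWhy: regress the outcome on the treatment while adjusting for the common cause, and read off the coefficient on the treatment. The sketch below is an illustration under assumed data, not DoWhy's implementation. The helper names and coefficients are hypothetical, and the adjustment is done by residualizing both variables on w (the Frisch-Waugh-Lovell approach), which gives the same coefficient as a regression of the outcome on treatment and w together.

```python
import random

def ols_slope_intercept(x, y):
    """Least-squares fit of y = a + b*x; returns (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx

def residualize(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b, a = ols_slope_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def adjusted_effect(treatment, outcome, w):
    """Coefficient on treatment after adjusting for the common cause w."""
    t_res = residualize(w, treatment)
    y_res = residualize(w, outcome)
    return ols_slope_intercept(t_res, y_res)[0]

# Hypothetical data: w drives both variables, the true effect of treatment is 0.
rng = random.Random(1)
w = [rng.uniform(-4, 4) for _ in range(5000)]
treatment = [6 + wi + rng.gauss(0, 0.2) for wi in w]
outcome = [12 + 2 * wi + rng.gauss(0, 0.2) for wi in w]

naive = ols_slope_intercept(treatment, outcome)[0]  # ignores w
adjusted = adjusted_effect(treatment, outcome, w)   # conditions on w
print("Naive slope:", naive)
print("Adjusted slope:", adjusted)
```

In this setup the naive slope is far from zero because it absorbs the influence of w, while the adjusted slope is close to the true effect of 0.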
We can also refute the estimate to check its robustness to assumptions (aka sensitivity analysis, but on steroids).
res_random=model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")
print(res_random)
Refute: Add a Random Common Cause
Estimated effect:-0.010748140888287239
New effect:-0.010632355420572281
res_placebo = model.refute_estimate(identified_estimand, estimate,
                                    method_name="placebo_treatment_refuter", placebo_type="permute")
print(res_placebo)
Refute: Use a Placebo Treatment
Estimated effect:-0.010748140888287239
New effect:5.139016302534216e-05
p value:0.48
res_subset = model.refute_estimate(identified_estimand, estimate,
                                   method_name="data_subset_refuter", subset_fraction=0.9)
print(res_subset)
Refute: Use a subset of data
Estimated effect:-0.010748140888287239
New effect:-0.010887492865577003
p value:0.5
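The logic of the placebo and subset checks can be sketched in plain Python. This is an illustration under assumed data, not DoWhy's refuter code: the helpers and coefficients are hypothetical, and the simulated data has a true effect of 1 (unlike the run above) so that the contrast is visible. Permuting the treatment breaks any link to the outcome, so the re-estimated effect should fall to about zero; re-estimating on a random 90% subset should change the estimate very little.

```python
import random

def ols_slope_intercept(x, y):
    """Least-squares fit of y = a + b*x; returns (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx

def residualize(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b, a = ols_slope_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def adjusted_effect(treatment, outcome, w):
    """Coefficient on treatment after adjusting for the common cause w."""
    return ols_slope_intercept(residualize(w, treatment),
                               residualize(w, outcome))[0]

# Hypothetical data with a true treatment effect of 1.
rng = random.Random(2)
n = 5000
w = [rng.uniform(-4, 4) for _ in range(n)]
treatment = [6 + wi + rng.gauss(0, 0.2) for wi in w]
outcome = [12 + 2 * wi + 1.0 * ti + rng.gauss(0, 0.2)
           for wi, ti in zip(w, treatment)]

original = adjusted_effect(treatment, outcome, w)

# Placebo check: shuffle the treatment so it cannot affect the outcome.
placebo = treatment[:]
rng.shuffle(placebo)
placebo_effect = adjusted_effect(placebo, outcome, w)

# Subset check: re-estimate on a random 90% of the rows.
keep = rng.sample(range(n), int(0.9 * n))
subset_effect = adjusted_effect([treatment[i] for i in keep],
                                [outcome[i] for i in keep],
                                [w[i] for i in keep])

print("Original estimate:", original)
print("Placebo estimate:", placebo_effect)
print("Subset estimate:", subset_effect)
```

An estimate that stayed large under the placebo, or that moved substantially between subsets, would be a sign that the original number should not be trusted.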
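As you can see, our causal estimate is robust to these simple refutations: adding a random common cause or using a 90% subset of the data leaves it essentially unchanged (about -0.011), and replacing the treatment with a placebo gives an effect close to zero.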