Causal isotonic calibration for heterogeneous treatment effects
Link to paper The full paper is available here. You can also find the paper on PapersWithCode here. Abstract Proposed a new method for calibrating predictors of heterogeneous treatment effects Introduced a data-efficient variant of calibration that avoids the need for hold-out calibration sets Established that proposed method achieves fast doubly-robust calibration rates Wrapping proposed method around any black-box learning algorithm provides strong calibration guarantees while preserving predictive performance Paper Content Introduction Estimation of causal effects is important for understanding interventions and informing policy Treatment effect heterogeneity can provide more insights than overall population effects Applications of HTEs include prioritizing treatment and individualizing treatment assignments CATE estimation is of great interest in statistics and data science CATE estimators build upon estimators of the conditional mean outcome and the probability of treatment given covariates CATE estimation can be challenging due to non-smooth, high-dimensional nuisance parameters Predictions from a given treatment effect predictor can still be useful for decision-making Theoretical guarantees for rational decision-making typically hinge on the predictor being a good approximation of the true CATE Calibration is a desirable property of a treatment effect predictor Calibration has been widely used to enhance prediction models for classification and regression Little research has been done on calibration of treatment effect predictors This paper proposes a nonparametric doubly-robust method for calibrating treatment effect predictors Statistical setup Notation and definitions Data unit O consists of three components: W, A, and Y W is a vector of baseline covariates A is a binary indicator of treatment Y is an outcome Dn is the observed dataset π0 is the propensity score µ0 is the potential outcome Higher values of Y1-Y0 are desirable τ0 is the true CATE γ0 is the conditional mean of the individual treatment effect Solution to isotonic regression problem is non-unique Solution follows Groeneboom and Lopuhaa (1993) Measuring calibration and the calibration-distortion decomposition Various definitions of risk predictor calibration have been proposed Outline definition of calibration and rationale Best predictor of individual treatment effect is w → γ 0 (τ, w) Perfect calibration cannot be achieved in finite samples Calibration measure is 2 -expected calibration error Calibration measure plays role in mean squared error between treatment predictor and true CATE Calibration-distortion decomposition shows better-calibrated treatment effect predictors have lower mean-squared error Calibrating predictors: desiderata and classical methods Calibration methods aim to find a function that makes a predictor more accurate Platt’s scaling is used for binary outcomes and is based on strong parametric assumptions Histogram binning partitions the sorted values of the predictor into a fixed number of bins Bayesian binning considers multiple binning models and their combinations Isotonic calibration learns the bins from data using isotonic regression Isotonic calibration satisfies a distribution-free calibration guarantee and is at least as predictive as the original predictor Causal isotonic calibration Inspired by isotonic calibration, a doubly-robust calibration method for treatment effects is proposed, called causal isotonic calibration Takes a given predictor trained on some dataset and performs calibration using an independent (or hold-out) dataset Automatically learns uncalibrated regions of the given predictor Consolidates individual predictions within each region into a single value using a doubly-robust estimator of the ATE Introduces a novel data-efficient variant of calibration called crosscalibration Cross-fitted predictors are used and a single calibrated predictor is obtained using all available data Implemented using standard isotonic regression software Estimate χ 0 of χ 0 is obtained using E m Isotonic regression is used to find and refer to χ 0 (O) as a pseudo-outcome Calibrated predictor is given by θ n τ Sample splitting or cross-fitting is recommended to obtain pseudo-outcomes Algorithm 2 provides a means to fully utilize the entire dataset for both fitting an initial estimate of τ 0 and calibration Algorithm 3 is a computationally simpler variant of Algorithm 2 Sample theoretical properties Algorithm 1 and Algorithm 2 are presented for causal isotonic calibration Properties 1 and 2 are argued to be satisfied Data is split into a training dataset and a calibration dataset Conditions 1-5 are assumed Theorem 1 establishes the calibration rate of the calibrated predictor Theorem 2 states that the pointwise median preserves calibration Theorem 3 states that the mean squared error is not inflated much Data-generating mechanisms Examined behavior of proposal under two data-generating mechanisms Scenario 1: binary outcome, 4 confounders, treatment interactions Scenario 2: continuous outcome, linear on covariates, 20 true confounders Propensity score follows logistic regression model Covariates independent and uniformly distributed on (-1, +1) Sample sizes of 1,000, 2,000, 5,000 and 10,000 Cate estimation Implemented GBRT, RF, GLMnet, GAM, and MARS for Scenario 1 Implemented RF, GLMnet, and combination of variable screening with lasso regularization for Scenario 2 Used R package sl3 for implementation of estimators Used causal isotonic cross-calibration for calibration Performance metrics Compared performance of calibrated and uncalibrated versions of a causal isotonic calibrator Used 3 metrics to compare performance: calibration measure, mean squared error, and calibration bias within bins Estimated metric empirically using an independent sample of size 10,000 Averaged metric estimates across 500 simulations Simulation results GLMnet, RF, GAM, and MARS were well-calibrated and did not benefit from calibration GBRT benefited from calibration, reducing calibration error and improving mean squared error RF and GBRT with GLMnet screening were poorly calibrated and benefited from calibration Cross-calibration improved mean squared error and calibration more than conventional calibration Conclusion Proposed causal isotonic calibration as a novel method to calibrate treatment effect predictors Established that the pointwise median of calibrated predictors is also calibrated Developed a data-efficient variant of causal isotonic calibration using cross-fitted predictors Calibration error vanishes at a fast rate of -2/3 with little or no loss in predictive power Directly calibrate HTE predictors without requiring trial data or parametric assumptions Potential applications of method include data-driven decision-making with strong robustness guarantees Limitations: need to estimate µ 0 or π 0 sufficiently well Found that calibration generally preserves predictive power, and in some cases improves accuracy Found that cross-calibration substantially improved mean squared error Theoretical arguments can be adapted to provide guarantees for isotonic calibration in regression and classification problems Implementation of algorithms in R provided in Github package causalCalibration