Robust and Efficient Semi-Supervised Estimation of Average Treatment Effects with Application to Electronic Health Records Data
David Cheng, Ashwin Ananthakrishnan, Tianxi Cai

TL;DR
This paper introduces a robust, semi-supervised method for estimating average treatment effects in electronic health records, leveraging limited labeled outcomes and predictive features to improve efficiency and robustness.
Contribution
It proposes an imputation-based ATE estimator that is doubly-robust and locally semiparametric efficient, specifically designed for semi-supervised EHR data analysis.
Findings
The method outperforms existing approaches in simulations.
It achieves robustness to model misspecification.
Application to EHR data demonstrates practical utility.
Abstract
We consider the problem of estimating the average treatment effect (ATE) in a semi-supervised learning setting, where only a very small proportion of all observations is labeled with the true outcome, but features predictive of the outcome are available for all observations. This problem arises, for example, when estimating treatment effects in electronic health records (EHR) data, because gold-standard outcomes are often not directly observable from the records but are observed for a limited number of patients through small-scale manual chart review. We develop an imputation-based approach for estimating the ATE that is robust to misspecification of the imputation model. This effectively allows information from the predictive features to be safely leveraged to improve efficiency in estimating the ATE. The estimator is additionally doubly-robust in that it is consistent under…
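The abstract describes the estimator only at a high level. As a rough illustration of how an imputation model fit on a small labeled subset, a propensity model fit on all records, and surrogate features can be combined, here is a minimal Python sketch of a generic semi-supervised augmented inverse probability weighting (AIPW) estimator. It is not the authors' estimator: the linear imputation and outcome models, the logistic propensity model, the function names (`ss_aipw_ate`, `fit_logistic`, `fit_linear`), and the simulated data are all hypothetical choices for illustration, and the sketch assumes the labeled subset is a simple random sample of all records. In expectation, the labeled-data correction turns the full-data term into the ordinary AIPW form, so in this sketch a misspecified imputation model does not bias the estimate provided the propensity model is correct. The abstract claims stronger robustness and efficiency properties, which this sketch does not attempt to reproduce.

```python
import numpy as np


def fit_logistic(Z, t, n_iter=25):
    """Newton-Raphson (IRLS) fit of P(t = 1 | Z); Z must include an intercept column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        hessian = Z.T @ (Z * (p * (1.0 - p))[:, None])
        gradient = Z.T @ (t - p)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def fit_linear(Z, y):
    """Ordinary least squares coefficients of y on Z."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]


def ss_aipw_ate(X, S, T, Y, labeled):
    """Hypothetical semi-supervised AIPW estimate of the ATE (illustrative only).

    X: (N, p) confounders; S: (N, q) surrogate features observed for everyone;
    T: (N,) binary treatment; Y: (N,) outcome, only read where labeled is True;
    labeled: (N,) boolean mask of the chart-reviewed subset (assumed random).
    """
    N = X.shape[0]
    ones = np.ones((N, 1))
    Zx = np.hstack([ones, X])

    # Step 1: propensity score P(T = 1 | X), fit on all N observations.
    pi = 1.0 / (1.0 + np.exp(-Zx @ fit_logistic(Zx, T)))

    # Step 2: linear imputation model for Y given (T, X, S), fit on labeled rows only.
    Zi = np.hstack([ones, T[:, None], X, S])
    Y_imp = Zi @ fit_linear(Zi[labeled], Y[labeled])

    # Step 3: arm-specific regressions of the imputed outcome on X, using all N rows.
    mu1 = Zx @ fit_linear(Zx[T == 1], Y_imp[T == 1])
    mu0 = Zx @ fit_linear(Zx[T == 0], Y_imp[T == 0])
    mu_T = np.where(T == 1, mu1, mu0)

    # Signed inverse-probability weights: +1/pi if treated, -1/(1 - pi) if not.
    w = T / pi - (1.0 - T) / (1.0 - pi)

    # Step 4: AIPW applied to the imputed outcomes over the full data set.
    full_data_term = np.mean(mu1 - mu0 + w * (Y_imp - mu_T))

    # Step 5: labeled-data correction for any bias in the imputation model.
    correction = np.mean(w[labeled] * (Y[labeled] - Y_imp[labeled]))
    return full_data_term + correction


# Usage on simulated data with a true ATE of 2; every value here is made up.
rng = np.random.default_rng(0)
N, n = 20000, 500
X = rng.normal(size=(N, 2))
p_treat = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1])))
T = rng.binomial(1, p_treat).astype(float)
Y_true = 1.0 + 2.0 * T + X[:, 0] + rng.normal(size=N)
S = (Y_true + rng.normal(size=N))[:, None]  # noisy surrogate of the outcome
labeled = np.zeros(N, dtype=bool)
labeled[rng.choice(N, size=n, replace=False)] = True
Y = np.where(labeled, Y_true, np.nan)  # outcomes hidden outside the labeled set

ate_hat = ss_aipw_ate(X, S, T, Y, labeled)
print(f"estimated ATE: {ate_hat:.3f}")
```

Only 500 of the 20,000 simulated outcomes are ever read; the surrogate `S` carries the rest of the outcome information into the full-data term, while the small labeled average guards against a poor imputation model.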
