Semi-supervised learning and the question of true versus estimated propensity scores
Andrew Herren, P. Richard Hahn

TL;DR
This paper explores the use of semi-supervised learning for treatment effect estimation, examining whether true or estimated propensity scores are more effective, and proposes methods to reconcile existing paradoxes in causal inference.
Contribution
It introduces a simple procedure to reconcile the usefulness of known propensity scores with prior skepticism and compares the effectiveness of direct regression versus inverse-propensity weighting.
Findings
Estimated propensity scores can be more effective than true scores in some cases.
Direct regression often outperforms inverse-propensity weighting in simulations.
Unlabeled data can be valuable for estimating high-dimensional propensity functions.
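For reference, the two estimators being compared are usually written as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: Z_i is the binary treatment, Y_i the outcome, X_i the covariates, \hat{e} a propensity estimate (or the true propensity), and \hat{\mu}_1, \hat{\mu}_0 fitted outcome regressions for the treated and control arms.

\[
\hat{\tau}_{\mathrm{IPW}} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{Z_i Y_i}{\hat{e}(X_i)} - \frac{(1 - Z_i)\,Y_i}{1 - \hat{e}(X_i)}\right),
\qquad
\hat{\tau}_{\mathrm{reg}} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)\right)
\]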
Abstract
A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are unobserved. According to this formulation, large unlabeled data sets could be used to estimate a high dimensional propensity function and causal inference using a much smaller labeled data set could proceed via weighted estimators using the learned propensity scores. In the limiting case of infinite unlabeled data, one may estimate the high dimensional propensity function exactly. However, longstanding advice in the causal inference community suggests that estimated propensity scores (from labeled data alone) are actually preferable to true propensity scores, implying that the unlabeled data is actually useless in this context. In this paper we examine this paradox and…
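The setup described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' procedure or simulation design: the data-generating process, the sample sizes, the single discrete covariate, and all function names are assumptions chosen so that the example runs with NumPy alone. It compares inverse-propensity weighting using the true propensity, a propensity learned from a large unlabeled pool, and a propensity estimated from the labeled data only, alongside a direct linear regression.

```python
import numpy as np

LEVELS = 4
TRUE_E = np.array([0.2, 0.4, 0.6, 0.8])  # assumed true propensity per covariate level
TAU = 1.0                                 # assumed constant treatment effect

def simulate(n, rng):
    """Draw covariate x, treatment z, outcome y, and the true propensity."""
    x = rng.integers(0, LEVELS, size=n)
    e = TRUE_E[x]
    z = rng.binomial(1, e)
    y = 2.0 * x + TAU * z + rng.normal(size=n)
    return x, z, y, e

def cell_propensity(x_fit, z_fit, x_eval):
    """Estimate the propensity as the treated fraction within each covariate level."""
    e_hat = np.array([z_fit[x_fit == k].mean() for k in range(LEVELS)])
    e_hat = np.clip(e_hat, 0.01, 0.99)  # guard against division by zero
    return e_hat[x_eval]

def ipw_ate(y, z, e):
    """Horvitz-Thompson style inverse-propensity-weighted effect estimate."""
    return np.mean(z * y / e - (1 - z) * y / (1 - e))

def regression_ate(y, z, x):
    """Direct regression: coefficient on z in an OLS fit of y on (1, z, x)."""
    design = np.column_stack([np.ones(len(y)), z, x])
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    return beta[1]

def run_simulation(reps=200, n_labeled=400, n_unlabeled=20000, seed=0):
    """Return the Monte Carlo RMSE of each estimator around the true effect."""
    rng = np.random.default_rng(seed)
    errors = {"ipw_true": [], "ipw_unlabeled": [], "ipw_labeled": [], "regression": []}
    for _ in range(reps):
        x, z, y, e_true = simulate(n_labeled, rng)
        # "Unlabeled" pool: covariates and treatment observed, outcomes discarded.
        xu, zu, _, _ = simulate(n_unlabeled, rng)
        errors["ipw_true"].append(ipw_ate(y, z, e_true) - TAU)
        errors["ipw_unlabeled"].append(ipw_ate(y, z, cell_propensity(xu, zu, x)) - TAU)
        errors["ipw_labeled"].append(ipw_ate(y, z, cell_propensity(x, z, x)) - TAU)
        errors["regression"].append(regression_ate(y, z, x) - TAU)
    return {name: float(np.sqrt(np.mean(np.square(err)))) for name, err in errors.items()}

results = run_simulation()
for name, rmse in results.items():
    print(f"{name:>14s}  RMSE = {rmse:.3f}")
```

In this particular toy setting the propensity learned from the large unlabeled pool behaves much like the true propensity, which is the situation the abstract's paradox concerns. How the estimators rank in general depends on the data-generating process, so the printed numbers should be read as an illustration rather than as a reproduction of the paper's results.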
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods and Bayesian Inference
Methods: Causal inference
