Propensity Score Methods for Merging Observational and Experimental Datasets
Evan Rosenman, Art B. Owen, Michael Baiocchi, Hailey Banack

TL;DR
This paper develops methods to combine limited randomized trial data with larger observational data using propensity scores, improving causal effect estimation by leveraging the strengths of both data sources.
Contribution
It introduces two novel methods for merging RCT and observational data based on propensity score stratification, enhancing robustness and external validity of causal estimates.
Findings
The spike-in method performs best when the covariates in the randomized controlled trial (RCT) and the observational database (ODB) are similarly distributed.
The convex combination method is more robust to covariate bias.
Application to Women's Health Initiative data shows stable causal estimates of hormone therapy effects.
Abstract
This project considers how one might augment a limited amount of data from a randomized controlled trial (RCT) with more plentiful data from an observational database (ODB), in order to estimate a causal effect. In our motivating setting, the ODB has better external validity, while the RCT has genuine randomization. We work with strata defined by the propensity score in the ODB. Subjects from the RCT are placed in strata defined by the propensity they would have had, had they been in the ODB. Our first method simply spikes the RCT data into their corresponding ODB strata. Our second method takes a data-driven convex combination of the ODB and RCT treatment effect estimates within each stratum. Using the delta method and simulations, we show that the spike-in method works best when the RCT covariates are drawn from the same distribution as in the ODB. Our convex combination method is more…
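The two estimators described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. The function names, the quantile-based strata, the use of ODB stratum shares as aggregation weights, and the simulated data are all assumptions made for the example. In particular, the summary does not state how the paper chooses its data-driven convex weights, so plain inverse-variance weights are substituted here as a stand-in.

```python
import numpy as np

def stratum_edges(ps_odb, n_strata):
    """Interior quantile cut points of the ODB propensity scores (assumed stratification rule)."""
    qs = np.linspace(0, 1, n_strata + 1)[1:-1]
    return np.quantile(ps_odb, qs)

def diff_in_means(y, t):
    """Treated-minus-control mean outcome and a plug-in variance estimate."""
    y1, y0 = y[t == 1], y[t == 0]
    est = y1.mean() - y0.mean()
    var = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
    return est, var

def merge_estimates(odb, rct, n_strata=5):
    """odb and rct are dicts of arrays 'ps', 't', 'y'.

    rct['ps'] must hold the ODB propensity model evaluated at the RCT
    covariates, i.e. the propensity each RCT subject would have had in the ODB.
    Returns (spike-in estimate, convex-combination estimate).
    """
    edges = stratum_edges(odb["ps"], n_strata)
    s_odb = np.digitize(odb["ps"], edges)  # stratum index 0..n_strata-1
    s_rct = np.digitize(rct["ps"], edges)
    spike, convex, shares = [], [], []
    for k in range(n_strata):
        o, r = s_odb == k, s_rct == k
        # Spike-in: pool RCT subjects into the matching ODB stratum.
        y = np.concatenate([odb["y"][o], rct["y"][r]])
        t = np.concatenate([odb["t"][o], rct["t"][r]])
        spike.append(diff_in_means(y, t)[0])
        # Convex combination of the two within-stratum estimates.
        est_o, var_o = diff_in_means(odb["y"][o], odb["t"][o])
        est_r, var_r = diff_in_means(rct["y"][r], rct["t"][r])
        lam = var_r / (var_o + var_r)  # ASSUMED inverse-variance weight on the ODB estimate
        convex.append(lam * est_o + (1 - lam) * est_r)
        shares.append(o.mean())  # ASSUMED: aggregate strata by their ODB share
    w = np.array(shares)
    return float(w @ np.array(spike)), float(w @ np.array(convex))

# Synthetic data (hypothetical): one confounder x, true treatment effect 2.0.
rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x_o = rng.normal(size=5000)
t_o = rng.binomial(1, sigmoid(x_o))            # confounded treatment assignment
y_o = x_o + 2.0 * t_o + rng.normal(size=5000)

x_r = rng.normal(size=1000)
t_r = rng.binomial(1, 0.5, size=1000)          # genuine randomization
y_r = x_r + 2.0 * t_r + rng.normal(size=1000)

odb = {"ps": sigmoid(x_o), "t": t_o, "y": y_o}
rct = {"ps": sigmoid(x_r), "t": t_r, "y": y_r}  # true ODB propensity used for simplicity
spike_est, convex_est = merge_estimates(odb, rct)
print(spike_est, convex_est)
```

In practice the propensity score would be estimated from the ODB (for example by logistic regression) rather than known. Inverse-variance weights also ignore any bias in the ODB estimate, which a weighting scheme meant to be robust to such bias would need to account for.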
