Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data
Johann A. Gagnon-Bartsch, Adam C. Sales, Edward Wu, Anthony F. Botelho, John A. Erickson, Luke W. Miratrix, Neil T. Heffernan

TL;DR
This paper proposes a method that uses machine learning models trained on auxiliary observational data to improve the precision of estimates from randomized experiments, while keeping those estimates unbiased and free of confounding.
Contribution
It introduces an approach that applies machine learning to observational data to improve the precision of randomized experiment estimates while maintaining unbiasedness.
Findings
Method guarantees unbiased estimates regardless of machine learning model correctness
Integrates large observational data with small RCTs to improve precision
Applicable in education research with rich administrative data
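The first two findings can be illustrated with a minimal sketch. It is not the authors' estimator, which this summary does not specify. It assumes one simple way of using auxiliary data: fit a predictive model on the observational sample only, predict each RCT participant's outcome from pre-treatment covariates, and take the difference in means of the residuals. Because the predictions do not depend on treatment assignment, randomization keeps the residual-based estimate unbiased even when the model is poor; a good model only shrinks the variance. All names (`fit_linear`, `diff_in_means`, `simulate`) and all simulated data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; stands in for any ML model."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predictions from the fitted coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def diff_in_means(values, treated):
    """Difference in means with the usual conservative (Neyman) standard error."""
    t, c = values[treated], values[~treated]
    est = t.mean() - c.mean()
    se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
    return est, se

def simulate(seed=0, n_obs=5000, n_rct=100, tau=0.5):
    """Synthetic example: large observational sample, small RCT (assumed sizes)."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, -2.0, 0.5])  # assumed outcome model, for illustration only

    # Auxiliary observational data: used ONLY to train the predictor.
    X_obs = rng.normal(size=(n_obs, 3))
    y_obs = X_obs @ beta + rng.normal(size=n_obs)
    coef = fit_linear(X_obs, y_obs)

    # Small RCT with complete randomization of half the units to treatment.
    X = rng.normal(size=(n_rct, 3))
    treated = np.zeros(n_rct, dtype=bool)
    treated[rng.choice(n_rct, n_rct // 2, replace=False)] = True
    y = X @ beta + tau * treated + rng.normal(size=n_rct)

    # Predictions use pre-treatment covariates only, so they are fixed
    # with respect to the random assignment.
    y_hat = predict_linear(coef, X)

    raw = diff_in_means(y, treated)          # unadjusted estimate
    adj = diff_in_means(y - y_hat, treated)  # residual-based estimate
    return raw, adj

if __name__ == "__main__":
    (raw_est, raw_se), (adj_est, adj_se) = simulate()
    print(f"unadjusted: {raw_est:.3f} (SE {raw_se:.3f})")
    print(f"adjusted:   {adj_est:.3f} (SE {adj_se:.3f})")
```

In this simulation the covariates explain most of the outcome variance, so the residual-based standard error should come out well below the unadjusted one. Replacing `fit_linear` with a badly misspecified model would widen the adjusted interval but would not bias the estimate, since the RCT outcomes never enter model training.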
Abstract
Randomized controlled trials (RCTs) are increasingly prevalent in education research, and are often regarded as a gold standard of causal inference. Two main virtues of randomized experiments are that they (1) do not suffer from confounding, thereby allowing for an unbiased estimate of an intervention's causal impact, and (2) allow for design-based inference, meaning that the physical act of randomization largely justifies the statistical assumptions made. However, RCT sample sizes are often small, leading to low precision; in many cases RCT estimates may be too imprecise to guide policy or inform science. Observational studies, by contrast, have strengths and weaknesses complementary to those of RCTs. Observational studies typically offer much larger sample sizes, but may suffer confounding. In many contexts, experimental and observational data exist side by side, allowing the…
Taxonomy
Topics: School Choice and Performance · Advanced Causal Inference Techniques · Online Learning and Analytics
