A Pilot Design for Observational Studies: Using Abundant Data Thoughtfully
Rachael C. Aikens (1, 2), Dylan Greaves (2), Michael Baiocchi (3) ((1) Stanford University Department of Biomedical Informatics, (2) Stanford University Department of Statistics, (3) Stanford University Department of Epidemiology and Population Health)

TL;DR
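This paper introduces a pilot design for observational studies: a portion of the available observations is expended in the design phase, and the post-treatment information from those observations is used to improve the study design, with the aim of more accurate treatment effect estimates and more informative sensitivity analyses.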
Contribution
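It proposes a pilot design methodology that spends some observations in the design phase to learn about the structure of post-treatment variation, then uses that information to improve designs such as instrumental variables, propensity score matching, and doubly-robust estimation in data-rich, control-poor settings.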
Findings
Pilot design reduces within-set heterogeneity.
It improves the accuracy of treatment effect estimates.
It enhances sensitivity analysis for unobserved confounding.
Abstract
Observational studies often benefit from an abundance of observational units. This can lead to studies that, while challenged by issues of internal validity, have inferences derived from sample sizes substantially larger than randomized controlled trials. But is the information provided by an observational unit best used in the analysis phase? We propose the use of "pilot design," in which observations are expended in the design phase of the study, and the post-treatment information from these observations is used to improve study design. In modern observational studies, which are data rich but control poor, pilot designs can be used to gain information about the structure of post-treatment variation. This information can then be used to improve instrumental variable designs, propensity score matching, doubly-robust estimation, and other observational study designs. We illustrate…
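The abstract is cut off before its illustration, so the following is only a minimal sketch of one way the idea could be read, not the authors' procedure. It assumes, hypothetically, that the pilot set is a random subset of control units, that the "post-treatment information" is summarized by a linear outcome model fitted on that pilot set, and that the resulting fitted score is used to pair treated and remaining control units. The simulated data, the 20% pilot fraction, and all function names are illustrative assumptions.

```python
import numpy as np

def pilot_split(treated, pilot_fraction, rng):
    """Return a boolean mask marking control units set aside as the pilot set."""
    controls = np.flatnonzero(~treated)
    n_pilot = int(round(pilot_fraction * controls.size))
    pilot_idx = rng.choice(controls, size=n_pilot, replace=False)
    mask = np.zeros(treated.size, dtype=bool)
    mask[pilot_idx] = True
    return mask

def fit_outcome_model(X, y):
    """Least-squares fit of outcome on covariates (with intercept)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_score(X, beta):
    """Predicted outcome under control, used here as a matching score."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def greedy_pair_match(score, treated, eligible):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    pool = list(np.flatnonzero(eligible & ~treated))
    pairs = []
    for t in np.flatnonzero(treated):
        if not pool:
            break
        c = min(pool, key=lambda j: abs(score[j] - score[t]))
        pairs.append((int(t), int(c)))
        pool.remove(c)
    return pairs

# Simulated data (assumption): many controls, few treated units.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 1.0 / (1.0 + np.exp(1.5 - X[:, 0]))
y = 2.0 * X[:, 1] + 1.0 * treated + rng.normal(size=n)

# Design phase: spend some controls to learn how covariates relate to outcome.
pilot = pilot_split(treated, pilot_fraction=0.2, rng=rng)
beta = fit_outcome_model(X[pilot], y[pilot])
score = predict_score(X, beta)

# Analysis phase: match on the learned score, excluding the pilot units.
pairs = greedy_pair_match(score, treated, eligible=~pilot)
effect_estimate = np.mean([y[t] - y[c] for t, c in pairs])
print(f"{len(pairs)} matched pairs, estimated effect {effect_estimate:.2f}")
```

The pilot units are excluded from matching so that outcomes used to build the design are not reused in the analysis; that separation is the design choice this sketch is meant to make visible.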
