RAILS: A Synthetic Sampling Weights for Volunteer-Based National Biobanks -- A Case Study with the All of Us Research Program
Huiding Chen, Andrew Guide, Lina Sulieman, Robert M Cronin, Thomas Lumley, Qingxia Chen

TL;DR
This paper introduces RAILS, a method for creating synthetic sampling weights to improve the representativeness of volunteer-based biobanks, demonstrated through a case study with the All of Us Research Program.
Contribution
RAILS combines a pseudo-design-based model with a novel criterion to enhance calibration and reduce bias in non-probability samples, improving their utility for population inference.
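RAILS derives its base weights from a pseudo-design-based model that borrows information from a high-quality probability survey. This summary does not give the model's exact form. The sketch below therefore shows only one common pseudo-weighting recipe. It stacks the volunteer cohort with a reference survey, fits a survey-weighted logistic regression for cohort membership, and inverts the fitted odds. It assumes both samples share the same covariates and that the survey's design weights are known. The names `weighted_logistic` and `pseudo_base_weights` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def weighted_logistic(X, z, case_w, max_iter=100, tol=1e-10):
    """Weighted logistic regression of z on X (intercept added), fit by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (case_w * (z - p))                      # weighted score vector
        hess = Xd.T @ (Xd * (case_w * p * (1 - p))[:, None])  # weighted information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def pseudo_base_weights(X_cohort, X_survey, d_survey):
    """Hypothetical helper (not the RAILS code): pseudo base weights for cohort members."""
    X_cohort = np.asarray(X_cohort, dtype=float).reshape(len(X_cohort), -1)
    X_survey = np.asarray(X_survey, dtype=float).reshape(len(X_survey), -1)
    X = np.vstack([X_cohort, X_survey])
    # z = 1 for volunteer-cohort members, 0 for probability-survey respondents
    z = np.concatenate([np.ones(len(X_cohort)), np.zeros(len(X_survey))])
    # Cohort members count once; survey respondents carry their design weights,
    # so the survey side stands in for the whole target population
    case_w = np.concatenate([np.ones(len(X_cohort)), np.asarray(d_survey, dtype=float)])
    beta = weighted_logistic(X, z, case_w)
    eta = np.column_stack([np.ones(len(X_cohort)), X_cohort]) @ beta
    # exp(eta) is the fitted odds of cohort membership, used as an approximate
    # inclusion propensity; its inverse is the pseudo base weight
    return np.exp(-eta)


if __name__ == "__main__":
    x_cohort = np.array([1] * 30 + [0] * 10)  # toy binary covariate, cohort
    x_survey = np.array([1] * 5 + [0] * 15)   # same covariate, survey
    d = np.full(20, 100.0)                    # survey design weights
    w = pseudo_base_weights(x_cohort, x_survey, d)
    print(w[:3], w[-3:], w.sum())
```

With a single binary covariate the logistic model is saturated. Each cohort member in cell g then receives (estimated population count in g) / (cohort count in g). In the toy data that is 500/30 ≈ 16.7 for x = 1 and 1500/10 = 150 for x = 0, so the weights sum to the survey total of 2000. With richer covariates the fit no longer reproduces every cell exactly. That gap is what a subsequent calibration step is meant to close.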
Findings
RAILS reduces bias in prevalence estimates.
It improves efficiency and stability of estimates.
Application to All of Us shows effective bias reduction.
Abstract
While national biobanks are essential for advancing medical research, their non-probability sampling designs limit their representativeness of the target population. This paper proposes a method that leverages high-quality national surveys to create synthetic sampling weights for non-probabilistic cohort studies, aiming to improve representativeness. Specifically, we focus on deriving more accurate base weights, which enhance calibration by meeting population constraints, and on automating data-supported selection of cross-tabulations for calibration. This approach combines a pseudo-design-based model with a novel Last-In-First-Out criterion, enhancing the accuracy and stability of the estimates. Extensive simulations demonstrate that our method, named RAILS, reduces bias, improves efficiency, and strengthens inference compared to existing approaches. We apply the proposed method to the…
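The abstract does not describe how RAILS's Last-In-First-Out criterion chooses which cross-tabulations to calibrate on. The sketch below therefore illustrates only the generic step that such a selection feeds into. Raking (iterative proportional fitting) adjusts base weights until the weighted totals match known population totals for each chosen margin. A cross-tabulation such as sex by age group is passed as one combined cell code. The sketch ends with a weighted prevalence estimate computed from the calibrated weights. The function names `rake` and `weighted_prevalence` and the toy totals are hypothetical. The sketch also assumes every target category is observed in the cohort.

```python
import numpy as np


def rake(base_w, groups, targets, max_iter=200, tol=1e-10):
    """Hypothetical helper: iterative proportional fitting of base weights.

    base_w  : (n,) positive base weights, e.g. pseudo-design-based weights
    groups  : list of (n,) integer codes 0..C_k-1, one array per calibration margin;
              a cross-tabulation is encoded as a single combined code
    targets : list of (C_k,) population totals, aligned with `groups`
    """
    w = np.asarray(base_w, dtype=float).copy()
    for _ in range(max_iter):
        worst = 0.0
        for g, t in zip(groups, targets):
            t = np.asarray(t, dtype=float)
            current = np.bincount(g, weights=w, minlength=len(t))
            factor = t / current  # assumes every category has positive weight
            w *= factor[g]        # scale each unit by its category's factor
            worst = max(worst, np.max(np.abs(factor - 1.0)))
        if worst < tol:           # every margin already matched its target
            break
    return w


def weighted_prevalence(y, w):
    """Weighted (Hajek-type) prevalence estimate: sum(w * y) / sum(w)."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


if __name__ == "__main__":
    sex = np.array([0, 0, 1, 1])
    age = np.array([0, 1, 0, 1])
    # Toy population margins: sex totals [60, 40], age-group totals [30, 70]
    w = rake(np.ones(4), [sex, age], [[60, 40], [30, 70]])
    # To calibrate on the sex-by-age cross-tabulation instead, pass
    # groups=[sex * 2 + age] with four cell totals as the single target
    y = np.array([1, 0, 0, 1])  # toy binary outcome
    print(w, weighted_prevalence(y, w))
```

In the toy run, raking converges to weights 18, 42, 12 and 28, which reproduce both margins. The weighted prevalence is then (18 + 28) / 100 = 0.46. Calibrating on a full cross-tabulation imposes more constraints than its margins alone. With sparse cells this can make the weights unstable, so deciding which cross-tabulations the data can support matters.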
Taxonomy
Topics: Ethics in Clinical Research
