Computational strategic recruitment for representation and coverage studied in the All of Us Research Program
Victor A. Borza, Qingxia Chen, Ellen W. Clayton, Murat Kantarcioglu, Lina Sulieman, Yevgeniy Vorobeychik, Bradley A. Malin

TL;DR
This paper examines how well the All of Us Research Program represents diverse U.S. populations and introduces a new method to improve recruitment strategies.
Contribution
A computational strategic recruitment method is proposed to optimize representativeness and coverage in biomedical research.
Findings
All of Us recruited almost every understudied group at or above its Census proportion between 2017 and 2022.
The proposed method improves both cohort representativeness and coverage in simulations.
Improvements are consistent across various simulation conditions.
Abstract
Large-scale data repositories like the All of Us Research Program are spurring new understanding of health and disease. All of Us aims to create a database of all Americans, addressing the historical understudy of some groups in biomedical research. We study the representativeness (similarity to the U.S. population) and coverage (equality of proportion across U.S. Census demographic categories) of All of Us from 2017 to 2022, finding that All of Us recruited almost every understudied group at or above the group’s Census proportion. Building on the program’s successes, we propose a computational strategic recruitment method that optimizes multiple recruitment goals by allocating recruitment resources to sites, and we evaluate this method in a recruitment simulation. We find that our methodology is indeed able to improve both cohort representativeness and coverage. Moreover, improvements in…
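The two quantities defined in the abstract can be illustrated with a small sketch. The abstract does not say which metric the authors use, so the choice here is an assumption: representativeness is scored as the Kullback-Leibler divergence from the cohort's group proportions to the Census proportions, and coverage as the divergence from the cohort's proportions to a uniform distribution over groups. In both cases 0 is the ideal value. The function names, group labels, and numbers are hypothetical and are not taken from the paper or from All of Us data.

```python
import math


def normalize(counts):
    """Turn raw group counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats; groups with p == 0 contribute nothing."""
    return sum(p[g] * math.log(p[g] / q[g]) for g in p if p[g] > 0)


def representativeness_gap(cohort_counts, census_props):
    """Assumed metric: distance of the cohort from the Census distribution."""
    return kl_divergence(normalize(cohort_counts), census_props)


def coverage_gap(cohort_counts):
    """Assumed metric: distance of the cohort from equal group proportions."""
    p = normalize(cohort_counts)
    uniform = {g: 1 / len(p) for g in p}
    return kl_divergence(p, uniform)


# Toy example with made-up groups and numbers (not from the paper).
census = {"A": 0.6, "B": 0.3, "C": 0.1}
cohort = {"A": 500, "B": 300, "C": 200}

print("representativeness gap:", representativeness_gap(cohort, census))
print("coverage gap:", coverage_gap(cohort))
```

A cohort that mirrors the Census exactly has a representativeness gap of 0 but a nonzero coverage gap whenever the Census proportions are unequal. The two goals therefore generally pull in different directions, which is consistent with the abstract's framing of a method that optimizes multiple recruitment goals.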
Taxonomy
Topics: Data Quality and Management · Health, Environment, Cognitive Aging · Data-Driven Disease Surveillance
