Study design and the sampling of deleterious rare variants in biobank-scale datasets
Margaret C. Steiner, Daniel P. Rice, Arjun Biddanda, Mariadaria K. Ianni-Ravn, Christian Porras, John Novembre

TL;DR
This paper shows how the geographic breadth of a genetic sample affects the number and frequency of rare deleterious variants discovered in biobank-scale studies.
Contribution
The paper combines a stochastic model with empirical data analysis to show how geographic breadth influences the number and frequency of the deleterious rare variants that are discovered.
Findings
Geographically broad samples contain more distinct rare variants than geographically narrow samples.
Broad samples detect variants at lower average frequencies, often as singletons.
The effects of geographic breadth are stronger with larger sample sizes and stronger selection.
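The direction of the first two findings can be illustrated with a toy calculation. This is a minimal sketch and not the paper's stochastic model: it assumes a hypothetical set of equally sized demes, that every rare variant is private to one deme at a fixed local frequency, and that haploid samples are split evenly across the sampled demes. The function name and all parameter values are illustrative assumptions.

```python
def expected_discovery(n_total, n_demes_sampled, variants_per_deme, local_freq):
    """Toy expectation for a sample split evenly across demes.

    Assumptions (illustrative, not from the paper): each variant exists in
    exactly one deme at frequency `local_freq`, and samples are haploid.
    Returns (expected number of distinct variants observed,
             mean sample frequency of a variant given it was observed).
    """
    n_per_deme = n_total // n_demes_sampled
    # Probability that a given variant appears at least once among the
    # samples drawn from its own deme.
    p_seen = 1.0 - (1.0 - local_freq) ** n_per_deme
    expected_distinct = n_demes_sampled * variants_per_deme * p_seen
    # Mean copy number conditional on being observed at least once
    # (mean of a zero-truncated binomial), divided by the total sample size.
    mean_count_given_seen = n_per_deme * local_freq / p_seen
    mean_freq_given_seen = mean_count_given_seen / n_total
    return expected_distinct, mean_freq_given_seen


# Same total sample size, narrow (1 deme) versus broad (10 demes) design.
narrow = expected_discovery(10_000, 1, 100, 1e-3)
broad = expected_discovery(10_000, 10, 100, 1e-3)
print("narrow:", narrow)
print("broad: ", broad)
```

Under these assumed parameters, the broad design is expected to contain more distinct variants, each at a lower sample frequency, which matches the qualitative pattern in the findings above. The sketch ignores migration, selection, and the spatial structure that the paper's model accounts for.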
Abstract
As genetic studies grow, researchers are increasingly seeking to identify rare genetic variants with large impacts on traits. In this paper, we combine theoretical methods and data analysis to show how differences in sampling with respect to geographic location can influence the number and frequency of genetic variants that are found. Our results suggest that geographically broad samples will include more distinct genetic variants than geographically narrow samples, though each variant will be found at a lower frequency. Our results can help researchers consider how study design affects expected results when constructing new genetic samples.
One key component of study design in population genetics is the “geographic breadth” of a sample (i.e., the size of the region across which individuals are sampled). How the geographic breadth of a sample impacts observations…
Taxonomy
Topics: Genetic Associations and Epidemiology · Genomics and Rare Diseases · Cancer Genomics and Diagnostics
