Distortion of genealogical properties when the sample is very large

Anand Bhaskar; Andrew G. Clark; Yun S. Song

arXiv:1308.0091 · q-bio.PE · June 24, 2015

TL;DR

As human genetic sample sizes grow large, this paper examines the limitations of the coalescent model, develops exact computations for the Wright-Fisher model, and proposes a hybrid approach to improve genealogical predictions.

Contribution

The paper introduces a hybrid algorithm combining Wright-Fisher and coalescent models to better approximate genealogical properties in large samples.

Findings

1. Under the DTWF model, large samples undergo a significant number of multiple and simultaneous mergers.

2. The DTWF and coalescent models predict noticeably different counts of rare variants.

3. The hybrid method closely matches the predictions of the full DTWF model.

Abstract

Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands if not millions of individuals. In addition to posing computational challenges, such large sample sizes call for carefully re-examining the theoretical foundation underlying commonly-used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For realistic demographic…
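
To make concrete the condition that the population size be sufficiently larger than the sample size, the following minimal sketch compares single-generation quantities under the two models. It assumes a haploid population of constant size N, which is a simplification chosen for illustration and not necessarily the setting used in the paper; the function names are hypothetical, and this is not the paper's exact-computation method. It uses two standard facts: in the DTWF model, n lineages choose parents uniformly at random among N individuals, while in the coalescent the waiting time to the first merger is exponential with rate n(n-1)/2 in units of N generations.

```python
import math


def prob_no_merger_dtwf(n: int, N: int) -> float:
    """P(n lineages pick n distinct parents) in one DTWF generation.

    Assumes a haploid population of constant size N (illustrative only).
    Equals prod_{i=1}^{n-1} (1 - i/N), computed in log space for stability.
    """
    if n > N:
        return 0.0  # more lineages than parents: a merger is certain
    return math.exp(sum(math.log1p(-i / N) for i in range(1, n)))


def prob_no_merger_coalescent(n: int, N: int) -> float:
    """Coalescent counterpart: P(no coalescence within 1/N time units).

    The first merger time is exponential with rate C(n, 2).
    """
    return math.exp(-n * (n - 1) / (2 * N))


def expected_lineages_lost_dtwf(n: int, N: int) -> float:
    """Expected drop in lineage count over one DTWF generation.

    The expected number of distinct parents is N * (1 - (1 - 1/N)**n).
    """
    return n - N * (1 - (1 - 1 / N) ** n)


if __name__ == "__main__":
    N = 20_000  # hypothetical population size
    for n in (10, 100, 1_000, 10_000):
        print(
            n,
            prob_no_merger_dtwf(n, N),
            prob_no_merger_coalescent(n, N),
            expected_lineages_lost_dtwf(n, N),
        )
```

When n is small relative to N, the two no-merger probabilities nearly agree and less than one lineage is lost per generation on average. As n approaches N, many lineages are expected to be lost in a single generation, which is only possible through multiple or simultaneous mergers, events that the standard coalescent does not allow.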
