Geometry of the sample frequency spectrum and the perils of demographic inference
Zvi Rosen, Anand Bhaskar, Sebastien Roch, Yun S. Song

TL;DR
This paper investigates the geometric structure of the expected sample frequency spectrum (SFS) in population genetics and shows that the unstable or degenerate results often produced by popular demographic inference methods follow from that geometry.
Contribution
It provides a geometric characterization of the expected SFS for piecewise-constant demographies and explains the intrinsic causes of pathological inference behaviors.
Findings
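The expected SFS under an arbitrary population history can be reproduced by a piecewise-constant demography with a limited number of epochs.
With fewer epochs than that, the set of expected SFS for piecewise-constant demographies is open and non-convex.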
Pathological inference behaviors are intrinsic to the geometry of the SFS.
Abstract
The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to 0 or diverge to infinity, and show undesirable sensitivity of the inferred demography to perturbations in the data. The goal of this paper is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant…
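To make the dependence of the expected SFS on demography concrete, here is a minimal sketch. It is not taken from the paper's text. It assumes the standard coalescent identity under the infinite-sites model, in which the expected number of sites with i mutant copies is proportional to the sum over k of k * p(n, k, i) * E[T_k], where E[T_k] is the expected time during which the sample has k ancestral lineages and p(n, k, i) = C(n-i-1, k-2) / C(n-1, k-1). The function names `expected_sfs` and `constant_size_times` are hypothetical, and the second set of times is an illustrative perturbation, not one derived from a specific demography.

```python
from math import comb


def expected_sfs(n, mean_times):
    """Normalized expected SFS for a sample of size n.

    mean_times maps k (2..n) to E[T_k], the expected time during which
    the sample has k ancestral lineages. Demography enters only through
    these values (assumed identity; see the lead-in).
    """
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(2, n - i + 2):
            # Probability that a lineage present when there are k lineages
            # subtends exactly i of the n sampled sequences.
            p = comb(n - i - 1, k - 2) / comb(n - 1, k - 1)
            total += k * p * mean_times[k]
        sfs.append(total)
    norm = sum(sfs)
    return [x / norm for x in sfs]


def constant_size_times(n):
    """E[T_k] = 2 / (k (k - 1)) in coalescent units for a constant-size population."""
    return {k: 2.0 / (k * (k - 1)) for k in range(2, n + 1)}


n = 10
baseline_times = constant_size_times(n)
baseline = expected_sfs(n, baseline_times)

# Illustrative perturbation: lengthen the time spent with two lineages,
# as a larger ancient population size would tend to do.
perturbed_times = dict(baseline_times)
perturbed_times[2] *= 3.0
perturbed = expected_sfs(n, perturbed_times)

print("constant size:", [round(x, 4) for x in baseline])
print("perturbed    :", [round(x, 4) for x in perturbed])
```

Under constant size the normalized entries are proportional to 1/i, the classical result. The map from the times E[T_k] to the SFS is linear, so a change in demography moves the expected SFS only through those n - 1 numbers. In this sketch, lengthening the two-lineage phase adds the same amount to every frequency class and therefore lowers the singleton fraction.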
