Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum
Jonathan Terhorst, Yun S. Song

TL;DR
This paper establishes fundamental information-theoretic limits on the accuracy of demographic history inference from the sample frequency spectrum, showing that increasing sample size does not necessarily improve estimation accuracy.
Contribution
It provides the first minimax error bounds for SFS-based demographic inference, revealing intrinsic limitations independent of sample size.
Findings
Minimax error for population size history estimation is at least O(1/log s), where s is the number of independent segregating sites.
For a fixed number of segregating sites, increasing the number of sampled individuals does not improve the bound, because the bound does not depend on the dimension of the SFS.
The results apply to populations with bottlenecks and are likely relevant to many natural populations.
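To give a feel for how slow a 1/log s rate is, the sketch below tabulates it next to a 1/sqrt(s) rate. The 1/sqrt(s) column is an assumed stand-in for a classical parametric convergence rate and is not taken from the paper; constants are omitted, and the function names are hypothetical.

```python
import math


def log_rate(s):
    """Lower-bound rate 1/log(s) from the findings (constant factors omitted)."""
    return 1.0 / math.log(s)


def parametric_rate(s):
    """Assumed comparison rate 1/sqrt(s), typical of classical parametric problems."""
    return 1.0 / math.sqrt(s)


if __name__ == "__main__":
    # Tabulate both rates for increasing numbers of segregating sites s.
    for s in (10**2, 10**4, 10**6, 10**8):
        print(f"s = {s:>9d}   1/log s = {log_rate(s):.4f}   1/sqrt s = {parametric_rate(s):.4f}")
```

Under this scaling, halving the 1/log s term requires squaring s, whereas halving 1/sqrt(s) requires only a fourfold increase in s.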
Abstract
The sample frequency spectrum (SFS) of DNA sequences from a collection of individuals is a summary statistic which is commonly used for parametric inference in population genetics. Despite the popularity of SFS-based inference methods, currently little is known about the information-theoretic limit on the estimation accuracy as a function of sample size. Here, we show that using the SFS to estimate the size history of a population has a minimax error of at least O(1/log s), where s is the number of independent segregating sites used in the analysis. This rate is exponentially worse than known convergence rates for many classical estimation problems in statistics. Another surprising aspect of our theoretical bound is that it does not depend on the dimension of the SFS, which is related to the number of sampled individuals. This means that, for a fixed number of segregating sites…
