Multivariate phase-type theory for the site frequency spectrum
Asger Hobolth (1), Mogens Bladt (2), Lars Nørvang Andersen (1) ((1) Aarhus University, (2) University of Copenhagen)

TL;DR
This paper introduces a multivariate phase-type approach to precisely characterize the distribution of linear functions of the site frequency spectrum, improving understanding and inference in population genetics.
Contribution
The paper develops a multivariate phase-type framework for deriving the distributions of SFS-based estimators and tests analytically, as an alternative to the simulation procedures commonly used to approximate them.
Findings
Classical mutation rate estimators follow a discrete phase-type distribution.
The distributions of neutrality tests are characterized using continuous multivariate phase-type theory.
The methodology is implemented in the R package 'phasty'.
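To illustrate the first finding, here is a minimal Python sketch for the total number of segregating sites S in a sample of size n. It assumes the standard (Kingman) coalescent with mutations falling on branches at rate theta/2, which is a modelling assumption of this example rather than something stated on this page. Under that assumption the total branch length is continuous phase-type with sub-intensity matrix T, and S has point probabilities alpha P^s p with P = (I - (2/theta) T)^(-1) and p = e - P e. The function name and matrix layout are hypothetical and do not reflect the interface of the 'phasty' package.

```python
import numpy as np

def segregating_sites_pmf(n, theta, s_max):
    """P(S = s) for s = 0..s_max under the standard coalescent (assumed model)."""
    # States are the numbers of lineages k = n, n-1, ..., 2.
    # After weighting each state by its k branches, the rate of leaving
    # state k for the total branch length is (k - 1) / 2.
    ks = np.arange(n, 1, -1)
    rates = (ks - 1) / 2.0
    m = len(ks)
    # Sub-intensity matrix: move from state k to state k - 1 only.
    T = np.diag(-rates) + np.diag(rates[:-1], k=1)
    alpha = np.zeros(m)
    alpha[0] = 1.0  # the process starts with n lineages
    lam = theta / 2.0  # mutation rate per unit of branch length
    # Poisson mutations on a phase-type length give a discrete phase-type count.
    P = np.linalg.inv(np.eye(m) - T / lam)
    p = 1.0 - P.sum(axis=1)  # exit probabilities, e - P e
    pmf = np.empty(s_max + 1)
    v = alpha.copy()
    for s in range(s_max + 1):
        pmf[s] = v @ p  # alpha P^s p
        v = v @ P
    return pmf
```

For n = 2 the result reduces to a geometric distribution, and for general n the mean should agree with the classical value theta * (1 + 1/2 + ... + 1/(n-1)), which gives a quick sanity check on the construction.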
Abstract
Linear functions of the site frequency spectrum (SFS) play a major role for understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or average of the pairwise differences) and tests for neutrality (e.g. Tajima's D) are perhaps the most well-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators, and to determine significance thresholds for neutrality tests. These distributions are often approximated using simulation procedures. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate are distributed according to a discrete phase-type distribution.…
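As a concrete illustration of what a linear function of the SFS means, the sketch below writes two of the estimators named in the abstract as weighted sums of the SFS entries xi_1, ..., xi_{n-1}, where xi_i is the number of sites at which the derived allele appears in i of the n sampled sequences. The weights are the textbook ones for Watterson's estimator and for the average number of pairwise differences; the function names and the example spectrum are hypothetical, chosen only for this illustration.

```python
from math import comb

def watterson_weights(n):
    """Weights c_i = 1 / a_n, with a_n = 1 + 1/2 + ... + 1/(n-1)."""
    a_n = sum(1.0 / i for i in range(1, n))
    return [1.0 / a_n] * (n - 1)

def pairwise_weights(n):
    """Weights c_i = i (n - i) / C(n, 2) for the average pairwise difference."""
    return [i * (n - i) / comb(n, 2) for i in range(1, n)]

def linear_sfs(weights, sfs):
    """Evaluate the linear function sum_i c_i * xi_i."""
    return sum(w * x for w, x in zip(weights, sfs))

# Hypothetical spectrum for n = 4 sequences: xi_1 = 3, xi_2 = 1, xi_3 = 2.
n, sfs = 4, [3, 1, 2]
theta_w = linear_sfs(watterson_weights(n), sfs)
theta_pi = linear_sfs(pairwise_weights(n), sfs)
# The numerator of Tajima's D is the difference of the two estimators,
# which is again a linear function of the SFS.
tajima_numerator = theta_pi - theta_w
```

Because both estimators use the same spectrum with different weight vectors, any property of the joint distribution of the SFS entries carries over directly to the estimators and to their difference.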
