On the number of segregating sites
Helmut H. Pitters

TL;DR
This paper generalizes Watterson's classical mean and variance results by deriving an explicit formula for every cumulant of the number of segregating sites in a sample whose genealogy follows Kingman's coalescent.
Contribution
It computes cumulants of all orders for the number of segregating sites and, in passing, gives an explicit expression for the cumulants of the negative binomial distribution in terms of the polylogarithm.
Findings
Derived an explicit formula for the ith cumulant of the number of segregating sites Sn.
Obtained straightforward proofs of the Law of Large Numbers and the Central Limit Theorem for Sn.
Expressed the cumulants of the negative binomial distribution explicitly in terms of the polylogarithm.
Abstract
Consider a sample of size n drawn from a large, neutral population of haploid individuals subject to mutation, whose genealogy is governed by Kingman's n-coalescent. Let Sn count the number of segregating sites in this sample under the infinitely many sites model of Kimura. For fixed sample size n, the main result about Sn is due to Watterson, who computed its mean and variance. In our main result, Theorem 3, we generalize Watterson's result and compute the ith cumulant of Sn. We find in passing an explicit expression for the cumulants of the negative binomial distribution in terms of the polylogarithm. This seems to be the first explicit formula in the literature for the cumulant of arbitrary order of the negative binomial distribution. As an application of this result, we obtain straightforward proofs of the Law of Large Numbers and the Central Limit Theorem for Sn.
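To make the link between Sn, the negative binomial distribution and the polylogarithm concrete, here is a minimal Python sketch. It is not the paper's Theorem 3 as stated. It assumes the common convention that mutations fall at rate theta/2 per unit of branch length, which this summary does not specify. Under that assumption, the number of mutations arising while the genealogy has k + 1 lineages is geometric with mean theta/k, and Sn is a sum of n - 1 independent such counts. The ith cumulant of a geometric count with mean a is the polylogarithm Li_{1-i}(a / (1 + a)), which for negative integer order reduces to a finite sum over Stirling numbers of the second kind. All function names are hypothetical.

```python
from math import factorial


def stirling2(n, k):
    """Stirling number of the second kind S(n, k), by the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def geometric_cumulant(a, i):
    """i-th cumulant (i >= 1) of a geometric count on {0, 1, ...} with mean a.

    Equals Li_{1-i}(a / (1 + a)), evaluated through the finite-sum identity
    Li_{-m}(z) = sum_j j! * S(m + 1, j + 1) * (z / (1 - z))**(j + 1).
    """
    return sum(factorial(j) * stirling2(i, j + 1) * a ** (j + 1) for j in range(i))


def segregating_sites_cumulant(n, theta, i):
    """i-th cumulant of Sn, ASSUMING mutation rate theta/2 per unit branch length."""
    # Cumulants of independent summands add; the k-th summand has mean theta / k.
    return sum(geometric_cumulant(theta / k, i) for k in range(1, n))


def watterson_mean(n, theta):
    """Watterson's mean of Sn: theta * sum_{k=1}^{n-1} 1/k."""
    return theta * sum(1 / k for k in range(1, n))


def watterson_variance(n, theta):
    """Watterson's variance of Sn: theta * sum 1/k + theta^2 * sum 1/k^2."""
    return theta * sum(1 / k for k in range(1, n)) + theta ** 2 * sum(
        1 / k ** 2 for k in range(1, n)
    )


if __name__ == "__main__":
    n, theta = 10, 2.0
    for i in range(1, 5):
        print(i, segregating_sites_cumulant(n, theta, i))
```

For i = 1 and i = 2 the sketch reproduces Watterson's mean and variance, which gives a quick sanity check on the chosen parametrization. The Stirling-number form avoids summing the polylogarithm series directly, which converges slowly when theta is large relative to k.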
Taxonomy
Topics: Stochastic processes and statistical mechanics · Bayesian Methods and Mixture Models · Evolution and Genetic Dynamics
