Expectation of the Site Frequency Spectrum
Alan R. Rogers, Stephen P. Wooding

TL;DR
This note reviews the expected site frequency spectrum in population genetics. It explains why, under selective neutrality, random mating, and constant population size, each sequence added to a sample appends one entry to the expected spectrum without altering the existing entries.
Contribution
It reviews the reasons why existing entries of the expected spectrum are unchanged when the sample size increases under neutrality.
Findings
Each sequence added to a sample adds one entry to the end of the expected spectrum without changing existing entries.
This invariance holds under selective neutrality, random mating, and constant population size.
The note reviews the reasoning behind this behavior.
Abstract
The site frequency spectrum describes variation among a set of n DNA sequences. Its i'th entry (i=1,2,...,n-1) is the number of nucleotide sites at which the mutant allele is present in i copies. Under selective neutrality, random mating, and constant population size, the expected value of the spectrum is well known but somewhat puzzling. Each additional sequence added to a sample adds an entry to the end of the expected spectrum but does not affect existing entries. This note reviews the reasons for this behavior.
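The invariance described in the abstract can be illustrated with a minimal sketch. It assumes the standard neutral-model result that the expected i'th entry equals theta / i, where theta is the population-scaled mutation rate; the abstract calls the expectation "well known" but does not state the formula, so treat it as an assumption here. The function name `expected_sfs` and the default `theta = 1` are hypothetical choices made for illustration only.

```python
from fractions import Fraction


def expected_sfs(n, theta=Fraction(1)):
    """Expected spectrum for a sample of n sequences.

    Assumption (not stated in the abstract): under neutrality, random
    mating, and constant population size, entry i equals theta / i
    for i = 1, ..., n - 1.
    """
    if n < 2:
        raise ValueError("a spectrum needs at least two sequences")
    # Entry i depends only on i and theta, never on n.
    return [theta / i for i in range(1, n)]


small = expected_sfs(5)   # entries for i = 1..4
large = expected_sfs(6)   # entries for i = 1..5

# Adding a sixth sequence appends one entry and leaves the first four alone.
assert large[:len(small)] == small
assert len(large) == len(small) + 1
```

Because the assumed formula for entry i does not involve n, enlarging the sample can only lengthen the list. Exact fractions are used so the comparison between the two spectra is not affected by floating-point rounding.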
Taxonomy
Topics: RNA and protein synthesis mechanisms · Gene expression and cancer classification · Fractal and DNA sequence analysis
