Generalization of the Ewens sampling formula to arbitrary fitness landscapes
Pavel Khromov, Constantin D. Malliaris, Alexandre V. Morozov

TL;DR
This paper extends the Ewens sampling formula to arbitrary, possibly epistatic, fitness landscapes, enabling better inference of selection and evolutionary parameters from genomic data in populations with high allelic diversity.
Contribution
It introduces a generalized sampling formula for arbitrary fitness landscapes, including epistatic interactions, facilitating analysis of allelic diversity under selection.
Findings
Sampling probabilities are computationally tractable for landscapes with multiple fitness states.
The theory accurately infers selection coefficients from high-throughput sequencing data.
Allelic diversity can reveal signatures of selection even in complex fitness landscapes.
Abstract
In considering the evolution of transcribed regions, regulatory modules, and other genomic loci of interest, we are often faced with a situation in which the number of allelic states greatly exceeds the population size. In this limit, the population eventually reaches a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from a population, do not change with time. In the absence of selection, probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary, possibly epistatic, fitness landscapes. Although our approach is general, we focus on the…
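For reference, the neutral baseline that the paper generalizes is the classical Ewens sampling formula. For a sample of n genes in which a_j distinct alleles each appear exactly j times, P(a_1, ..., a_n) = n! / (θ(θ+1)...(θ+n-1)) × ∏_j θ^{a_j} / (j^{a_j} a_j!), where θ is the population-scaled mutation rate (whether θ = 2Nμ or 4Nμ depends on ploidy and model convention). The sketch below evaluates only this neutral formula, not the authors' fitness-landscape extension; the function names `ewens_probability` and `partitions` are hypothetical helpers chosen for illustration.

```python
from collections import Counter
from math import factorial, prod


def rising_factorial(theta: float, n: int) -> float:
    """Return theta * (theta + 1) * ... * (theta + n - 1)."""
    return prod(theta + i for i in range(n))


def ewens_probability(allele_counts, theta: float) -> float:
    """Neutral Ewens sampling probability of an allelic configuration.

    allele_counts: number of copies of each distinct allele in the sample,
                   e.g. [3, 1, 1] means one allele seen 3 times and two singletons.
    theta: population-scaled mutation rate (convention-dependent, see text).
    """
    n = sum(allele_counts)
    # a[j] = number of distinct alleles observed exactly j times in the sample
    a = Counter(allele_counts)
    product_term = prod(
        theta ** a_j / (j ** a_j * factorial(a_j)) for j, a_j in a.items()
    )
    return factorial(n) * product_term / rising_factorial(theta, n)


def partitions(n: int, max_part=None):
    """Yield every integer partition of n as a non-increasing list.

    Each partition corresponds to one possible allelic configuration of a
    sample of size n, so summing ewens_probability over them should give 1.
    """
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield [k] + rest


if __name__ == "__main__":
    theta = 1.0
    for config in partitions(4):
        print(config, ewens_probability(config, theta))
    total = sum(ewens_probability(p, theta) for p in partitions(4))
    print("sum over configurations:", total)
```

As a quick check, `ewens_probability([3, 1, 1], 1.0)` equals 1/6 up to floating-point rounding, and the probabilities of all configurations of a given sample size sum to 1. Deviations of observed configurations from these neutral probabilities are what the formula is traditionally used to detect.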
