Approximate sampling formulae for general finite-alleles models of mutation
Anand Bhaskar, John A. Kamm, and Yun S. Song

TL;DR
This paper derives approximate closed-form sampling formulas for general finite-alleles mutation models under the coalescent. The formulas are highly accurate at low mutation rates and cover a setting in which no exact formula is currently known.
Contribution
It uses an urn construction related to the coalescent to derive approximate closed-form sampling formulas for models with finitely many alleles, covering arbitrary irreducible recurrent mutation models and reversible recurrent mutation models.
Findings
Formulas are highly accurate at low mutation rates
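Cover an arbitrary irreducible recurrent mutation model or a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four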
Provides a practical tool where exact formulas are unavailable
Abstract
Many applications in genetic analyses utilize sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for many decades. However, no exact formula is currently known for more general models of mutation that are of biological interest. In this paper, models with finitely-many alleles are considered, and an urn construction related to the coalescent is used to derive approximate closed-form sampling formulas for an arbitrary irreducible recurrent mutation model or for a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four,…
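For context on what a closed-form sampling distribution looks like, here is a minimal sketch of the classical Ewens sampling formula for the infinite-alleles model, one of the special cases the abstract says has long been known. It is not the approximate formula derived in the paper. The function name `ewens_probability` and its argument layout are hypothetical, and `theta` is assumed to be the population-scaled mutation rate.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def ewens_probability(allele_counts, theta):
    """Probability of an unlabeled allele configuration under the
    Ewens sampling formula (infinite-alleles model).

    allele_counts: one positive integer per distinct allele type,
                   e.g. [2, 1] means one type seen twice, one seen once.
    theta:         population-scaled mutation rate (assumed convention).
    """
    n = sum(allele_counts)
    theta = Fraction(theta)

    # a[j] = number of allele types observed exactly j times.
    a = Counter(allele_counts)

    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i

    # n! / rising * prod_j theta^{a_j} / (j^{a_j} * a_j!)
    prob = Fraction(factorial(n)) / rising
    for j, a_j in a.items():
        prob *= theta ** a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


if __name__ == "__main__":
    # All configurations of a sample of size 3; their probabilities sum to 1.
    for config in ([3], [2, 1], [1, 1, 1]):
        print(config, ewens_probability(config, 1))
```

Exact rational arithmetic via `fractions.Fraction` is used here only so that small cases can be checked by hand; a practical implementation would more likely work with floating-point log-probabilities.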
