The site frequency spectrum of dispensable genes
Franz Baumdicker

TL;DR
This paper extends classical population genetics models to account for dispensable genes in bacterial genomes, deriving a new joint gene and site frequency spectrum and highlighting biases in standard mutation rate estimators.
Contribution
It introduces a formula for the joint gene and site frequency spectrum that accounts for dispensable genes, relaxing the assumption that all individuals carry homologous genetic material.
Findings
Derived a formula for the expectation of the joint gene and site frequency spectrum.
Showed that standard estimators of mutation rate are biased for dispensable genes.
Demonstrated that the site frequency spectrum differs from classical models in dispensable genomes.
Abstract
The differences between DNA sequences within a population are the basis for inferring the ancestral relationship of the individuals. Within the classical infinitely many sites model, it is possible to estimate the mutation rate based on the site frequency spectrum, which is composed of the numbers $C_1, \dots, C_{n-1}$, where $n$ is the sample size and $C_s$ is the number of site mutations (Single Nucleotide Polymorphisms, SNPs) which are seen in $s$ genomes. Classical results can be used to compare the observed site frequency spectrum with its neutral expectation, $E[C_s] = \theta/s$, where $\theta$ is the scaled site mutation rate. In this paper, we will relax the assumption of the infinitely many sites model that all individuals only carry homologous genetic material. In particular, it is now well known that bacterial genomes have the ability to gain and lose genes, such that every single…
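As a rough illustration of the classical setting the abstract starts from, the sketch below computes the neutral expectation of the site frequency spectrum (theta divided by the number of genomes carrying the mutation) and recovers the scaled mutation rate from a spectrum. It uses Watterson's estimator as one example of a standard estimator; the abstract does not name a specific estimator, so that choice is an assumption, as are the function names. The sketch covers only the case where every individual carries the gene, which is the setting the paper argues is biased for dispensable genes.

```python
# Minimal sketch of the classical (non-dispensable) case only.
# Function names are hypothetical; Watterson's estimator is an assumed
# example of a "standard estimator", not taken from the paper.

def expected_sfs(n, theta):
    """Neutral expectation theta / s for s = 1, ..., n - 1 (sample size n)."""
    return [theta / s for s in range(1, n)]

def watterson_theta(sfs):
    """Estimate theta from a site frequency spectrum of length n - 1.

    The number of segregating sites is the sum of the spectrum; dividing by
    the harmonic number H_{n-1} gives Watterson's estimator.
    """
    n = len(sfs) + 1
    segregating_sites = sum(sfs)
    harmonic = sum(1.0 / s for s in range(1, n))
    return segregating_sites / harmonic

if __name__ == "__main__":
    spectrum = expected_sfs(n=5, theta=2.0)
    print(spectrum)
    print(watterson_theta(spectrum))
```

Feeding the expected spectrum back into the estimator returns the theta it was built from, up to floating-point error, which is the sense in which the estimator is unbiased in the classical model.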
