The diversity of a distributed genome in bacterial populations
F. Baumdicker, W. R. Hess, P. Pfaffelhuber

TL;DR
This paper models the distribution of genes in bacterial populations using a coalescent framework, deriving statistical moments and fitting the model to marine cyanobacteria data to understand gene gain and loss dynamics.
Contribution
It introduces an infinitely many genes model in which events of gene gain and loss are superimposed on the ancestral lines of a Kingman coalescent.
Findings
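Derives moments for several sample statistics, including the average number of genes per individual, the average number of genes differing between individuals, and the number of incongruent pairs of genes
Fits the model to gene frequency data from marine cyanobacteria
Uses the fitted model to study the dynamics of gene gain and loss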
Abstract
The distributed genome hypothesis states that the set of genes in a population of bacteria is distributed over all individuals that belong to the specific taxon. It implies that certain genes can be gained and lost from generation to generation. We use the random genealogy given by a Kingman coalescent in order to superimpose events of gene gain and loss along ancestral lines. Gene gains occur at a constant rate along ancestral lines. We assume that gained genes have never been present in the population before. Gene losses occur at a rate proportional to the number of genes present along the ancestral line. In this infinitely many genes model we derive moments for several statistics within a sample: the average number of genes per individual, the average number of genes differing between individuals, the number of incongruent pairs of genes, the total number of different genes in the…
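The model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it draws a Kingman coalescent for a sample of size n, then propagates gene sets from the root to the leaves. The names `gain_rate` and `loss_rate` are hypothetical per-lineage rates in units of coalescent time, the root is assumed to start at the stationary gene count (Poisson with mean `gain_rate / loss_rate`), and "genes differing between individuals" is read here as the size of the symmetric difference, which may differ from the paper's exact definition.

```python
import math
import random
from itertools import count


def sample_poisson(rng, mean):
    """Draw a Poisson variate by Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_gene_sets(n, gain_rate, loss_rate, rng):
    """Return one set of gene ids per sampled individual (assumed setup)."""
    # Backward in time: with k lineages, the next merger occurs at rate k(k-1)/2.
    node_time = {i: 0.0 for i in range(n)}
    children = {}
    active = list(range(n))
    next_id, t = n, 0.0
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        node_time[next_id] = t
        children[next_id] = (a, b)
        active.append(next_id)
        next_id += 1
    root = active[0]

    # Forward in time: every gained gene gets a fresh id, so it has never
    # been present before (the infinitely many genes assumption).
    gene_ids = count()
    stationary_mean = gain_rate / loss_rate
    genes = {root: {next(gene_ids)
                    for _ in range(sample_poisson(rng, stationary_mean))}}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, ()):
            branch = node_time[parent] - node_time[child]
            survive = math.exp(-loss_rate * branch)
            # Each inherited gene is lost independently at rate loss_rate.
            kept = {g for g in genes[parent] if rng.random() < survive}
            # Genes gained on the branch and still present at its end.
            n_new = sample_poisson(rng, stationary_mean * (1.0 - survive))
            genes[child] = kept | {next(gene_ids) for _ in range(n_new)}
            stack.append(child)
    return [genes[i] for i in range(n)]


def summary(gene_sets):
    """Average genes per individual, average pairwise difference, total genes."""
    n = len(gene_sets)
    avg_genes = sum(len(s) for s in gene_sets) / n
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    avg_diff = sum(len(gene_sets[i] ^ gene_sets[j]) for i, j in pairs) / len(pairs)
    total = len(set().union(*gene_sets))
    return avg_genes, avg_diff, total
```

Calling `summary(simulate_gene_sets(5, 10.0, 2.0, random.Random(0)))` gives one realization of three of the statistics named in the abstract. Averaging over many replicates gives Monte Carlo estimates that can be compared with analytical moments. Under the assumptions of this sketch, the mean number of genes per individual should be close to `gain_rate / loss_rate`.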
