Evolutionary dynamics of selfish DNA generates pseudo-linguistic features of genomes
Michael Sheinman, Anna Ramisch, Florian Massip, Peter F. Arndt

TL;DR
This paper investigates how selfish DNA elements, such as Alu repeats, shape the abundance distribution of subsequences in genomes. That distribution has a scale-free power-law tail resembling statistical regularities of human language, and the paper develops a model to explain this phenomenon.
Contribution
It introduces a model of selfish DNA expansion that explains the scale-free abundance distribution of genomic subsequences, linking the evolutionary dynamics of selfish DNA to statistical features of genomes.
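The summary does not spell out the model's rules, so the following is only a toy sketch under assumed rules, not the authors' model. A single element of fixed length is subjected to a series of events: either a duplication, where a randomly chosen existing copy (not only the original) is copied, or a point mutation at a random site of a random copy. The function name `simulate_expansion` and the parameter `dup_prob` are hypothetical.

```python
import random

ALPHABET = "ACGT"

def simulate_expansion(element_len, n_events, dup_prob, seed=0):
    """Toy selfish-DNA expansion (hypothetical rules, not the paper's model).

    Each event is either a duplication (with probability dup_prob), which
    copies a uniformly chosen existing copy, or a point mutation, which
    changes one uniformly chosen base of one uniformly chosen copy.
    Returns the final copies as strings.
    """
    if element_len < 1:
        raise ValueError("element_len must be >= 1")
    if not 0.0 <= dup_prob <= 1.0:
        raise ValueError("dup_prob must lie in [0, 1]")
    rng = random.Random(seed)
    # Start from a single random "master" element.
    copies = [[rng.choice(ALPHABET) for _ in range(element_len)]]
    for _ in range(n_events):
        if rng.random() < dup_prob:
            # Duplication: copies of copies can spread, not only the master.
            copies.append(list(rng.choice(copies)))
        else:
            # Point mutation: substitute one base with a different nucleotide.
            copy = rng.choice(copies)
            i = rng.randrange(element_len)
            copy[i] = rng.choice([b for b in ALPHABET if b != copy[i]])
    return ["".join(c) for c in copies]
```

For example, `simulate_expansion(element_len=300, n_events=20000, dup_prob=0.5, seed=1)` returns a family of related copies; counting k-mers across them for several values of k is one way to explore, in this toy setting, how subsequence abundances depend on length.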
Findings
Selfish DNA elements dominate the power-law tail of the subsequence abundance distribution.
For Alu elements, the power-law exponent increases with subsequence length.
The model's predictions agree qualitatively and quantitatively with the empirical abundance distributions of genomic subsequences.
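The findings refer to the power-law exponent of the abundance distribution's tail without saying how it is fitted. One standard option, assumed here and not necessarily the authors' procedure, is the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman (2009): alpha is approximately 1 + m / sum(ln(n_i / (n_min - 1/2))), computed over the m abundances n_i at or above a cutoff n_min. The input is one abundance per distinct subsequence; the function name is hypothetical.

```python
import math

def powerlaw_tail_exponent(abundances, n_min):
    """Approximate discrete MLE of alpha for P(n) ~ n**(-alpha), n >= n_min.

    Uses the approximation from Clauset, Shalizi and Newman (2009):
    alpha = 1 + m / sum(ln(n_i / (n_min - 0.5))).
    """
    if n_min < 1:
        raise ValueError("n_min must be >= 1")
    # Keep only the tail of the distribution.
    tail = [n for n in abundances if n >= n_min]
    if not tail:
        raise ValueError("no abundances at or above n_min")
    log_sum = sum(math.log(n / (n_min - 0.5)) for n in tail)
    return 1.0 + len(tail) / log_sum
```

Repeating the fit on abundances computed for several subsequence lengths is one way to check whether the exponent grows with length. The estimate is sensitive to n_min, which this sketch leaves to the caller.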
Abstract
Since the sequencing of large genomes, many statistical features of their sequences have been found. One intriguing feature is that certain subsequences are much more abundant than others. In fact, the abundances of subsequences of a given length are distributed with a scale-free power-law tail, resembling properties of human texts such as Zipf's law. Despite recent efforts, this phenomenon is still poorly understood. Here we find that selfish DNA elements, such as those belonging to the Alu family of repeats, dominate the power-law tail. Interestingly, for the Alu elements the power-law exponent increases with the length of the considered subsequences. Motivated by these observations, we develop a model of selfish DNA expansion. The predictions of this model agree both qualitatively and quantitatively with the empirical observations. This allows us to estimate parameters for the…
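The abundance distribution mentioned in the abstract can be computed in two steps: for a fixed length k, count how often each distinct subsequence (k-mer) occurs, then count how many distinct k-mers share each abundance n. A minimal sketch, assuming a single-strand sequence string and skipping k-mers that contain the ambiguity code N (a choice made here, not stated in the source); the function name is hypothetical.

```python
from collections import Counter

def kmer_abundance_distribution(seq, k):
    """Return {n: number of distinct k-mers that occur exactly n times}.

    Counts overlapping k-mers on the given strand only and skips any
    k-mer containing the ambiguity code 'N'.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    seq = seq.upper()
    kmer_counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    )
    # Histogram of abundances, sorted by n for readability.
    return dict(sorted(Counter(kmer_counts.values()).items()))
```

For example, `kmer_abundance_distribution("ACGTACGT", 2)` returns `{1: 1, 2: 3}`: TA occurs once, while AC, CG and GT each occur twice. Plotting these counts against n on log-log axes is the usual way to look for a power-law tail; merging each k-mer with its reverse complement is another common choice that this sketch leaves out.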
