Zimin patterns in genomes
Nikol Chantzi, Ioannis Mouratidis, Ilias Georgakopoulos-Soares

TL;DR
This study examines Zimin avoidmers (k-mers that contain no Zimin pattern) in the human genome and in eight other model organisms, reporting where they are enriched or depleted and how their frequencies and genomic localization differ across species.
Contribution
First comprehensive analysis of Zimin avoidmers in genomes, showing their distribution, enrichment, and evolutionary implications across diverse species.
Findings
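In the reference human genome, every k-mer longer than 104 base pairs contains a Zimin word, so no Zimin avoidmer exceeds that length.
Zimin avoidmers are most enriched in coding and Human Satellite 1 regions of the human genome.
Zimin avoidmers show a depletion of germline insertions and deletions relative to surrounding genomic areas.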
Abstract
Zimin words are words that have the same prefix and suffix. They are unavoidable patterns, with all sufficiently large strings encompassing them. Here, we examine for the first time the presence of k-mers not containing any Zimin patterns, defined hereafter as Zimin avoidmers, in the human genome. We report that in the reference human genome all k-mers above 104 base pairs contain Zimin words. We find that Zimin avoidmers are most enriched in coding and Human Satellite 1 regions in the human genome. Zimin avoidmers display a depletion of germline insertions and deletions relative to surrounding genomic areas. We also apply our methodology to the genomes of eight other model organisms from all three domains of life, finding large differences in their Zimin avoidmer frequencies and their genomic localization preferences. We observe that Zimin avoidmers exhibit the highest genomic…
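To make the notion of a Zimin avoidmer concrete, here is a minimal brute-force sketch. It assumes the standard combinatorial definition of Zimin patterns (Z_1 = a, Z_{n+1} = Z_n x Z_n, where each variable is replaced by a non-empty string), which is not spelled out in the abstract. The function names `contains_zimin` and `is_avoidmer` are hypothetical, and the `order` parameter is an assumption: the abstract does not say which pattern order or length constraints the authors use. This is not the authors' implementation and is only suitable for short sequences.

```python
from functools import lru_cache


def contains_zimin(seq: str, order: int) -> bool:
    """Return True if some substring of seq is an instance of Z_order.

    Assumes the standard recursive definition: an instance of Z_1 is any
    non-empty string, and an instance of Z_k is P + M + P where P is an
    instance of Z_(k-1) and M is a non-empty string.
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def is_instance(i: int, j: int, k: int) -> bool:
        # Does seq[i:j] match Z_k with non-empty substitutions?
        length = j - i
        if k == 1:
            return length >= 1
        # Try every prefix/suffix length p that leaves a non-empty middle.
        for p in range(1, (length - 1) // 2 + 1):
            if seq[i:i + p] == seq[j - p:j] and is_instance(i, i + p, k - 1):
                return True
        return False

    return any(
        is_instance(i, j, order)
        for i in range(n)
        for j in range(i + 1, n + 1)
    )


def is_avoidmer(kmer: str, order: int) -> bool:
    """A k-mer is an avoidmer (for the assumed order) if it contains no Z_order instance."""
    return not contains_zimin(kmer, order)
```

For example, `is_avoidmer("ACGT", 2)` holds because no letter recurs around a non-empty middle, while `"ACAGACA"` is not an avoidmer at order 3, since it has the form P + "G" + P with P = "ACA". The check enumerates all substrings and is far too slow for genome-scale scans; it only illustrates the definition.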
Taxonomy
Topics: RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics
