# Zimin patterns in genomes

**Authors:** Nikol Chantzi, Ioannis Mouratidis, Ilias Georgakopoulos-Soares, Yang Lu

PMC · DOI: 10.1371/journal.pcbi.1013909 · PLOS Computational Biology · 2026-02-09

## TL;DR

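This paper examines Zimin avoidmers, DNA k-mers that contain no Zimin pattern (a kind of self-embedded repeat). It finds that they vanish at long k-mer lengths (no k-mer above 104 base pairs avoids them in the human reference genome) and that their density differs widely across species.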

## Contribution

The study introduces Zimin avoidmers, k-mers that contain no Zimin pattern, and analyzes their frequency and genomic localization in the human genome and in eight other model organisms spanning all three domains of life.

## Key findings

- Zimin avoidmers are most enriched in coding and Human Satellite 1 regions in the human genome.
- Zimin avoidmers show lower germline insertion and deletion rates compared to surrounding genomic areas.
- Among the genomes studied, Zimin avoidmer density is highest in prokaryotes, with E. coli particularly high, and lowest in eukaryotes, with D. rerio the lowest.

## Abstract

Zimin words are words that have the same prefix and suffix. They are unavoidable patterns, with all sufficiently large strings encompassing them. Here, we examine for the first time the presence of k-mers not containing any Zimin patterns, defined hereafter as Zimin avoidmers, in the human genome. We report that in the reference human genome all k-mers above 104 base-pairs contain Zimin words. We find that Zimin avoidmers are most enriched in coding and Human Satellite 1 regions in the human genome. Zimin avoidmers display a depletion of germline insertions and deletions relative to surrounding genomic areas. We also apply our methodology in the genomes of another eight model organisms from all three domains of life, finding large differences in their Zimin avoidmer frequencies and their genomic localization preferences. We observe that Zimin avoidmers exhibit the highest genomic density in prokaryotic organisms, with E. coli showing particularly high levels, while the lowest density is found in eukaryotic organisms, with D. rerio having the lowest. Among the studied genomes the longest k-mer length at which Zimin avoidmers are observed is that of S. cerevisiae at k-mer length of 115 base-pairs. We conclude that Zimin avoidmers display inhomogeneous distributions in organismal genomes, have intricate properties including lower insertion and deletion rates, and disappear faster than the theoretical expected k-mer length, across the organismal genomes studied.

## Author summary

In this study, we investigate a special type of DNA sequence that we call “Zimin avoidmers.” These are sequences that possess a unique property: they avoid a specific kind of self-embedded repetition known as a Zimin pattern. Because they lack this repeated structure, they function as an anti-pattern within the genome. This is particularly intriguing, as a known theorem guarantees that any sufficiently long DNA sequence must contain Zimin patterns. With this in mind, our goal is to characterize how often these pattern-free sequences appear, as well as to determine the maximum lengths they can reach in real genomes across both eukaryotic and prokaryotic organisms. We believe this framework offers a new lens through which to examine genome structure, and it may also prove useful for assessing the validity and behavior of synthetic genomes.
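To make the definition concrete, here is a minimal Python sketch of an avoidmer check. It assumes the standard recursive definition from combinatorics on words, Z_1 = x_1 and Z_{n+1} = Z_n x_{n+1} Z_n, where a sequence contains Z_n if some substring is obtained from Z_n by replacing each variable with a non-empty string. This summary does not state which order n the paper uses, so `n` is left as a parameter. The function names are illustrative, and the brute-force search is not necessarily the authors' method.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def is_zimin_instance(u: str, n: int) -> bool:
    """True if u is an image of Z_n under a non-erasing substitution."""
    if n == 1:
        # Z_1 is a single variable, so any non-empty string matches it.
        return len(u) >= 1
    # For n >= 2, u must split as v + y + v with y non-empty and
    # v itself an instance of Z_{n-1}. Try every admissible length of v.
    max_border = (len(u) - 1) // 2
    for b in range(1, max_border + 1):
        if u[:b] == u[-b:] and is_zimin_instance(u[:b], n - 1):
            return True
    return False


def contains_zimin(w: str, n: int) -> bool:
    """True if some substring of w is an instance of Z_n (brute force)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return any(
        is_zimin_instance(w[i:j], n)
        for i in range(len(w))
        for j in range(i + 1, len(w) + 1)
    )


def zimin_avoidmers(seq: str, k: int, n: int) -> list[str]:
    """All k-mers of seq, in order, that contain no instance of Z_n."""
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [kmer for kmer in kmers if not contains_zimin(kmer, n)]


if __name__ == "__main__":
    print(contains_zimin("ACAGACA", 3))     # True: v = "ACA", y = "G"
    print(zimin_avoidmers("AACCA", 4, 2))   # ['AACC']
```

The search examines every substring, which is adequate for k-mers of around a hundred bases but would need a smarter algorithm for a genome-wide scan.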

## Full-text entities

- **Species:** Danio rerio (zebrafish, species) [taxon 7955], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Escherichia coli (E. coli, species) [taxon 562], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12912701/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12912701/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12912701/full.md

---
Source: https://tomesphere.com/paper/PMC12912701