# Patch type nucleotide sequence identities between genomes from many different species facilitate illegitimate recombination

**Authors:** Stefanie Weber, Christina M. Ramirez, Walter Doerfler

PMC · DOI: 10.1038/s41598-026-44124-0 · 2026-03-30

## TL;DR

This paper explores how patch-type nucleotide sequence identities across species may facilitate genome remodeling and evolution through illegitimate recombination.

## Contribution

The study proposes that patch-type sequence identities are intrinsic statistical features of the four-letter DNA alphabet that may facilitate illegitimate recombination and thereby contribute to evolutionary innovation.
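One way to make "statistical features" concrete uses an assumed null model, offered here for illustration and not the authors' own formulation. Treat two unrelated sequences as independent strings in which base $b$ occurs with frequency $p_b$ in one and $q_b$ in the other, and compare them position by position without gaps. The per-position match probability $m$ and the length $L$ of a maximal run of consecutive matches then follow:

$$
m = \sum_{b \in \{\mathrm{A},\mathrm{C},\mathrm{G},\mathrm{T}\}} p_b\, q_b, \qquad
P(L = k) = m^{\,k-1}(1 - m), \quad k \ge 1, \qquad
\mathbb{E}[L] = \frac{1}{1 - m}.
$$

With equal base frequencies, $m = 4 \cdot (1/4)^2 = 1/4$ and $\mathbb{E}[L] = 4/3$. A comparison of $N$ positions yields roughly $N m (1 - m)$ maximal match runs, ignoring edge effects. The ~45% identity reported in the paper lies well above this 25% ungapped baseline. When comparing the two, the alignment procedure therefore has to be taken into account, for example whether and how gaps are introduced.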

## Key findings

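- Patch-type identities of about 45% (short identical stretches interspersed with mismatches) appear in alignments of sequences from diverse genomes and also of randomly shuffled sequences, suggesting a statistical rather than a functional origin.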
- These patterns may act as signals for illegitimate recombination, aiding DNA integration and rearrangement.
- Simulation data support the idea that base composition predicts the frequency and length distribution of matching segments, while these segments may also create local environments conducive to recombination.
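The shuffled-sequence control in the first finding can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The random stand-in sequence, the Needleman–Wunsch global alignment, the scoring values (match +1, mismatch -1, gap -2), and the identity definition (identical columns over all alignment columns) are all choices made here. The helper names are hypothetical. Shuffling preserves base composition, so the shuffled copy keeps the original's statistical makeup while any biological ordering is destroyed.

```python
import random


def shuffle_preserving_composition(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq; base counts stay the same."""
    chars = list(seq)
    rng.shuffle(chars)  # in-place Fisher-Yates shuffle from the standard library
    return "".join(chars)


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty (assumed scoring); returns two gapped strings."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x: str, y: str) -> float:
    """Identical non-gap columns divided by all alignment columns (gap columns included)."""
    cols = len(x)
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / cols if cols else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for a real genomic segment
    original = "".join(rng.choice("ACGT") for _ in range(300))
    shuffled = shuffle_preserving_composition(original, rng)
    print("ungapped identity: %.1f%%" % percent_identity(original, shuffled))
    ga, gb = needleman_wunsch(original, shuffled)
    print("gapped identity:   %.1f%%" % percent_identity(ga, gb))
```

Reported identity values depend on the aligner, the scoring scheme, and whether gap columns count in the denominator. Numbers printed by this sketch are therefore not directly comparable to the ~45% figure in the paper.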

## Abstract

Comparative analyses of nucleotide sequences across diverse taxa, including viruses, bacteria, plants, and mammals, consistently reveal patch-type sequence identities of around 45%. These identities consist of short identical stretches interspersed with mismatches. Similar identity patterns emerge in alignments of randomly shuffled or scrambled sequences. These findings suggest that patch-type identities reflect intrinsic statistical properties of the four-letter genetic alphabet. Such patterns likely function as recognition signals for illegitimate recombination, a mechanism that promotes sequence insertions, exchanges, and rearrangements without extensive homology. Patch-type identities have been observed at integration sites of foreign DNA and may play a role in evolutionary innovation and rapid diversification (e.g., SARS-CoV-2). Simulation data support the idea that the frequency and length distribution of matching segments can be predicted by statistical models based on base composition, and that these segments may nevertheless create local environments conducive to recombination. Furthermore, the statistical architecture of the genetic alphabet encodes not only biological information but also the potential for genome remodeling and adaptation during evolution. By bridging fundamental sequence properties with biological outcomes, this study provides a framework for exploring how randomness at the nucleotide sequence level can give rise to order and complexity across the tree of life.
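The statement that base composition predicts the frequency and length distribution of matching segments can be illustrated with a simple ungapped null model. The model is assumed here for illustration and may differ from the simulations in the paper. It generates two independent random sequences with a chosen composition, collects every maximal run of identical positions, and compares the observed run-length frequencies with a geometric distribution whose parameter is the per-position match probability. The composition values and the helper names (`match_probability`, `match_runs`, `geometric_pmf`, `random_seq`) are hypothetical.

```python
import random
from collections import Counter


def match_probability(comp_a: dict, comp_b: dict) -> float:
    """P(two independently drawn bases are identical) = sum over bases of p_b * q_b."""
    return sum(comp_a.get(base, 0.0) * comp_b.get(base, 0.0) for base in "ACGT")


def match_runs(x: str, y: str) -> list:
    """Lengths of maximal runs of consecutive identical positions (ungapped comparison)."""
    runs, current = [], 0
    for p, q in zip(x, y):
        if p == q:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def geometric_pmf(k: int, m: float) -> float:
    """Assumed model: P(run length == k) = m**(k - 1) * (1 - m) for k >= 1."""
    return m ** (k - 1) * (1 - m)


def random_seq(n: int, comp: dict, rng: random.Random) -> str:
    """Draw n bases independently with the given (hypothetical) composition."""
    bases = list(comp)
    weights = [comp[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=n))


if __name__ == "__main__":
    rng = random.Random(42)
    comp = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # hypothetical AT-rich composition
    m = match_probability(comp, comp)
    x, y = random_seq(100_000, comp, rng), random_seq(100_000, comp, rng)
    counts = Counter(match_runs(x, y))
    total = sum(counts.values())
    print("match probability:", round(m, 4))
    for k in range(1, 6):
        # observed fraction of runs with length k vs. geometric prediction
        print(k, round(counts[k] / total, 4), round(geometric_pmf(k, m), 4))
```

The geometric model ignores runs truncated at the sequence ends and any dependence between neighboring bases. Real genomes with dinucleotide bias would need a richer model, such as a Markov chain, which this sketch does not attempt.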

The online version contains supplementary material available at 10.1038/s41598-026-44124-0.

## Linked entities

- **Diseases:** COVID-19 (SARS-CoV-2 infection; MONDO:0100096)
- **Species:** Viruses (taxon 10239), Bacteria (taxon 2)

## Full-text entities

- **Genes:** S (surface glycoprotein) [NCBI Gene 43740568] {aka spike glycoprotein}, APRT (adenine phosphoribosyltransferase) [NCBI Gene 353] {aka AMP, APRTD}, APOE (apolipoprotein E) [NCBI Gene 348] {aka AD2, APO-E, ApoE4, LDLCQ5, LPG}, AD12 (Alzheimer disease 12) [NCBI Gene 100188830], NEU1 (neuraminidase 1) [NCBI Gene 4758] {aka NANH, NEU, SIAL1}
- **Diseases:** tumor (MESH:D009369)
- **Chemicals:** nucleotide (MESH:D009711), dinucleotide (MESH:D015226)
- **Species:** Sus scrofa (pig, species) [taxon 9823], Cylas formicarius (sweet potato weevil, species) [taxon 197179], Human endogenous retrovirus W (species) [taxon 87786], Human adenovirus 5 (no rank) [taxon 28285], Acidianus rod-shaped virus 1 (no rank) [taxon 309181], Zootoca vivipara (common lizard, species) [taxon 8524], Plasmodium falciparum (malaria parasite P. falciparum, species) [taxon 5833], Bombus pascuorum (species) [taxon 65598], Latimeria chalumnae (coelacanth, species) [taxon 7897], Lycium barbarum (Duke of Argyll's teatree, species) [taxon 112863], Human adenovirus 12 (no rank) [taxon 28282], Ilex aquifolium (English holly, species) [taxon 4298], Homo sapiens (human, species) [taxon 9606], Cricetus cricetus (black-bellied hamster, species) [taxon 10034], Mycobacterium tuberculosis (species) [taxon 1773], Oryza sativa (Asian cultivated rice, species) [taxon 4530], Mus musculus (house mouse, species) [taxon 10090], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Triticum aestivum (bread wheat, species) [taxon 4565], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Human adenovirus 2 (no rank) [taxon 10515]
- **Mutations:** A through F
- **Cell lines:** CLAC1 (Canis lupus familiaris, dog; canine lung adenocarcinoma; cancer cell line; CVCL_J360); HA12/7 (Helicoverpa armigera, cotton bollworm; spontaneously immortalized cell line; CVCL_Z978); CBA-12-1-T (Rattus norvegicus, rat; rat malignant mesothelioma; cancer cell line; CVCL_C1IN); BHK21 (Mesocricetus auratus, golden hamster; spontaneously immortalized cell line; CVCL_RQ70); -Hu- (Homo sapiens, human; finite cell line; CVCL_B0BH)

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13035915/full.md

---
Source: https://tomesphere.com/paper/PMC13035915