# Protein tandem repeats that produce frameshifts can generate new structural states and functions

**Authors:** Zarifa Osmanli, Gudrun Aldrian, Jeremy Leclercq, Theo Falgarone, Santiago M. Gómez Bergna, Denis N. Prada Gori, Andrew V. Oleinikov, Ilham Shahmuradov, Andrey V. Kajava

PMC · DOI: 10.1111/febs.70273 · 2025-09-24

## TL;DR

This paper shows that frameshifts in protein tandem repeats can create new protein structures and functions, potentially aiding evolution and causing disease.

## Contribution

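The study shows that frameshifts within tandem repeats, especially short ones, cause larger structural and functional changes than frameshifts in non-repetitive sequences. This contrasts with earlier reports that the genetic code and gene sequences evolve to minimize frameshift effects, keeping frameshifted products physicochemically similar to their reference proteins.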

## Key findings

- Frameshifts in short tandem repeats increase hydrophobicity and arginine content.
- Frameshifts can convert soluble proteins into membrane proteins and vice versa.
- Frameshift events generate novel protein structures and functions, contributing to adaptability and disease.
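
The soluble-to-membrane conversion in the second finding can be pictured with a rough hydropathy scan. This summary does not name the prediction tools the authors used, so the sketch below is only an illustration: it applies the classic Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy cutoff of 1.6, a commonly cited rule of thumb for membrane-spanning stretches. The function name `tm_candidate_windows` and the two homorepeat test sequences are hypothetical and are not taken from the paper.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tm_candidate_windows(protein, window=19, threshold=1.6):
    """Return start indices of windows whose mean hydropathy exceeds threshold.

    Assumes `protein` contains only the 20 standard one-letter codes.
    The window and threshold defaults are assumptions, not the paper's settings.
    """
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean > threshold:
            hits.append(start)
    return hits


# Toy homorepeats: a polar poly-Gln tract versus a hydrophobic poly-Ala tract.
print(tm_candidate_windows("Q" * 25))  # no window passes the cutoff
print(tm_candidate_windows("A" * 25))  # every 19-residue window passes
```

A dedicated transmembrane predictor would be needed for real proteins; the scan only shows why a shift toward hydrophobic residues can make a repeat region look membrane-like.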

## Abstract

The genetic code uses three‐nucleotide units to encode each amino acid in proteins. Insertions or deletions of nucleotides not divisible by three shift the reading frames, resulting in significantly different protein sequences. These events are disruptive but can also create variability important for evolution. Previous studies suggested that the genetic code and gene sequences evolve to minimize frameshift effects, maintaining similar physicochemical properties to their reference proteins. Here, we focused on tandem repeat sequences, known as frameshift hotspots. Using cutting‐edge bioinformatics tools, we compared reference and frameshifted protein sequences within tandem repeats across 50 prokaryotic and eukaryotic proteomes. We showed that, in contrast to the general tendency, frameshifts within these regions, especially with short repeats, lead to a significant increase in hydrophobicity and arginine content. Additionally, the frameshifts, particularly in short tandem repeats, rearrange transmembrane regions, potentially converting soluble proteins into membrane proteins and vice versa. Given their occurrence in rapidly evolving, essential proteins, such changes may promote rapid adaptability. Our large‐scale AlphaFold modeling suggested that frameshift events can generate novel structures and functions, enabling the synthesis of multiple protein variants within the same coding region. Overall, frameshifts cause more drastic changes in tandem repeat sequences than in non‐repetitive sequences and therefore can be a primary cause of altered functions, cellular localization, and the development of various pathologies.

We explored an alternative protein structure landscape by analyzing amino acid sequences from frameshifted tandem repeats, which are regions prone to frameshifts. These frameshifts, especially in short repeats, lead to more drastic changes than in non‐repetitive regions, often altering structure, function, and localization, and potentially contributing to disease. Such frameshifts can also create novel protein variants from the same coding region.
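
To make the reading-frame mechanism concrete, here is a minimal sketch that translates a short trinucleotide repeat in all three forward frames using the standard genetic code, then reports the arginine fraction and a crude hydrophobic-residue fraction. It is not the authors' pipeline. The repeat units (`CAG`, `GAG`), the helper names `translate` and `fraction`, and the `HYDROPHOBIC` residue set are assumptions chosen for illustration. Reading from offset 1 or 2 stands in for what the downstream sequence looks like after an insertion or deletion whose length is not divisible by three.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: a simple set of residues treated as hydrophobic in this sketch.
HYDROPHOBIC = set("AVILMFWC")


def translate(dna, frame=0):
    """Translate `dna` starting at offset `frame` (0, 1 or 2).

    Trailing nucleotides that do not fill a codon are ignored;
    stop codons are emitted as '*'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def fraction(protein, residues):
    """Fraction of positions in `protein` occupied by any of `residues`."""
    residues = set(residues)
    if not protein:
        return 0.0
    return sum(aa in residues for aa in protein) / len(protein)


# Toy repeats: ten copies of each unit, read in the three forward frames.
for unit in ("CAG", "GAG"):
    dna = unit * 10
    for frame in range(3):
        protein = translate(dna, frame)
        print(unit, frame, protein,
              f"Arg={fraction(protein, 'R'):.2f}",
              f"hydrophobic={fraction(protein, HYDROPHOBIC):.2f}")
```

In this toy case the `CAG` repeat reads as poly-Gln, poly-Ser, and poly-Ala in frames 0, 1, and 2, while the `GAG` repeat reads as poly-Glu, poly-Arg, and poly-Gly. Because a short repeat unit yields a homopolymer in every frame, a single shift replaces the entire composition of the region, which is consistent with the abstract's point that short repeats change most drastically.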

## Full-text entities

- **Genes:** SNORA73A (small nucleolar RNA, H/ACA box 73A) [NCBI Gene 6080] {aka E1, E1-7, E1b, RNE1, RNU17A, U17A}, F2R (coagulation factor II thrombin receptor) [NCBI Gene 2149] {aka CF2R, HTR, PAR-1, PAR1, TR}, SNRNP70 (small nuclear ribonucleoprotein U1 subunit 70) [NCBI Gene 6625] {aka RNPU1Z, RPU1, SNRP70, Snp1, U1-70K, U170K}, CXADRP1 (CXADR pseudogene 1) [NCBI Gene 653108] {aka CAR, CXADRP}, ZNF781 (zinc finger protein 781 (pseudogene)) [NCBI Gene 163115], AP5B1 (adaptor related protein complex 5 subunit beta 1) [NCBI Gene 91056] {aka AP-5, PP1030}, RIEG2 (Rieger syndrome 2) [NCBI Gene 6012] {aka ARS, RGS2}, HMGB1 (high mobility group box 1) [NCBI Gene 3146] {aka HMG-1, HMG1, HMG3, SBP-1}
- **Diseases:** Alzheimer's disease (MESH:D000544), BPTAS (MESH:C537100), ALS (MESH:D000690), polydactyly (MESH:D017689), SLiM (MESH:D019247), amyloidosis (MESH:D000686), TR (MESH:D000083102), tibial aplasia/hypoplasia syndrome (MESH:C536482), cancer (MESH:D009369), brachyphalangy (MESH:C535432), cytotoxicity (MESH:D064420)
- **Chemicals:** lysine (MESH:D008239), Ser (MESH:D012694), Arg (MESH:D001120), Asp (MESH:D001224), Pro (MESH:D011392), Ala (MESH:D000409), poly-Arg (MESH:C015462), Pos (MESH:D011059), Asn (MESH:D001216), Glu (MESH:D018698), dipeptides (MESH:D004151), Trp (MESH:D014364), Leu (MESH:D007930), salt (MESH:D012492), AA (MESH:D000596), amino (-), Gln (MESH:D005973), cysteine (MESH:D003545), poly-Gln (MESH:C097188), Gly (MESH:D005998)
- **Species:** Dictyostelium discoideum (species) [taxon 44689], Drosophila melanogaster (fruit fly, species) [taxon 7227], Bos taurus (bovine, species) [taxon 9913], Mus musculus (house mouse, species) [taxon 10090], Plasmodium falciparum (malaria parasite P. falciparum, species) [taxon 5833], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Homo sapiens (human, species) [taxon 9606], Danio rerio (leopard danio, species) [taxon 7955], Sclerotinia sclerotiorum (species) [taxon 5180], Canis lupus (gray wolf, species) [taxon 9612], Pan troglodytes (chimpanzee, species) [taxon 9598]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12871928/full.md

---
Source: https://tomesphere.com/paper/PMC12871928