# G4mer: An RNA language model for transcriptome-wide identification of G-quadruplexes and disease variants from population-scale genetic data

**Authors:** Farica Zhuang, Danielle Gutman, Nathaniel Islas, Bryan B. Guzman, Alli Jimenez, San Jewell, Nicholas J. Hand, Katherine Nathanson, Daniel Dominguez, Yoseph Barash

PMC · DOI: 10.1038/s41467-025-65020-7 · Nature Communications · 2025-11-20

## TL;DR

G4mer is an RNA language model that predicts RNA G-quadruplexes (rG4s) across the transcriptome and evaluates how genetic variants alter them, with experimentally validated examples in the 5′ UTRs of breast cancer-associated genes.

## Contribution

G4mer is an RNA language model that improves prediction of RNA G-quadruplexes and their disruption by genetic variants at a population scale.

## Key findings

- G4mer outperforms existing methods in predicting RNA G-quadruplexes and classifying their subtypes.
- 5′ UTR variants in breast cancer-associated genes alter rG4 formation, and their impact on structure and gene expression was validated experimentally.
- Sequence length and flanking motifs are key features influencing RNA G-quadruplex stability and mutational sensitivity.

## Abstract

RNA G-quadruplexes (rG4s) are key regulatory elements in gene expression, yet the effects of genetic variants on rG4 formation remain underexplored. Here, we introduce G4mer, an RNA language model that predicts rG4 formation, classifies rG4 subtypes, and evaluates the effects of genetic variants across the transcriptome. G4mer significantly improves accuracy over existing methods and uncovers subtype-specific differences in mutational sensitivity and evolutionary constraint, highlighting sequence length and flanking motifs as important rG4 features. Applying G4mer to 5′ untranslated region (UTR) variations, we identify variants in breast cancer-associated genes that alter rG4 formation and validate their impact on structure and gene expression. These results demonstrate the potential of integrating computational models with experimental approaches to study rG4 function, especially in diseases where non-coding variants are often overlooked. To support broader applications, G4mer is available as both a web tool and a downloadable model.

RNA G-quadruplexes (rG4s) are structures formed in guanine-rich regions of RNA that can serve as crucial regulatory elements in gene expression. Here the authors present an RNA language model for transcriptome-wide prediction of rG4s and genetic variants that disrupt or create them.
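
To make concrete the idea of a variant that disrupts or creates an rG4, the sketch below uses the classic canonical-motif heuristic (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) rather than G4mer itself. It is an illustrative baseline only: the function names, the example sequence, and the variant positions are hypothetical and do not come from the paper, and a pattern match says nothing about the subtype classification or flanking-sequence effects that the model is reported to capture.

```python
import re

# Canonical G-quadruplex pattern: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}.
# This is a simple regex heuristic, NOT the G4mer language model.
CANONICAL_G4 = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def has_canonical_g4(rna: str) -> bool:
    """Return True if the sequence contains a canonical G4 motif."""
    normalized = rna.upper().replace("T", "U")  # accept DNA-style input
    return CANONICAL_G4.search(normalized) is not None


def apply_snv(rna: str, pos: int, alt: str) -> str:
    """Substitute a single nucleotide at a 0-based position."""
    return rna[:pos] + alt + rna[pos + 1:]


def variant_effect(rna: str, pos: int, alt: str) -> str:
    """Classify a substitution by its effect on the motif match."""
    before = has_canonical_g4(rna)
    after = has_canonical_g4(apply_snv(rna, pos, alt))
    if before and not after:
        return "disrupts"
    if after and not before:
        return "creates"
    return "no change"


# Hypothetical 5' UTR fragment with four GGG runs (made up for illustration).
utr = "AUGGGAGGGCUGGGAUGGGCA"
print(variant_effect(utr, 7, "A"))  # G>A inside the second G-run
print(variant_effect(utr, 0, "C"))  # substitution outside the motif
```

A learned model is expected to go beyond this kind of binary match, for example by scoring non-canonical rG4s that the pattern misses.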

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** breast cancer (MESH:D001943)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12635080/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12635080/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/PMC12635080/full.md

---
Source: https://tomesphere.com/paper/PMC12635080