# Predicting RNA Structure Utilizing Attention from Pretrained Language Models

**Authors:** Ioannis Papazoglou, Alexios Chatzigoulas, George Tsekenis, Zoe Cournia

PMC · DOI: 10.1021/acs.jcim.4c02094 · Journal of Chemical Information and Modeling · 2025-07-02

## TL;DR

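This paper evaluates whether pretrained nucleic acid language models (RNABERT, ERNIE-RNA, RNA-FM, RiNALMo, and DNABERT) can predict RNA secondary and tertiary structure, and finds that they do not effectively capture structural information, mainly because of architectural constraints.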

## Contribution

The study evaluates the suitability of pretrained nucleic acid language models for RNA structure prediction and identifies their architectural constraints.

## Key findings

- Current nucleic acid language models do not effectively capture RNA structural information.
- Architectural constraints limit the ability of these models to predict RNA secondary and tertiary structures.
- The study highlights the need for improved model designs to address RNA structure prediction challenges.

## Abstract

RNA possesses functional significance that extends beyond the transport of genetic information. The functional roles of noncoding RNA can be mediated through their tertiary and secondary structure, and thus, predicting RNA structure holds great promise for unleashing their applications in diagnostics and therapeutics. However, predicting the three-dimensional (3D) structure of RNA remains challenging. Applying artificial intelligence techniques in the context of natural language processing and large language models (LLMs) could incorporate evolutionary information to RNA 3D structure predictions and address both resource and data scarcity limitations. This approach could achieve faster inference times, while keeping similar accuracy outcomes compared to employing time-consuming multiple sequence alignment schemes, akin to its successful application in protein structure prediction. Herein, we evaluate the suitability of currently available pretrained nucleic acid language models (RNABERT, ERNIE-RNA, RNA Foundational Model (RNA-FM), RiboNucleic Acid Language Model (RiNALMo), and DNABERT) to predict secondary and tertiary RNA structures. We demonstrate that current nucleic acid language models do not effectively capture structural information, mainly due to architectural constraints.
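
The abstract does not describe how attention maps are turned into structure predictions, and the paper's protocol is not reproduced in this summary view. As an illustration only, the sketch below shows one generic way an attention matrix could be read as a base-pair prediction and scored against a reference secondary structure in dot-bracket notation. Every name here (`parse_dot_bracket`, `symmetrize`, `predict_pairs`, `f1_score`), the threshold, the minimum loop length, and the toy matrix are assumptions made for this example; the matrix is hand-written, not output from any of the models named above, and the greedy pairing rule is not claimed to be the authors' method.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def parse_dot_bracket(structure: str) -> Set[Pair]:
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def symmetrize(attn: List[List[float]]) -> List[List[float]]:
    """Average the matrix with its transpose, since base pairing is symmetric."""
    n = len(attn)
    return [[0.5 * (attn[i][j] + attn[j][i]) for j in range(n)] for i in range(n)]


def predict_pairs(attn: List[List[float]],
                  threshold: float = 0.5,
                  min_loop: int = 3) -> Set[Pair]:
    """Greedily pick the highest-scoring pairs; each base pairs at most once.

    Pairs closing a loop shorter than `min_loop` unpaired bases are skipped.
    Both `threshold` and `min_loop` are illustrative defaults (assumptions).
    """
    sym = symmetrize(attn)
    n = len(sym)
    candidates = [
        (sym[i][j], i, j)
        for i in range(n)
        for j in range(i + min_loop + 1, n)
        if sym[i][j] >= threshold
    ]
    candidates.sort(reverse=True)  # strongest attention first
    used: Set[int] = set()
    pairs: Set[Pair] = set()
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.add((i, j))
            used.update((i, j))
    return pairs


def f1_score(predicted: Set[Pair], reference: Set[Pair]) -> float:
    """F1 over base pairs, a common secondary-structure accuracy measure."""
    if not predicted and not reference:
        return 1.0
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy 7-nucleotide hairpin; the attention values are invented for the example.
reference_structure = "((...))"
toy_attn = [[0.0] * 7 for _ in range(7)]
toy_attn[0][6], toy_attn[6][0] = 0.9, 0.7   # true pair (0, 6)
toy_attn[1][5], toy_attn[5][1] = 0.8, 0.6   # true pair (1, 5)
toy_attn[0][5], toy_attn[5][0] = 0.6, 0.6   # distractor competing for base 0
toy_attn[2][4], toy_attn[4][2] = 1.0, 1.0   # too close: loop shorter than min_loop

predicted = predict_pairs(toy_attn)
print(sorted(predicted), f1_score(predicted, parse_dot_bracket(reference_structure)))
```

In a real evaluation the matrix would come from one attention head (or a combination of heads) of a pretrained model, and the pairing step would usually be a learned or more careful decoding procedure; the greedy rule here only keeps the example short.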

## Full-text entities

- **Genes:** ncRNA [NCBI Gene 54719], LYZ (lysozyme) [NCBI Gene 4069] {aka AMYLD5, LYZF1, LZM}
- **Chemicals:** amino acids (MESH:D000596)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12264945/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12264945/full.md

## References

78 references — full list in the complete paper: https://tomesphere.com/paper/PMC12264945/full.md

---
Source: https://tomesphere.com/paper/PMC12264945