# Deep learning framework for RNA 5hmC prediction using RNA language model embeddings

**Authors:** Md Muhaiminul Islam Nafi

PMC · DOI: 10.1371/journal.pone.0341649 · PLOS One · 2026-02-03

## TL;DR

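This paper introduces InTrans-RNA5hmC, a dual-branch deep learning model that predicts RNA 5hmC modifications from word embeddings and RiNALMo language model embeddings. On the independent test set it reports 0.97 sensitivity, 0.985 balanced accuracy, and 0.985 F1 score, outperforming existing methods.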

## Contribution

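InTrans-RNA5hmC is a novel dual-branch model, combining an Inception branch and a Transformer branch, that uses word embeddings and RiNALMo language model embeddings to predict RNA 5hmC modifications and outperforms existing state-of-the-art methods on the independent test set.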

## Key findings

- InTrans-RNA5hmC achieved 0.97 sensitivity on the independent test set.
- The model outperformed state-of-the-art methods, reaching 0.985 balanced accuracy and 0.985 F1 score on the independent test set.
- RiNALMo embeddings provided effective contextual information for RNA 5hmC prediction.
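
To make the reported metrics concrete, the following minimal sketch computes sensitivity, balanced accuracy, and F1 score from confusion-matrix counts. This summary does not report the confusion matrix, so the counts below are hypothetical, chosen only because they reproduce the rounded values quoted above; the function name `classification_metrics` is likewise an illustration, not the author's code. If the rounded values are taken at face value, a sensitivity of 0.97 together with a balanced accuracy of 0.985 implies a specificity close to 1.0.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    sensitivity = tp / (tp + fn)   # recall on the positive (5hmC) class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Hypothetical counts (NOT taken from the paper): 100 positives, 100 negatives.
metrics = classification_metrics(tp=97, fn=3, tn=100, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages the per-class recalls, so it stays informative even when positive and negative samples are unequal in number, whereas F1 depends on precision and therefore on the class ratio.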

## Abstract

By influencing gene expression and contributing to epigenetic modifications, the ribonucleic acid (RNA) 5-hydroxymethylcytosine (5hmC) modification significantly affects cellular pathways and plays an important role in complex regulatory networks. Moreover, 5hmC modifications are linked to a variety of human diseases, including diabetes, cancer, and cardiovascular conditions. However, experimental methods for identifying RNA 5hmC modifications, such as chromatography and polymerase chain reaction (PCR) amplification, are costly and time-consuming, so computational methods are needed to predict these modifications. In this study, several feature descriptors were analyzed and compared to select the best ones, and different deep learning models were explored to design the proposed architecture. Neighbourhood analysis was conducted on the dataset to provide deeper insight into RNA 5hmC modifications. The proposed model, InTrans-RNA5hmC, is a dual-branch deep learning model with an Inception branch and a Transformer branch. Word embeddings carrying contextual information and language model embeddings from the RiboNucleic Acid Language Model (RiNALMo) were used as the final feature descriptors. InTrans-RNA5hmC outperformed existing state-of-the-art (SOTA) methods, achieving 0.97 sensitivity, 0.985 balanced accuracy, and 0.985 F1 score on the independent test set.
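
The abstract mentions a neighbourhood analysis without describing the procedure. One common way to do such an analysis, shown here only as a sketch under stated assumptions, is to tally how often each nucleotide occurs at each position of fixed-length sequence windows centered on the candidate cytosine. The function name `positional_frequencies`, the window length, and the toy sequences are all hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def positional_frequencies(windows):
    """Return one dict per position mapping A/C/G/U to its relative frequency.

    All windows must have the same length. Characters other than A, C, G, U
    are counted in the total but not reported.
    """
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")

    freqs = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        total = sum(counts.values())
        freqs.append({nt: counts.get(nt, 0) / total for nt in "ACGU"})
    return freqs


# Toy, made-up windows (NOT from the paper), each centered on a cytosine.
toy_windows = ["GGCAU", "GACGU", "GGCUA", "AGCGG"]
freqs = positional_frequencies(toy_windows)
center = len(toy_windows[0]) // 2

for pos, dist in enumerate(freqs):
    print(pos - center, dist)  # position relative to the central cytosine
```

Comparing such per-position frequencies between positive and negative samples is one way to spot nucleotides that are enriched or depleted near modified sites.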

## Linked entities

- **Diseases:** diabetes (MONDO:0005015), cancer (MONDO:0004992)

## Full-text entities

- **Genes:** HNRNPA1 (heterogeneous nuclear ribonucleoprotein A1) [NCBI Gene 3178] {aka ALS19, ALS20, HNRPA1, HNRPA1L3, IBMPFD3, MPD3}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** cancer (MESH:D009369), diabetes (MESH:D003920), cardiovascular conditions (MESH:D002318)
- **Chemicals:** Cytosine (MESH:D003596), m1A (-), N7-Methylguanosine (MESH:C016578), Guanine (MESH:D006147), Dinucleotide (MESH:D015226), Uracil (MESH:D014498), N1-Methyladenine (MESH:C008407), Adenine (MESH:D000225), Nucleotide (MESH:D009711), 5-Hydroxymethylcytosine (MESH:C011865), 5-Methylcytosine (MESH:D044503), N6-Methyladenosine (MESH:C010223)
- **Species:** Homo sapiens (human, species) [taxon 9606], Drosophila melanogaster (fruit fly, species) [taxon 7227], Mus musculus (house mouse, species) [taxon 10090]
- **Mutations:** G nucleotides from positions 21-26, adenine at positions 21-26

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12867265/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12867265/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/PMC12867265/full.md

---
Source: https://tomesphere.com/paper/PMC12867265