# MCLCBA: multi-view contrastive learning network for RNA methylation site prediction

**Authors:** Honglei Wang, Xuesong Zhang, Yanjing Sun, Zhaoyang Liu, Lin Zhang

PMC · DOI: 10.1186/s12859-025-06306-x · 2025-11-19

## TL;DR

This paper introduces MCLCBA, a new deep learning method for predicting RNA methylation sites that performs better than existing methods when training data is limited.

## Contribution

The novel MCLCBA framework uses multi-view contrastive learning with DNABERT and CGR to improve RNA methylation site prediction on small datasets.

## Key findings

- MCLCBA achieved 85.64% AUROC and 86.94% AUPRC on the m7G dataset.
- The method outperformed existing models by 5–6% in both metrics.
- Multi-view contrastive learning improves feature generalization with limited samples.

## Abstract

RNA methylation (RM) regulates gene expression, RNA stability, and protein translation. Accurate prediction of RM modification sites is essential for understanding their biological functions, but existing wet-lab detection techniques are operationally complex and costly. Deep learning (DL) methods have been applied to this task, yet their performance degrades as the training dataset shrinks. For instance, the Bidirectional Gated Recurrent Unit (BGRU) demonstrates substantial performance degradation. Convolutional Neural Network (CNN) can extract local pattern features but learns overly specific patterns with sample-limited data, resulting in poor feature generalization. Bidirectional Long Short-Term Memory (BiLSTM) excels at modeling long-range dependencies but cannot sufficiently learn gating mechanism parameters to capture effective sequence representations with limited samples. Transformer processes sequences in parallel and captures global dependencies through self-attention, but its quadratic computational complexity and large parameter count make it prone to overfitting on small datasets.

This study proposes a Multi-view Contrastive Learning with CNN-BiLSTM-Attention (MCLCBA) framework for RM modification site prediction. The multi-view approach comprises a primary view and an auxiliary view: the primary view uses DNA Bidirectional Encoder Representations from Transformers (DNABERT) to extract sequence contextual features, and the auxiliary view uses Chaos Game Representation (CGR) to extract structural features. Feature extraction includes four components: data augmentation, multi-view encoders, projection heads, and contrastive loss functions. The model applies dual differential data augmentation strategies and multi-view network architectures for feature processing and fusion. It learns discriminative feature representations that are invariant to data augmentation by maximizing positive sample similarity while minimizing negative sample similarity, which addresses feature learning in sample-limited scenarios. Experimental results on the sample-limited m7G dataset show that MCLCBA achieves an AUROC of 85.64% and an AUPRC of 86.94%, improving upon existing methods by 5–6% in both metrics.
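
As an illustration of the auxiliary view, the following is a minimal sketch of Chaos Game Representation for a nucleotide sequence. It is not the authors' implementation: the corner assignment for A, C, G, and U/T, the starting point at the centre of the unit square, the function names `cgr_points` and `fcgr`, and the grid resolution `k` are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Assumed corner layout (a common convention); the paper may use a different one.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "U": (1.0, 0.0),
    "T": (1.0, 0.0),  # treat T like U so DNA-style input also works
}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Return one CGR point per recognised nucleotide in the sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points: List[Tuple[float, float]] = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the nucleotide's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def fcgr(seq: str, k: int = 3) -> List[List[int]]:
    """Count CGR points in a 2^k x 2^k grid (a frequency CGR image)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

For example, `fcgr("ACGU", k=1)` places one point in each quadrant of the square. A count grid like this can be treated as a small image and passed to a convolutional encoder, which is one plausible way to obtain image-like features from a sequence.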

Through multi-view contrastive learning, MCLCBA provides an approach for predicting RM sites under sample-limited scenarios.
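
The contrastive objective described in the abstract, which pulls positive pairs together and pushes negative pairs apart, can be sketched with a cross-view InfoNCE loss. This is a sketch under stated assumptions, not the paper's exact loss: it assumes that row `i` of each view holds the projected embedding of the same sample (the positive pair), that all other rows in the batch act as negatives, and that cosine similarity is scaled by a temperature. The names `cosine`, `info_nce`, `symmetric_info_nce`, and the default `temperature=0.5` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(view_a: List[Vector], view_b: List[Vector],
             temperature: float = 0.5) -> float:
    """Mean InfoNCE loss with view_a as anchors and view_b as candidates."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the positive (index i).
        total += log_denom - logits[i]
    return total / n


def symmetric_info_nce(view_a: List[Vector], view_b: List[Vector],
                       temperature: float = 0.5) -> float:
    """Average the loss over both anchor directions."""
    return 0.5 * (info_nce(view_a, view_b, temperature)
                  + info_nce(view_b, view_a, temperature))


if __name__ == "__main__":
    primary = [[1.0, 0.0], [0.0, 1.0]]    # stand-in for primary-view projections
    auxiliary = [[0.9, 0.1], [0.2, 0.8]]  # stand-in for auxiliary-view projections
    print(symmetric_info_nce(primary, auxiliary))
```

The loss is small when matching rows of the two views are more similar to each other than to any other row in the batch, and grows when they are not. In practice this would be written with an automatic-differentiation framework so that gradients reach the encoders; the pure-Python version only shows the arithmetic.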

## Full-text entities

- **Genes:** MCC (MCC regulator of Wnt signaling pathway) [NCBI Gene 4163] {aka MCC1}, NINL (ninein like) [NCBI Gene 22981] {aka NLP}, TRNG (tRNA-Gly) [NCBI Gene 4563] {aka MTTG}
- **Diseases:** NCP (MESH:C566309), LLMs (MESH:D007806), DL (MESH:D007859), KMFE (MESH:C564021), RM (MESH:D012327), BiLSTM (MESH:D000088562)
- **Chemicals:** 5-methylcytosine (MESH:D044503), Nucleotide (MESH:D009711), 5-Methyluridine (MESH:C009182), 5-Methylcytidine (MESH:C016568), N6-methyladenosine (MESH:C010223), Inosine (MESH:D007288), 2'-O-methyladenosine (MESH:C024341), 7-Methylguanosine (MESH:C016578), O (MESH:D010100), uracil (MESH:D014498), thymine (MESH:D013941), Am (MESH:D000576), Cm (MESH:D003476), 2'-O-methylcytidine (MESH:C052203), 2'-O-methyluridine (MESH:C052202), ribose (MESH:D012266), N7-methylguanine (MESH:C008450), 2'-O-methylguanosine (MESH:C024900), N1-methyladenosine (MESH:C002230), Pseudouridine (MESH:D011560)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Homo sapiens (human, species) [taxon 9606]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12628535/full.md

---
Source: https://tomesphere.com/paper/PMC12628535