# PathoRM: Computational inference of pathogenic RNA methylation sites by incorporating multi-view features

**Authors:** Hui Liu, Jiani Ma, Xianjun Ma, Lin Zhang, Aya Narunsky

PMC · DOI: 10.1371/journal.pcbi.1013654 · 2025-11-10

## TL;DR

PathoRM is a deep learning model that infers disease-associated RNA methylation sites from RNA methylation host sequences and pathogenic descriptions, combining large language models, multi-view learning, graph neural networks, and adversarial training.

## Contribution

PathoRM introduces a novel deep learning framework integrating multi-view features and biological insights for accurate inference of pathogenic RNA methylation sites.

## Key findings

- PathoRM achieves robust performance in predicting pathogenic RNA methylation sites across multiple datasets.
- Through its attention mechanism, the model highlights conserved motifs in RNA methylation host sequences, providing biological interpretability.
- Even without explicit site annotations, PathoRM captures intrinsic pathogenic regions in host sequences, and these regions overlap with conserved motifs.

## Abstract

Identifying pathogenic RNA methylation sites with a reasonable biological explanation has important implications for the treatment of diseases. Because in vitro experiments are limited in their ability to identify such sites, there is a growing need for computational workflows that enable accurate inference. Motivated by this need, we developed PathoRM, a biologically informed deep learning model, to infer associations between RNA methylation sites and diseases. PathoRM can propose credible pathogenic RNA methylation sites and help explain pathology at the epi-transcriptomic layer. PathoRM takes RNA methylation host sequences and pathogenic descriptions as inputs, and employs large language models, a multi-view learning algorithm, graph neural networks, an adversarial training approach, and a “guilty-by-association”-derived negative sampling approach. By distilling semantically enriched feature embeddings, PathoRM achieves more accurate and robust prediction performance across metrics and datasets. Notably, its attention mechanism gives PathoRM biological interpretability by illuminating previously uncharacterized regions in the host sequences of RNA methylation sites. This work is expected to assist in the discovery of pathogenic RNA methylation sites and conserved motifs, contributing to the advancement of genome research. Code and a pre-trained model are available at https://github.com/jianiM/PathoRM.
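The abstract names a “guilty-by-association”-derived negative sampling approach but does not spell out the procedure. The sketch below shows one common reading of the idea, under stated assumptions: a site that resembles sites already linked to a disease is a risky negative, so unlabeled (site, disease) pairs with the lowest similarity to known positives are chosen first. The function names (`kmer_set`, `jaccard`, `sample_negatives`), the k-mer Jaccard similarity, and the toy sequences are all hypothetical and are not taken from the PathoRM code base.

```python
def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sample_negatives(site_seqs, positives, diseases, n_neg, k=3):
    """Pick the n_neg unlabeled (site, disease) pairs least similar to known positives.

    site_seqs: dict mapping site id -> host sequence (assumed input format)
    positives: set of (site id, disease) pairs known to be associated
    diseases:  iterable of disease names to consider
    """
    kmers = {site: kmer_set(seq, k) for site, seq in site_seqs.items()}
    scored = []
    for disease in diseases:
        linked = [s for s, d in positives if d == disease]
        for site in site_seqs:
            if (site, disease) in positives:
                continue  # never sample a known positive as a negative
            # "Guilt" score: highest similarity to any site already linked
            # to this disease; 0.0 if the disease has no linked sites yet.
            guilt = max(
                (jaccard(kmers[site], kmers[p]) for p in linked),
                default=0.0,
            )
            scored.append((guilt, site, disease))
    # Lowest guilt first; ties are broken by site id and disease name.
    scored.sort()
    return [(site, disease) for _, site, disease in scored[:n_neg]]
```

With three toy sites where only `s1` is linked to a disease, the pair built from the dissimilar site `s3` is selected before the near-duplicate `s2`. A real pipeline would likely use learned embeddings or graph proximity rather than k-mer overlap; the ranking logic is the part this sketch is meant to illustrate.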

RNA methylation (RM) is a pivotal epi-transcriptomic modification that alters RNA nucleotides through the addition of methyl groups, profoundly affecting gene expression, cellular differentiation, and other biological processes essential for maintaining cellular function. Dysregulation of RM is closely linked to various diseases. Because pathogenic sites are difficult to identify through laboratory examination, there is an urgent need for computational workflows that can accurately infer pathogenic RNA methylation sites and so support comprehensive biological investigation. Here, we developed PathoRM, a biologically informed deep learning model aimed at elucidating the associations between RNA methylation sites and diseases. PathoRM integrates RNA methylation host sequences and pathogenic descriptions using large language models, multi-view learning algorithms, graph neural networks, adversarial training, and a negative sampling method derived from the “guilty-by-association” principle. By distilling semantically enriched feature embeddings, PathoRM achieves promising predictive accuracy and robustness across diverse metrics and datasets. Notably, even without explicit site annotations, PathoRM can capture intrinsic pathogenic regions in the RM host sequence that overlap with conserved motifs, offering biological insight into its decision-making process.
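To make the claim about attention-highlighted regions overlapping conserved motifs concrete, here is a minimal sketch of how such an overlap check could be done. It assumes per-position attention weights are already available as a plain list, and it uses the well-known m6A consensus motif DRACH (D = A/G/U, R = A/G, H = A/C/U) purely as an example; the summary does not state which motifs PathoRM recovers, and the helper names `top_window`, `motif_hits`, and `overlaps` are hypothetical.

```python
import re

# DRACH consensus for m6A, written as a lookahead so overlapping hits are found.
DRACH = re.compile(r"(?=[AGU][AG]AC[ACU])")
MOTIF_LEN = 5


def top_window(weights, width):
    """Return (start, end) of the window of `width` positions with the largest total weight."""
    if width <= 0 or width > len(weights):
        raise ValueError("width must be between 1 and len(weights)")
    starts = range(len(weights) - width + 1)
    # max() returns the first window on ties.
    best = max(starts, key=lambda s: sum(weights[s:s + width]))
    return best, best + width


def motif_hits(seq):
    """Return half-open (start, end) intervals of every DRACH occurrence in an RNA sequence."""
    return [(m.start(), m.start() + MOTIF_LEN) for m in DRACH.finditer(seq)]


def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def attended_region_hits_motif(seq, weights, width=5):
    """Check whether the most-attended window overlaps any DRACH occurrence."""
    region = top_window(weights, width)
    return any(overlaps(region, hit) for hit in motif_hits(seq))
```

For a toy sequence such as `CCCCGGACUCCCC` with attention concentrated on positions 5 to 7, the top window overlaps the single DRACH hit `GGACU`. In practice the weights would come from the model's attention layers and the motifs from a motif-discovery tool, so this only illustrates the interval comparison.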

## Full-text entities

- **Genes:** TTC41P (tetratricopeptide repeat domain 41, pseudogene) [NCBI Gene 253724] {aka GNN, GNNP}
- **Diseases:** CDD (MESH:C567275), RM (MESH:D012327), cancer (MESH:D009369), GNNs (MESH:D015441), neurological disorders (MESH:D009461), Alzheimer's (MESH:D000544), disease (MESH:D004194), cardiovascular diseases (MESH:D002318), metabolic disorders (MESH:D008659), breast cancer (MESH:D001943), m6ADA (MESH:C535673)
- **Chemicals:** 2-O (-), N7-methylguanosine (MESH:C016578), N6-methyladenosine (MESH:C010223), m6A (MESH:C005955), 5-methylcytidine (MESH:C016568), nucleotide (MESH:D009711)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

45 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12617926/full.md

---
Source: https://tomesphere.com/paper/PMC12617926