# GTMALoc: prediction of miRNA subcellular localization based on graph transformer and multi-head attention mechanism

**Authors:** Xindi Huang, Jipu Jiang, Lifen Shi, Cheng Yan

PMC · DOI: 10.3389/fgene.2025.1623008 · Frontiers in Genetics · 2025-06-19

## TL;DR

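This paper introduces GTMALoc, a model that predicts the subcellular localization of microRNAs (miRNAs) by combining node2vec features, a graph transformer, and multi-head attention over five biological networks. It reports an AUC of 0.9108 and an AUPR of 0.8102.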

## Contribution

The novelty lies in combining a graph transformer with multi-head attention to fuse miRNA features from five biological networks: sequence similarity, functional similarity, and miRNA–mRNA, miRNA–drug, and miRNA–disease associations.

## Key findings

- GTMALoc achieves an AUC of 0.9108 and an AUPR of 0.8102 on open-access datasets, outperforming existing methods.
- The model integrates sequence similarity, functional similarity, and association networks (miRNA–mRNA, miRNA–drug, miRNA–disease) to improve prediction accuracy.
- The authors report strong generalization and stability.
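
As a reading aid for the two reported metrics, here is a minimal sketch of how AUC and AUPR can be computed for a single binary label. It is not the authors' evaluation code: the function names `roc_auc` and `average_precision` and the toy labels and scores are assumptions made for illustration, and AUPR is approximated by average precision. How the paper aggregates scores across localization classes is not stated in this summary.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Fraction of positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimated as average precision over the ranked predictions."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Hypothetical toy data: 1 = miRNA present in the compartment, 0 = absent.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
print(f"AUC={auc:.4f} AUPR={aupr:.4f}")
```

AUPR is usually the stricter of the two when positives are rare, which may be why the reported AUPR sits below the reported AUC.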

## Abstract

MicroRNAs (miRNAs) play a crucial role in regulating gene expression, and their subcellular localization is essential for understanding their biological functions. However, accurately predicting miRNA subcellular localization remains a challenging task due to their short sequences, complex structures, and diverse functions. To improve prediction accuracy, this study proposes a novel model based on a graph transformer and a multi-head attention mechanism. The model integrates multi-source features, which include the miRNA sequence similarity network, miRNA functional similarity network, miRNA–mRNA association network, miRNA–drug association network, and miRNA–disease association network. Specifically, we first apply the node2vec algorithm to extract features from these biological networks. Then, we use a graph transformer to capture relationships between nodes within the networks, enabling a better understanding of miRNA functions across different biological contexts. Next, a multi-head attention mechanism is implemented to combine miRNA features from multiple networks, allowing the model to capture deeper feature relationships and enhance prediction performance. Performance evaluation shows that the proposed method significantly improves on current approaches on open-access datasets, achieving an AUC (area under the receiver operating characteristic curve) of 0.9108 and an AUPR (area under the precision-recall curve) of 0.8102. It not only improves prediction accuracy but also exhibits strong generalization and stability.
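
The fusion step described in the abstract can be illustrated with a small sketch of multi-head scaled dot-product attention over one miRNA's per-network embeddings. This is an assumption-laden illustration, not the GTMALoc implementation: the helper names, the embedding size, the number of heads, the seeded random matrices standing in for learned projection weights, and the final mean pooling are all hypothetical. The usual output projection is omitted for brevity.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned projection matrix (assumption: random weights)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_fusion(views, num_heads, rng):
    """Fuse per-network embeddings (n_views x dim) of one miRNA into one vector."""
    dim = len(views[0])
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by the number of heads")
    head_dim = dim // num_heads
    head_outputs = []
    for _ in range(num_heads):
        wq, wk, wv = (random_matrix(dim, head_dim, rng) for _ in range(3))
        q, k, v = matmul(views, wq), matmul(views, wk), matmul(views, wv)
        k_t = [list(col) for col in zip(*k)]
        # Each network view attends to every other view of the same miRNA.
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
    # Concatenate the heads for each view, then mean-pool across views.
    attended = [sum((head[i] for head in head_outputs), []) for i in range(len(views))]
    return [sum(col) / len(views) for col in zip(*attended)]


# Hypothetical input: five network-specific embeddings of size 8 for one miRNA.
rng = random.Random(0)
views = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]
fused = multi_head_fusion(views, num_heads=2, rng=random.Random(1))
print(len(fused))
```

In a trained model the projection matrices would be learned jointly with the classifier, and the fused vector would feed the localization prediction layer.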

## Full-text entities

- **Genes:** GIP (gastric inhibitory polypeptide) [NCBI Gene 2695], MIR122 (microRNA 122) [NCBI Gene 406906] {aka MIR122A, MIRN122, MIRN122A, hsa-mir-122, miRNA122, miRNA122A}, MIR21 (microRNA 21) [NCBI Gene 406991] {aka MIRN21, hsa-mir-21, miR-21, miRNA21}
- **Diseases:** cardiovascular diseases (MESH:D002318), inflammatory (MESH:D007249), cancer (MESH:D009369), neurodegenerative disorders (MESH:D019636)
- **Chemicals:** cisplatin (MESH:D002945), lipid (MESH:D008055), gefitinib (MESH:D000077156), cholesterol (MESH:D002784)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12222170/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12222170/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12222170/full.md

---
Source: https://tomesphere.com/paper/PMC12222170