# Self-supervised learning on graphs predicts non-coding RNA and disease associations

**Authors:** Qingwen Wu, Sujuan Tang

PMC · DOI: 10.1038/s41598-026-36030-2 · Scientific Reports · 2026-01-14

## TL;DR

This paper introduces SSLGRDA, a self-supervised learning method that improves the prediction of non-coding RNA-disease associations using graph-based models.

## Contribution

SSLGRDA combines graph self-supervised learning with machine learning classifiers to predict ncRNA-disease associations robustly and to generalize across datasets.

## Key findings

- SSLGRDA outperforms several state-of-the-art methods in predicting ncRNA-disease associations.
- The model demonstrates strong generalization across nine ncRNA-disease datasets.
- Case studies confirm SSLGRDA's ability to discover potential ncRNA-disease links.

## Abstract

Non-coding RNAs (ncRNAs) play crucial roles in regulating the initiation and progression of various cancers. Accurate identification of disease-related ncRNAs would provide a unique opportunity to design better therapeutic interventions. Graph convolutional network-based methods have been proposed to identify potential ncRNA-disease associations (RDAs). However, some methods use only the graph structure and ignore the similarity information of nodes, while others integrate multi-source relation data, which introduces noise and leads to poor generalization. Learning robust node embeddings with graph convolutional networks to build RDA prediction frameworks that generalize well remains a key challenge. We propose a new RDA prediction scheme, SSLGRDA, composed of graph self-supervised learning and machine learning. Since SSLGRDA works on both heterogeneous and homogeneous graphs, we constructed an ncRNA-disease heterogeneous graph based on known RDAs, as well as a heterogeneous graph based on known RDAs, ncRNA similarities, and disease similarities. For the latter, we ignored the node and edge types and treated it as a homogeneous graph. Using multiple contrastive or generative strategies, we applied graph self-supervised learning to extract robust ncRNA and disease embeddings that enhance the prediction ability and generalization of the model. Finally, we used machine learning methods to predict latent RDA probabilities. To evaluate model performance, we ran SSLGRDA on 9 ncRNA-disease datasets. Comprehensive experimental results show that SSLGRDA not only generalizes well but also outperforms several state-of-the-art methods. Case studies on three ncRNA-disease datasets further demonstrate the ability of SSLGRDA to discover potential ncRNA-disease associations.
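
The two graph inputs described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `bipartite_adjacency` and `homogeneous_adjacency`, the toy similarity values, and the similarity cutoff `threshold` are all assumptions made for illustration, and the abstract does not say how similarity scores are turned into edges. The first function builds the ncRNA-disease graph from known RDAs only; the second adds similarity edges and places every node in a single index space, which corresponds to ignoring node and edge types.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def bipartite_adjacency(n_rna: int, n_dis: int,
                        known_pairs: Sequence[Tuple[int, int]]) -> Matrix:
    """ncRNA x disease matrix with 1.0 for every known association."""
    assoc = [[0.0] * n_dis for _ in range(n_rna)]
    for rna, dis in known_pairs:
        assoc[rna][dis] = 1.0
    return assoc


def homogeneous_adjacency(assoc: Matrix, rna_sim: Matrix, dis_sim: Matrix,
                          threshold: float = 0.5) -> Matrix:
    """Single adjacency matrix over all ncRNA and disease nodes.

    Nodes 0..n_rna-1 are ncRNAs; nodes n_rna.. are diseases.
    Assumption: a similarity edge is kept when the score reaches
    `threshold`; this cutoff rule is hypothetical, not from the paper.
    """
    n_rna, n_dis = len(assoc), len(assoc[0])
    n = n_rna + n_dis
    adj = [[0.0] * n for _ in range(n)]
    # ncRNA-ncRNA similarity edges (no self-loops)
    for i in range(n_rna):
        for j in range(n_rna):
            if i != j and rna_sim[i][j] >= threshold:
                adj[i][j] = 1.0
    # disease-disease similarity edges (no self-loops)
    for i in range(n_dis):
        for j in range(n_dis):
            if i != j and dis_sim[i][j] >= threshold:
                adj[n_rna + i][n_rna + j] = 1.0
    # known associations, stored symmetrically
    for i in range(n_rna):
        for j in range(n_dis):
            if assoc[i][j] > 0:
                adj[i][n_rna + j] = 1.0
                adj[n_rna + j][i] = 1.0
    return adj


# Toy data (made up): 3 ncRNAs, 2 diseases, 3 known associations.
known = [(0, 0), (1, 1), (2, 1)]
rna_similarity = [[1.0, 0.8, 0.1],
                  [0.8, 1.0, 0.3],
                  [0.1, 0.3, 1.0]]
disease_similarity = [[1.0, 0.2],
                      [0.2, 1.0]]

assoc = bipartite_adjacency(3, 2, known)
adj = homogeneous_adjacency(assoc, rna_similarity, disease_similarity)
```

In this toy case `adj` is a symmetric 5 x 5 matrix in which ncRNA 0 is linked to ncRNA 1 through similarity and to disease node 3 through a known association. A graph encoder trained with a self-supervised objective would take such a matrix as input, and its node embeddings would then be passed to a downstream classifier.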

The online version contains supplementary material available at 10.1038/s41598-026-36030-2.

## Full-text entities

- **Genes:** NcRNA [NCBI Gene 54719], IL19 (interleukin 19) [NCBI Gene 29949] {aka IL-10C, MDA1, NG.1, ZMDA1}, CDAN1 (codanin 1) [NCBI Gene 146059] {aka CDA1, CDAI, CDAN1A, DLT, PRO1295}, AICDA (activation induced cytidine deaminase) [NCBI Gene 57379] {aka AID, ARP2, CDA2, HEL-S-284, HIGM2}, CDA3 [NCBI Gene 981]
- **Diseases:** RD (MESH:D000077733), CDA (MESH:D004194), contrastive loss (MESH:D005119), Colon Cancer (MESH:D015179), Breast Cancer (MESH:D001943), cancers (MESH:D009369)
- **Chemicals:** MDA (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12881540/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12881540/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC12881540/full.md

---
Source: https://tomesphere.com/paper/PMC12881540