# Denoising self-supervised learning for disease-gene association prediction

**Authors:** Yan Zhang, Ju Xiang, Jianming Li

PMC · DOI: 10.1186/s12859-025-06281-3 · BMC Bioinformatics · 2025-10-23

## TL;DR

This paper introduces DGSL, a new method that improves disease-gene association predictions by capturing latent interactions and reducing noise in self-supervised learning.

## Contribution

DGSL introduces a denoising self-supervised learning approach that captures latent disease and gene interactions while reducing noise in embeddings.

## Key findings

- DGSL outperforms existing methods on benchmark datasets for disease-gene association prediction.
- The use of bipartite graphs and adaptive semantic alignment enhances the modeling of latent interactions.
- Cross-view denoising improves the accuracy of disease and gene embeddings.

## Abstract

Understanding the interplay between diseases and genes is crucial for gaining deeper insights into disease mechanisms and optimizing therapeutic strategies. In recent years, various computational methods have been developed to uncover potential disease-gene associations. However, existing computational approaches for disease-gene association prediction still face two major limitations. First, most current studies focus on constructing complex heterogeneous graphs using multi-dimensional biological entity relationships, while overlooking critical latent interaction patterns, namely disease neighbor interactions and gene neighbor interactions, which are more valuable for association prediction. Second, in self-supervised learning (SSL), the presence of noise in auxiliary tasks commonly affects the accurate modeling of diseases and genes. In this study, we propose a novel denoising method for disease-gene association prediction, termed DGSL. To address the first issue, we utilize bipartite graphs corresponding to diseases and genes to derive disease-disease and gene-gene similarities, and further construct disease and gene interaction graphs to capture the latent interaction patterns. To tackle the second challenge, we implement cross-view denoising through adaptive semantic alignment in the embedding space, while preserving useful neighbor interactions. Extensive experiments on benchmark datasets demonstrate the effectiveness of our method.

The online version contains supplementary material available at 10.1186/s12859-025-06281-3.
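
The first step described in the abstract, deriving disease-disease and gene-gene similarities from a disease-gene bipartite graph and turning them into interaction graphs, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which similarity measure or neighbor-selection rule DGSL uses, so cosine similarity over binary association profiles and a top-`k` neighbor cutoff are assumptions made here for illustration. The function names (`cosine`, `interaction_graph`) and the toy identifiers (`d1`, `g1`, ...) are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two binary association profiles stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def interaction_graph(profiles: dict, k: int = 2) -> dict:
    """Link each node to its top-k most similar nodes (similarity > 0).

    Assumption: top-k cosine neighbors stand in for whatever rule the
    paper actually uses to build its interaction graphs.
    """
    graph = {}
    for u, pu in profiles.items():
        scored = [(cosine(pu, pv), v) for v, pv in profiles.items() if v != u]
        scored = [(s, v) for s, v in scored if s > 0]
        # Highest similarity first; ties broken by node name for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        graph[u] = [(v, round(s, 3)) for s, v in scored[:k]]
    return graph


# Toy bipartite edges (hypothetical IDs, not data from the paper).
associations = [("d1", "g1"), ("d1", "g2"), ("d2", "g2"), ("d2", "g3"), ("d3", "g4")]

disease_profiles = defaultdict(set)  # disease -> set of associated genes
gene_profiles = defaultdict(set)     # gene -> set of associated diseases
for d, g in associations:
    disease_profiles[d].add(g)
    gene_profiles[g].add(d)

disease_graph = interaction_graph(disease_profiles)
gene_graph = interaction_graph(gene_profiles)

print(disease_graph)
print(gene_graph)
```

In this toy example `d1` and `d2` become neighbors because they share `g2`, while `d3` stays isolated since it shares no gene with any other disease. A real pipeline would feed such neighbor graphs into a graph encoder; that part, and the cross-view denoising step, are not sketched here because the abstract gives too little detail to do so faithfully.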

## Full-text entities

- **Genes:** MAPT (microtubule associated protein tau) [NCBI Gene 4137] {aka DDPAC, FTD1, FTDP-17, MAPTL, MSTD, MTBT1}, GDF5 (growth differentiation factor 5) [NCBI Gene 8200] {aka BDA1C, BMP-14, BMP14, CDMP1, DUPANS, LAP-4}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}, PIK3CB (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta) [NCBI Gene 5291] {aka P110BETA, PI3K, PI3KBETA, PIK3C1}, LMNA (lamin A/C) [NCBI Gene 4000] {aka CDCD1, CDDC, CMD1A, CMT2B1, EMD2, FPL}, SPTA1 (spectrin alpha, erythrocytic 1) [NCBI Gene 6708] {aka EL2, HPP, HS3, SPH3, SPTA}, FGFR2 (fibroblast growth factor receptor 2) [NCBI Gene 2263] {aka BBDS, BEK, BFR-1, CD332, CEK3, CFD1}
- **Diseases:** neurofibrillary tangle (MESH:D055956), Familial cancer of breast (MESH:D001943), Alzheimer (MESH:D000544), Prostate cancer (MESH:D011471), Alzheimer disease 2 (MESH:C536595), Diseases (MESH:D004194), neuronal degeneration (MESH:D009410), cancer (MESH:D009369), Lewy body dementia (MESH:D020961), Alzheimer disease 4 (MESH:C536596)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12551132/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12551132/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12551132/full.md

---
Source: https://tomesphere.com/paper/PMC12551132