Robustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphs

Othmane Kabal; Mounira Harzallah; Fabrice Guillet; Hideaki Takeda; Ryutaro Ichise

arXiv:2605.05463·cs.LG·May 8, 2026

TL;DR

This paper evaluates the robustness of Graph Self-Supervised Learning methods on noisy, text-derived biomedical graphs, proposing a framework that improves performance and offers practical guidance for real-world applications.

Contribution

The paper introduces NATD-GSSL (Noise-Aware Text-Driven Graph GSSL), a unified framework for GSSL on noisy, text-derived biomedical graphs, and provides the first systematic robustness analysis in this setting.

Findings

01 Relation reconstruction is highly sensitive to noise but benefits from schemas.

02 Feature reconstruction remains robust and comparable to clean graphs.

03 Bidirectional GNNs outperform unidirectional ones on noisy graphs.
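
To make the unidirectional versus bidirectional distinction in finding 03 concrete, here is a minimal sketch of one mean-aggregation step on a toy directed graph. It is an illustration only and not the paper's architecture: the node names, feature vectors, and the helper functions `neighbors` and `aggregate` are all hypothetical, and the sketch shows what the two settings compute, not why one performs better.

```python
# Toy directed graph as (head, tail) edges. All names and values are
# hypothetical and chosen only for illustration.
edges = [("aspirin", "pain"), ("aspirin", "fever"), ("ibuprofen", "pain")]
features = {
    "aspirin": [1.0, 0.0],
    "ibuprofen": [0.8, 0.2],
    "pain": [0.0, 1.0],
    "fever": [0.1, 0.9],
}


def neighbors(node, edges, bidirectional):
    """Return the nodes whose messages reach `node` in one step."""
    # Unidirectional: messages flow from head to tail only.
    incoming = [h for h, t in edges if t == node]
    if not bidirectional:
        return incoming
    # Bidirectional: messages also flow from tail back to head.
    outgoing = [t for h, t in edges if h == node]
    return incoming + outgoing


def aggregate(node, edges, features, bidirectional):
    """Mean of the node's own features and its neighbors' features."""
    msgs = [features[node]]
    msgs += [features[n] for n in neighbors(node, edges, bidirectional)]
    dim = len(features[node])
    return [sum(m[i] for m in msgs) / len(msgs) for i in range(dim)]


# "aspirin" only appears as a head, so it has no incoming edges.
uni = aggregate("aspirin", edges, features, bidirectional=False)
bi = aggregate("aspirin", edges, features, bidirectional=True)
print("unidirectional:", uni)
print("bidirectional: ", bi)
```

In this toy case the unidirectional update leaves "aspirin" with only its own features, because it never appears as an edge tail, while the bidirectional update also mixes in the features of "pain" and "fever".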

Abstract

Graph Self-Supervised Learning (GSSL) offers a powerful paradigm for learning graph representations without labeled data. However, existing work assumes clean, manually curated graphs. Recent advances in NLP enable the large-scale automatic extraction of knowledge graphs from text, opening new opportunities for GSSL while introducing substantial real-world noise. This type of noise remains largely unexplored, as prior robustness studies typically rely on synthetic perturbations. To address this gap, we present the first comprehensive evaluation of GSSL methods on text-driven graphs for unsupervised term typing. We introduce Noise-Aware Text-Driven Graph GSSL (NATD-GSSL), a unified framework that combines automatic graph construction, graph refinement, and GSSL. Our evaluation follows a dual-graph protocol that contrasts a noisy graph derived from MedMentions with a clean Unified Medical…

Code & Models

Repositories

OthmaneKabal/MC2GAE (GitHub)
