Supervised Graph Contrastive Learning for Gene Regulatory Networks

Sho Oshima; Yuji Okamoto; Taisei Tosaki; Ryosuke Kojima

arXiv:2505.17786·cs.LG·February 20, 2026

Open Access 3 Reviews

TL;DR

This paper introduces SupGCL, a supervised graph contrastive learning method that leverages real biological perturbations from gene knockdown experiments to improve gene regulatory network representations, outperforming existing methods.

Contribution

SupGCL is a novel GCL approach that incorporates biological perturbations as supervision, bridging artificial augmentations and real experimental data for enhanced network analysis.

Findings

01 SupGCL yields clearer disease-subtype structures in embeddings.

02 It improves clustering and downstream task performance.

03 It outperforms baseline methods on 13 gene and patient prediction tasks.

Abstract

Graph Contrastive Learning (GCL) is a powerful self-supervised learning framework that performs data augmentation through graph perturbations, with growing applications in the analysis of biological networks such as Gene Regulatory Networks (GRNs). The artificial perturbations commonly used in GCL, such as node dropping, induce structural changes that can diverge from biological reality. This concern has contributed to a broader trend in graph representation learning toward augmentation-free methods, which view such structural changes as problematic and to be avoided. However, this trend overlooks the fundamental insight that structural changes from biologically meaningful perturbations are not a problem to be avoided but a rich source of information, and it thereby ignores the valuable opportunity to leverage data from real biological experiments. Motivated by this insight, we…
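To make the conventional setup described in the abstract concrete, the following is a minimal sketch of node-level graph contrastive learning with artificial node dropping and a temperature-scaled InfoNCE loss. It does not reproduce SupGCL or its loss, which this page does not spell out. The toy undirected graph, the one-layer encoder, and the names `drop_nodes`, `encode`, and `info_nce` are all assumptions made for illustration.

```python
import numpy as np

def drop_nodes(adj, drop_prob, rng):
    """Artificial perturbation: remove all edges of randomly chosen nodes."""
    keep = rng.random(adj.shape[0]) >= drop_prob
    return adj * np.outer(keep, keep)

def encode(adj, feats, weight):
    """Hypothetical one-layer encoder: mean over neighbors (plus self-loop), then project."""
    a = adj + np.eye(adj.shape[0])
    a = a / a.sum(axis=1, keepdims=True)
    return np.tanh(a @ feats @ weight)

def info_nce(z1, z2, temperature):
    """Node-level InfoNCE: node i in view 1 is the positive for node i in view 2."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = (z1 @ z2.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy data (assumed): 6 nodes, 4 input features, 3 embedding dimensions.
rng = np.random.default_rng(0)
n, d, h = 6, 4, 3
adj = (rng.random((n, n)) < 0.4).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 0.0)
feats = rng.normal(size=(n, d))
weight = rng.normal(size=(d, h))

# Two independently perturbed views of the same graph.
view1 = drop_nodes(adj, 0.2, rng)
view2 = drop_nodes(adj, 0.2, rng)
loss = info_nce(encode(view1, feats, weight), encode(view2, feats, weight), temperature=0.5)
print(loss)
```

In this sketch the two views come from random node dropping, which is the kind of artificial perturbation the abstract says can diverge from biological reality. The abstract's proposal is to draw on perturbations from real biological experiments instead.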

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1) Clear motivation for using teacher GRNs derived from real biological experimental data. The loss formulation is sound, and the paper provides good intuition for it. The formulation is novel and extends GCL to settings where such supervision data are available.
2) The paper is easy to follow and detailed.
3) Empirically shows that SupGCL surpasses GRACE (standard node-level GCL) at low temperature values and approaches its performance at high temperature values. This experiment / ablation partially s

Weaknesses

1) While the main motivation is valid, the loss formulation and use of supervision go against the goal of self-supervised pre-training that standard GCL follows. The proposed formulation assumes the availability of real-world biological data (e.g., gene knockdown data). The authors don't discuss the ease or cost of obtaining them. The main benefit of standard GCL, large-scale self-supervision, is lost with SupGCL.
2) Experiments suggest the improvement SupGCL obtains is very marginal acr

Reviewer 02 · Rating 6 · Confidence 4

Strengths

Novel Supervision Source: The use of gene knockdown data as a real-world supervisory signal for contrastive learning is both innovative and biologically meaningful.
Theoretical Generalization: The framework extends GCL into a probabilistic supervised model, showing that prior unsupervised GCL methods are special cases. Theoretical proofs and ablation studies reinforce this claim.
Experimental Rigor: The study includes comprehensive experiments across multiple tasks, cancer types, and baselines

Weaknesses

Limited Generalizability: As noted by the authors, SupGCL trained on one cancer type does not transfer effectively to others, limiting its broader biomedical applicability.
Dependence on External Data: The framework’s reliance on knockdown data from LINCS constrains its use to settings where such experimental data exist, reducing scalability for rare or novel conditions.
Modest Gains in Some Tasks: Although performance improvements are consistent, the magnitude of gains is modest in certain no

Reviewer 03 · Rating 4 · Confidence 2

Strengths

1. The authors' integration of gene regulatory networks with drug response prediction represents a promising interdisciplinary research direction.
2. The paper covers both graph-level and node-level tasks, demonstrating extensive experimentation.
3. The idea of using real gene knockdown data as supervision signals is forward-looking and innovative.

Weaknesses

1. The paper claims to have "theoretically proven that existing GCL methods are special cases of the proposed SupGCL." This statement is logically untenable. SupGCL is a highly domain-specific method whose core relies on GRNs as edge features and biological perturbation data as supervision signals. In contrast, general graph contrastive learning methods (e.g., GraphCL, GRACE) are designed to be universal and do not depend on such specific prior knowledge or external experimental data.
2. The pa

Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks

Methods: Contrastive Learning