KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
Jiayu Qin, Zhengquan Luo, Guy Tadmor, Changyou Chen, David Zeevi, Zhiqiang Xu

TL;DR
This paper introduces KGOT, a framework combining knowledge graph aggregation and optimal transport pseudo-labeling to improve molecule-protein interaction prediction by leveraging heterogeneous biological data and addressing data scarcity.
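As a rough illustration of what "knowledge graph aggregation" can mean here, the sketch below merges typed edge lists (molecule-protein, protein-gene, gene-pathway) into one heterogeneous graph. It then runs a single round of mean neighbor aggregation. The entity IDs, edge types, `build_graph`, `mean_aggregate`, and the untyped mean aggregator are all hypothetical choices made for illustration. This page does not specify KGOT's graph schema or encoder.

```python
from collections import defaultdict

# Hypothetical typed edge lists: (source type, target type) -> list of (source id, target id).
EDGES = {
    ("molecule", "protein"): [("m1", "p1"), ("m1", "p2")],
    ("protein", "gene"): [("p1", "g1"), ("p2", "g2")],
    ("gene", "pathway"): [("g2", "w1")],
}

def build_graph(edges):
    """Merge typed edge lists into one undirected adjacency map keyed by (type, id)."""
    adj = defaultdict(set)
    for (src_type, dst_type), pairs in edges.items():
        for src, dst in pairs:
            u, v = (src_type, src), (dst_type, dst)
            adj[u].add(v)
            adj[v].add(u)
    return adj

def mean_aggregate(adj, feats):
    """One message-passing round: each node averages its own vector with its neighbors' vectors."""
    out = {}
    for node, nbrs in adj.items():
        group = [feats[node]] + [feats[n] for n in sorted(nbrs)]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

adj = build_graph(EDGES)
# Toy 1-d features: only the molecule starts with a nonzero value.
feats = {node: [0.0] for node in adj}
feats[("molecule", "m1")] = [3.0]
h = mean_aggregate(adj, feats)
print(h[("protein", "p1")])  # [1.0]: mean of p1 (0), g1 (0), m1 (3)
```

The pathway node w1 connects to m1 only through g2 and p2, so its information reaches the molecule after three such rounds. Stacking rounds is how gene and pathway context can inform molecule and protein representations. A practical heterogeneous-graph encoder would typically use relation-specific weights rather than this untyped mean.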
Contribution
The novel integration of diverse biological datasets with optimal transport-based pseudo-labeling significantly enhances molecule-protein interaction (MPI) prediction accuracy and zero-shot generalization.
Findings
Substantial improvements over state-of-the-art methods in prediction accuracy.
Enhanced zero-shot prediction capabilities for unseen interactions.
Effective utilization of heterogeneous biological data sources.
Abstract
Predicting molecule-protein interactions (MPIs) is a fundamental task in computational biology, with crucial applications in drug discovery and molecular function annotation. However, existing MPI models face two major challenges. First, the scarcity of labeled molecule-protein pairs significantly limits model performance, as available datasets capture only a small fraction of biologically relevant interactions. Second, most methods rely solely on molecular and protein features, ignoring broader biological context such as genes, metabolic pathways, and functional annotations that could provide essential complementary information. To address these limitations, our framework first aggregates diverse biological datasets, including molecular, protein, gene, and pathway-level interactions, and then develops an optimal transport-based approach to generate high-quality pseudo-labels for unlabeled…
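This page does not give KGOT's exact optimal transport formulation. One standard way to turn OT into pseudo-labels is entropic OT solved with Sinkhorn iterations, shown here as an assumption rather than as the authors' method. Every unlabeled pair carries equal mass, and each class must receive mass matching an assumed class prior. Each pair then takes the class to which the transport plan sends most of its mass. The function name `sinkhorn_pseudo_labels`, the two-class setup, the 25% interaction prior, and `eps` are all hypothetical.

```python
import math

def sinkhorn_pseudo_labels(scores, class_prior, eps=0.1, n_iters=200):
    """Pseudo-label unlabeled pairs with entropic OT (Sinkhorn iterations).

    scores[i][j]: model affinity of unlabeled pair i for class j (higher = more likely).
    class_prior[j]: assumed fraction of pairs that should receive class j (sums to 1).
    Returns (labels, plan), where plan is the n x k transport plan.
    """
    n, k = len(scores), len(class_prior)
    r = [1.0 / n] * n        # row marginal: every pair carries equal mass
    c = list(class_prior)    # column marginal: mass each class must receive
    # Gibbs kernel exp(score / eps); subtracting the row max only rescales each row,
    # which the row scaling u absorbs, so the final plan is unchanged.
    kernel = [[math.exp((s - max(row)) / eps) for s in row] for row in scores]
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Alternate scalings so that row sums match r and column sums match c.
        u = [r[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [c[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Each pair takes the class that receives most of its transported mass.
    labels = [max(range(k), key=lambda j: plan[i][j]) for i in range(n)]
    return labels, plan

# Columns: [no interaction, interaction]; plain argmax would call all four pairs interactions.
scores = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.45, 0.55]]
labels, plan = sinkhorn_pseudo_labels(scores, class_prior=[0.75, 0.25])
print(labels)  # [1, 0, 0, 0]
```

The class-marginal constraint distinguishes this from simple thresholding. Only the highest-scoring pair keeps the positive label, because the plan may assign just 25% of the mass to "interaction". The same constraint makes the pseudo-labels only as reliable as the assumed prior and the underlying scores.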
Peer Reviews
Decision: ICLR 2026 Poster
The paper proposes a unified framework that aims to integrate biological entities such as pathways and genes.
(1) The proposed framework mainly combines existing methods without introducing any new modeling components or theoretical insights. (2) The performance is not compared with highly relevant works such as KG-MTL and BioKDN.
S1: This paper leverages large-scale multimodal knowledge graphs and proposes an optimal transport-based pseudo-labeling strategy for MPI prediction. S2: In the experiments, the proposed KGOT outperforms existing MPI prediction methods in terms of AUROC, early recognition metrics, and generalization to unseen interactions.
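For readers less familiar with the metrics named in S2, here is a minimal sketch of AUROC and of one common early-recognition metric, the enrichment factor (EF). The review does not say which early-recognition metrics KGOT reports; BEDROC is another frequent choice. EF, the function names, and the 20% cutoff in the usage lines are therefore illustrative assumptions.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties count 1/2).

    O(P * N) pairwise comparison; fine for a sketch, slow for large screens.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

def enrichment_factor(labels, scores, fraction=0.01):
    """EF@fraction: hit rate in the top-scored fraction divided by the overall hit rate.

    Assumes at least one positive label.
    """
    n = len(labels)
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / n)

labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
print(auroc(labels, scores))                    # 0.9375
print(enrichment_factor(labels, scores, 0.2))   # 2.5
```

AUROC rewards correct ranking across the whole list. EF looks only at the top of the ranking, which matters when just the highest-scored candidates will be tested experimentally.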
W1: A primary weakness of KGOT is its reliance on the critical yet potentially unstable step of pseudo-label generation via optimal transport. The quality of the entire approach hinges on the assumption that the optimal transport mechanism can accurately infer the underlying distribution of known interactions to assign reliable labels to unknown molecule-protein pairs. W2: Aggregating molecular, protein, gene, and pathway-level information requires sophisticated fusion techniques to handle the
1. The optimal transport pseudo-labelling strategy seems a good contribution to the overall field and opens the door to creating more sophisticated augmented knowledge graphs. 2. The results are reported with some measure of performance dispersion; it is unclear what this measure is or how it was derived, but it allows some assessment of the statistical significance of the results. 3. The ablation study is comprehensive and convincingly demonstrates that the different components of the method imp…
1. It is unclear what the error represents in Tables 1 and 2.
Taxonomy
Topics: Bioinformatics and Genomic Networks · Computational Drug Discovery Methods · Machine Learning in Bioinformatics
