Zero-Shot Learning with Subsequence Reordering Pretraining for Compound-Protein Interaction

Hongzhi Zhang; Zhonglie Liu; Kun Meng; Jiameng Chen; Jia Wu; Bo Du; Di Lin; Yan Che; Wenbin Hu

arXiv:2507.20925 · cs.LG · July 29, 2025

TL;DR

This paper introduces a subsequence reordering pretraining method for protein representations. The method improves zero-shot compound-protein interaction (CPI) prediction, especially in data-scarce scenarios relevant to drug discovery.

Contribution

The paper proposes a pretraining approach that explicitly models dependencies between protein subsequences and uses length-variable augmentation to improve zero-shot CPI prediction performance.
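
To make the idea of a reordering objective concrete, here is a minimal sketch of how one pretraining example might be built. It is an illustration under stated assumptions, not the authors' implementation: the function names `make_reordering_example` and `restore_order`, the chunk-length bounds, and the reading of "length-variable" as randomly sized chunks are all hypothetical. The sketch cuts a protein sequence into chunks, shuffles them, and keeps the permutation a model could be trained to recover.

```python
import random


def make_reordering_example(sequence, min_len=5, max_len=15, seed=None):
    """Split a sequence into random-length chunks and shuffle them.

    Hypothetical helper: chunk lengths are drawn uniformly from
    [min_len, max_len]; the final chunk may be shorter than min_len.
    Returns (shuffled_chunks, order), where order[k] is the original
    index of the chunk now sitting at shuffled position k.
    """
    rng = random.Random(seed)
    chunks = []
    start = 0
    while start < len(sequence):
        length = rng.randint(min_len, max_len)
        chunks.append(sequence[start:start + length])
        start += length
    order = list(range(len(chunks)))
    rng.shuffle(order)
    shuffled = [chunks[i] for i in order]
    return shuffled, order


def restore_order(shuffled, order):
    """Invert the shuffle: put every chunk back at its original index."""
    restored = [None] * len(shuffled)
    for position, original_index in enumerate(order):
        restored[original_index] = shuffled[position]
    return "".join(restored)


if __name__ == "__main__":
    # Arbitrary amino-acid string used only for demonstration.
    protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
    shuffled, order = make_reordering_example(protein, seed=0)
    print(shuffled)
    print(order)
    print(restore_order(shuffled, order) == protein)
```

In this sketch, `order` plays the role of the supervision signal: an encoder that sees only the shuffled chunks would have to reason about how subsequences relate to one another in order to predict it. Calling the function with different seeds produces different segmentations of the same protein, which is one plausible way to vary chunk lengths across training examples.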

Findings

01 Improves baseline CPI prediction accuracy in zero-shot scenarios.

02 Outperforms existing pretraining models in data-limited conditions.

03 Demonstrates effectiveness on real-world drug development tasks.

Abstract

Given the vastness of chemical space and the ongoing emergence of previously uncharacterized proteins, zero-shot compound-protein interaction (CPI) prediction better reflects the practical challenges and requirements of real-world drug development. Although existing methods perform adequately during certain CPI tasks, they still face the following challenges: (1) Representation learning from local or complete protein sequences often overlooks the complex interdependencies between subsequences, which are essential for predicting spatial structures and binding properties. (2) Dependence on large-scale or scarce multimodal protein datasets demands significant training data and computational resources, limiting scalability and efficiency. To address these challenges, we propose a novel approach that pretrains protein representations for CPI prediction tasks using subsequence reordering,…
