Zero-Shot Learning with Subsequence Reordering Pretraining for Compound-Protein Interaction
Hongzhi Zhang, Zhonglie Liu, Kun Meng, Jiameng Chen, Jia Wu, Bo Du, Di Lin, Yan Che, Wenbin Hu

TL;DR
This paper introduces a subsequence reordering pretraining method for protein representations that improves zero-shot compound-protein interaction (CPI) prediction, particularly when training data is scarce.
Contribution
It proposes a new pretraining approach that explicitly models subsequence dependencies and uses length-variable augmentation to boost zero-shot CPI prediction performance.
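As a rough illustration of what a subsequence reordering pretext task could look like, the sketch below cuts a protein sequence into variable-length chunks, shuffles them, and keeps the permutation as the target a model would learn to recover. This is not the authors' implementation: the function names, the chunk-length range, and the use of a plain permutation as the label are all assumptions made for the example.

```python
import random

def make_reordering_example(sequence, min_len=3, max_len=8, seed=None):
    """Build one (shuffled chunks, permutation) pretext example.

    Hypothetical helper: chunk lengths are drawn uniformly from
    [min_len, max_len], an assumed stand-in for length-variable augmentation.
    """
    rng = random.Random(seed)
    chunks = []
    i = 0
    while i < len(sequence):
        step = rng.randint(min_len, max_len)
        chunks.append(sequence[i:i + step])  # the last chunk may be shorter
        i += step
    order = list(range(len(chunks)))
    rng.shuffle(order)
    shuffled = [chunks[j] for j in order]
    # order[k] is the original position of the chunk now at position k
    return shuffled, order

def restore(shuffled, order):
    """Invert the shuffle given the permutation target."""
    original = [None] * len(shuffled)
    for k, j in enumerate(order):
        original[j] = shuffled[k]
    return "".join(original)
```

In a pretraining setup along these lines, a sequence encoder would see the shuffled chunks and be trained to predict `order`; `restore` only checks that the label carries enough information to rebuild the input.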
Findings
Improves baseline CPI prediction accuracy in zero-shot scenarios.
Outperforms existing pretraining models in data-limited conditions.
Demonstrates effectiveness on real-world drug development tasks.
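To make the zero-shot setting concrete, here is a minimal sketch of a data split in which every test pair involves a compound and a protein that never appear in training. It assumes that this strict reading of "zero-shot" applies; the summary does not spell out the paper's exact protocol, and the function name, the `(compound, protein, label)` tuple layout, and the `test_fraction` parameter are hypothetical.

```python
import random

def zero_shot_split(pairs, test_fraction=0.2, seed=0):
    """Split (compound, protein, label) triples so test entities are unseen.

    Assumed protocol: hold out a fraction of compounds and of proteins;
    test pairs use only held-out entities, training pairs use none of them,
    and mixed pairs are discarded.
    """
    rng = random.Random(seed)
    compounds = sorted({c for c, _, _ in pairs})
    proteins = sorted({p for _, p, _ in pairs})
    n_c = max(1, int(len(compounds) * test_fraction))
    n_p = max(1, int(len(proteins) * test_fraction))
    test_c = set(rng.sample(compounds, n_c))
    test_p = set(rng.sample(proteins, n_p))

    train, test = [], []
    for c, p, y in pairs:
        if c in test_c and p in test_p:
            test.append((c, p, y))
        elif c not in test_c and p not in test_p:
            train.append((c, p, y))
        # pairs with exactly one held-out entity are dropped
    return train, test
```

Discarding the mixed pairs costs data but keeps the evaluation free of leakage from either side of the interaction.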
Abstract
Given the vastness of chemical space and the ongoing emergence of previously uncharacterized proteins, zero-shot compound-protein interaction (CPI) prediction better reflects the practical challenges and requirements of real-world drug development. Although existing methods perform adequately on certain CPI tasks, they still face the following challenges: (1) Representation learning from local or complete protein sequences often overlooks the complex interdependencies between subsequences, which are essential for predicting spatial structures and binding properties. (2) Dependence on large-scale or scarce multimodal protein datasets demands significant training data and computational resources, limiting scalability and efficiency. To address these challenges, we propose a novel approach that pretrains protein representations for CPI prediction tasks using subsequence reordering,…
