S$^2$Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening
Bowei He, Bowen Gao, Yankai Chen, Yanyan Lan, Chen Ma, Philip S. Yu, Ya-Qin Zhang, Wei-Ying Ma

TL;DR
S$^2$Drug is a novel two-stage contrastive learning framework that integrates protein sequences and 3D structures to improve virtual screening and binding site prediction in drug discovery.
Contribution
It introduces a sequence-structure fusion approach with a pretraining and fine-tuning strategy, addressing noise and redundancy in protein-ligand datasets.
Findings
Enhanced virtual screening accuracy across multiple benchmarks.
Improved binding site prediction performance.
Effective integration of sequence and structure information.
Abstract
Virtual screening (VS) is an essential task in drug discovery, focusing on the identification of small-molecule ligands that bind to specific protein pockets. Existing deep learning methods, from early regression models to recent contrastive learning approaches, primarily rely on structural data while overlooking protein sequences, which are more accessible and can enhance generalizability. However, directly integrating protein sequences poses challenges due to the redundancy and noise in large-scale protein-ligand datasets. To address these limitations, we propose \textbf{SDrug}, a two-stage framework that explicitly incorporates protein \textbf{S}equence information and 3D \textbf{S}tructure context in protein-ligand contrastive representation learning. In the first stage, we perform protein sequence pretraining on ChemBL using an ESM2-based backbone, combined with a tailored data…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsComputational Drug Discovery Methods · Protein Structure and Dynamics · Machine Learning in Bioinformatics
