S$^2$Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening

Bowei He; Bowen Gao; Yankai Chen; Yanyan Lan; Chen Ma; Philip S. Yu; Ya-Qin Zhang; Wei-Ying Ma

arXiv:2511.07006 · cs.LG · November 11, 2025

TL;DR

S$^2$Drug is a novel two-stage contrastive learning framework that integrates protein sequences and 3D structures to improve virtual screening and binding site prediction in drug discovery.

Contribution

It introduces a sequence-structure fusion approach with a pretraining and fine-tuning strategy, addressing noise and redundancy in protein-ligand datasets.

Findings

01. Enhanced virtual screening accuracy across multiple benchmarks.

02. Improved binding site prediction performance.

03. Effective integration of sequence and structure information.

Abstract

Virtual screening (VS) is an essential task in drug discovery, focusing on the identification of small-molecule ligands that bind to specific protein pockets. Existing deep learning methods, from early regression models to recent contrastive learning approaches, rely primarily on structural data while overlooking protein sequences, which are more accessible and can enhance generalizability. However, directly integrating protein sequences poses challenges due to the redundancy and noise in large-scale protein-ligand datasets. To address these limitations, we propose S$^2$Drug, a two-stage framework that explicitly incorporates protein Sequence information and 3D Structure context in protein-ligand contrastive representation learning. In the first stage, we perform protein sequence pretraining on ChEMBL using an ESM2-based backbone, combined with a tailored data…
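To make the idea of protein-ligand contrastive representation learning concrete, here is a minimal sketch of a symmetric InfoNCE objective over a batch of matched pocket and ligand embeddings, followed by similarity-based ligand ranking. The abstract as shown here does not state the exact loss, encoders, or hyperparameters, so everything in this block is an assumption for illustration: the function names `symmetric_info_nce` and `rank_ligands`, the temperature value, and the use of random vectors in place of real encoder outputs are all hypothetical and not taken from the paper.

```python
import numpy as np


def _log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(pocket_emb, ligand_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of matched (pocket, ligand) pairs.

    Row i of pocket_emb is assumed to bind row i of ligand_emb; every other
    row in the batch is treated as a negative. The temperature is an
    illustrative default, not a value from the paper.
    """
    # L2-normalise so the dot product is a cosine similarity.
    p = pocket_emb / np.linalg.norm(pocket_emb, axis=1, keepdims=True)
    l = ligand_emb / np.linalg.norm(ligand_emb, axis=1, keepdims=True)
    logits = (p @ l.T) / temperature  # shape: (batch, batch)
    idx = np.arange(logits.shape[0])
    # Pocket-to-ligand direction: softmax over each row.
    loss_p2l = -_log_softmax(logits, axis=1)[idx, idx].mean()
    # Ligand-to-pocket direction: softmax over each column.
    loss_l2p = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_p2l + loss_l2p)


def rank_ligands(pocket_vec, ligand_emb):
    """Return ligand indices sorted by descending cosine similarity to one pocket."""
    p = pocket_vec / np.linalg.norm(pocket_vec)
    l = ligand_emb / np.linalg.norm(ligand_emb, axis=1, keepdims=True)
    return np.argsort(-(l @ p))


# Toy stand-ins for encoder outputs (hypothetical data, not real embeddings).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = symmetric_info_nce(emb, emb)         # each pocket matches its ligand
loss_shuffled = symmetric_info_nce(emb, emb[::-1])  # pairs deliberately mismatched
top_hit = rank_ligands(emb[3], emb)[0]
```

In a screening setting, a trained pocket encoder would embed the target once and `rank_ligands` would be applied to the precomputed embeddings of a compound library. The aligned batch yields a lower loss than the mismatched one, which is the behaviour a contrastive objective of this kind rewards.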

Taxonomy

Topics: Computational Drug Discovery Methods · Protein Structure and Dynamics · Machine Learning in Bioinformatics