DecoyDB: A Dataset for Graph Contrastive Learning in Protein-Ligand Binding Affinity Prediction
Yupu Zhang, Zelin Xu, Tingsong Xiao, Gustavo Seabra, Yanjun Li, Chenglong Li, Zhe Jiang

TL;DR
DecoyDB is a large, structure-aware dataset designed for graph contrastive learning in protein-ligand binding affinity prediction, enabling improved model pre-training and fine-tuning for drug discovery tasks.
Contribution
The paper introduces DecoyDB, a novel large-scale dataset with diverse decoy structures for self-supervised graph contrastive learning in protein-ligand affinity prediction.
Findings
Models pre-trained on DecoyDB outperform baseline models in prediction accuracy.
DecoyDB enhances label efficiency in affinity prediction.
Models generalize better across different datasets.
Abstract
Predicting the binding affinity of protein-ligand complexes plays a vital role in drug discovery. Unfortunately, progress has been hindered by the lack of large-scale, high-quality binding affinity labels. The widely used PDBbind dataset has fewer than 20K labeled complexes. Self-supervised learning, especially graph contrastive learning (GCL), provides a unique opportunity to break this barrier by pre-training graph neural network models on vast numbers of unlabeled complexes and then fine-tuning them on far fewer labeled complexes. However, the problem poses unique challenges, including the lack of a comprehensive unlabeled dataset with well-defined positive/negative complex pairs and the need to design GCL algorithms that incorporate the unique characteristics of such data. To fill the gap, we propose DecoyDB, a large-scale, structure-aware dataset specifically designed for…
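The abstract does not spell out DecoyDB's contrastive objective. As a generic illustration of how a GCL loss uses positive/negative pairs, the sketch below implements the standard InfoNCE loss for one anchor. The embedding vectors are hypothetical stand-ins for GNN encoder outputs of complexes, and which structures count as positives or negatives is an assumption for illustration only, not the paper's definition.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Generic InfoNCE loss for a single anchor (not DecoyDB-specific).

    Low when the anchor is more similar to its positive than to any negative.
    """
    # Index 0 holds the positive logit; the rest are negative logits.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log softmax probability assigned to the positive.
    return log_denom - logits[0]


# Hypothetical 2-D embeddings standing in for encoded complexes.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_good = info_nce(anchor, positive, negatives)
# Swapping a negative in as the "positive" should raise the loss.
loss_bad = info_nce(anchor, [0.0, 1.0], [positive, [-1.0, 0.0]])
```

Pre-training would minimize this loss over many anchors so that the encoder learns structure-aware representations before fine-tuning on the smaller labeled set. The temperature value is an arbitrary choice here.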
Taxonomy
Topics: Computational Drug Discovery Methods · Advanced Graph Neural Networks · Protein Structure and Dynamics
