Towards Precision Protein-Ligand Affinity Prediction Benchmark: A Complete and Modification-Aware DAVIS Dataset
Ming-Hsiu Wu, Ziqian Xie, Shuiwang Ji, Degui Zhi

TL;DR
This paper introduces a comprehensive, modification-aware DAVIS dataset for protein-ligand affinity prediction, enabling more realistic benchmarking and improving model robustness to biological modifications in drug discovery.
Contribution
The authors curated a complete, modification-aware DAVIS dataset and proposed new benchmark settings to evaluate model robustness to protein modifications.
Findings
Docking-based models generalize better in zero-shot settings.
Docking-free models overfit to wild-type proteins but improve with fine-tuning.
The dataset enables evaluation of models under biologically realistic conditions.
Abstract
Advancements in AI for science unlock capabilities for critical drug discovery tasks such as protein-ligand binding affinity prediction. However, current models overfit to existing oversimplified datasets that do not represent naturally occurring and biologically relevant proteins with modifications. In this work, we curate a complete and modification-aware version of the widely used DAVIS dataset by incorporating 4,032 kinase-ligand pairs involving substitutions, insertions, deletions, and phosphorylation events. This enriched dataset enables benchmarking of predictive models under biologically realistic conditions. Based on this new dataset, we propose three benchmark settings (Augmented Dataset Prediction, Wild-Type to Modification Generalization, and Few-Shot Modification Generalization) designed to assess model robustness in the presence of protein modifications. Through extensive…
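To make the three benchmark settings concrete, the sketch below shows one way such splits could be built. It is an illustration inferred from the setting names only, not the authors' protocol: the `Pair` record, the function names, the `test_frac` and `k` parameters, and the toy entries (placeholder kinase and ligand names with dummy affinity values) are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One kinase-ligand measurement (hypothetical schema, not the dataset's)."""
    protein: str
    ligand: str
    modified: bool   # True if the kinase carries a substitution, indel, or phosphorylation
    affinity: float  # placeholder value, not real data


def augmented_split(pairs, test_frac=0.2, seed=0):
    """Assumed 'Augmented Dataset Prediction': random split over wild-type and modified pairs."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def wild_type_to_modification_split(pairs):
    """Assumed 'Wild-Type to Modification Generalization': train on wild-type, test on modified."""
    train = [p for p in pairs if not p.modified]
    test = [p for p in pairs if p.modified]
    return train, test


def few_shot_split(pairs, k=2, seed=0):
    """Assumed 'Few-Shot Modification Generalization': wild-type plus k modified pairs for training."""
    wild, mod = wild_type_to_modification_split(pairs)
    random.Random(seed).shuffle(mod)
    return wild + mod[:k], mod[k:]


# Toy data: 6 wild-type and 4 modified pairs with dummy affinities.
toy = [Pair(f"KINASE_{i}", "LIGAND_X", False, 5.0) for i in range(6)]
toy += [Pair(f"KINASE_{i}_mod", "LIGAND_X", True, 5.0) for i in range(4)]

if __name__ == "__main__":
    for name, (train, test) in {
        "augmented": augmented_split(toy),
        "wt_to_mod": wild_type_to_modification_split(toy),
        "few_shot": few_shot_split(toy, k=2),
    }.items():
        print(name, len(train), len(test))
```

In this sketch the zero-shot setting never exposes a modified kinase during training, while the few-shot setting moves only `k` modified pairs into the training set and evaluates on the rest.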
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Protein Structure and Dynamics
