DOCKSTRING: easy molecular docking yields better benchmarks for ligand design
Miguel García-Ortegón, Gregor N. C. Simm, Austin J. Tripp, José Miguel Hernández-Lobato, Andreas Bender, Sergio Bacallado

TL;DR
DOCKSTRING provides an accessible, comprehensive toolkit and dataset for molecular docking, enabling better benchmarking and evaluation in ligand design for drug discovery.
Contribution
It introduces an open-source Python package, a large docking dataset, and benchmark tasks, making molecular docking more accessible and improving evaluation standards in drug discovery.
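As a rough illustration of what "straightforward computation of docking scores" can look like, here is a minimal sketch. It assumes the package is installed as `dockstring` and exposes `load_target` and a `dock` method returning a score plus auxiliary data, as its public README describes; none of this is confirmed by the summary above. The helper `rank_by_docking_score` and its example scores are hypothetical, made up only to show the convention that a lower (more negative) score means stronger predicted binding.

```python
from typing import Dict, List


def dock_smiles(target_name: str, smiles: str) -> float:
    """Dock one molecule against one target and return its score.

    Assumption: the `dockstring` package is installed and provides
    `load_target(name)` and `target.dock(smiles) -> (score, aux)`.
    The import is kept local so the rest of this sketch runs without it.
    """
    from dockstring import load_target

    target = load_target(target_name)
    score, _aux = target.dock(smiles)
    return score


def rank_by_docking_score(scores: Dict[str, float]) -> List[str]:
    """Order molecule names from best to worst predicted binder.

    Hypothetical helper: lower (more negative) scores rank first.
    """
    return sorted(scores, key=lambda name: scores[name])


if __name__ == "__main__":
    # Illustrative values only, not real docking results.
    example_scores = {"mol_a": -7.2, "mol_b": -9.1, "mol_c": -5.0}
    print(rank_by_docking_score(example_scores))
```

Calling `dock_smiles("DRD2", some_smiles)` would need the package and its docking backend available locally; the ranking helper works on any dictionary of precomputed scores.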
Findings
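Docking scores are more meaningful evaluation objectives than simple physicochemical properties such as solubility or druglikeness, because they depend on the candidate's interaction with the target.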
The dataset includes docking scores and poses for more than 260,000 molecules across 58 targets.
Docking-based benchmarks yield more realistic assessments for ligand design.
Abstract
The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate's interaction with the target. By contrast, molecular docking is a widely successful method in drug discovery to estimate binding affinities. However, docking simulations require a significant amount of domain knowledge to set up correctly, which hampers adoption. To this end, we present DOCKSTRING, a bundle for meaningful and robust comparison of ML models consisting of three components: (1) an open-source Python package for straightforward computation of docking scores; (2) an extensive dataset of docking…
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics
