An efficient computational chemistry approach to generating negative data for drug discovery pipeline validation
Stefan M. Ivanov

TL;DR
This paper introduces a new method to generate negative data for validating drug discovery pipelines without needing extra experiments.
Contribution
The approach generates negative data for pipeline validation by randomizing ligands across published experimental structures and generating structural isomers of known binders.
Findings
Randomized ligands and isomers of known binders can supply vast amounts of high-quality negative data for validation.
Such data closely match positive examples in molecular properties, making them ideal for testing pipeline performance.
Using this method can help distinguish effective drug discovery tools from ineffective ones.
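The ligand randomization idea can be illustrated with a minimal sketch. It is not the author's implementation: the function name `randomized_pairs`, the `(receptor, ligand)` tuple representation, and the identifiers in the example are hypothetical, and the sketch simply assumes that a ligand re-paired with a receptor it was not observed with is a presumed non-binder.

```python
import random


def randomized_pairs(complexes, n_per_receptor=1, seed=0):
    """Re-pair receptors with ligands taken from other structures.

    complexes: list of (receptor_id, ligand_id) tuples for known binders.
    Returns presumed-negative (receptor_id, ligand_id) pairs; no returned
    pair appears in the input list of known complexes.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    known = set(complexes)
    receptors = sorted({rec for rec, _ in complexes})
    ligands = sorted({lig for _, lig in complexes})

    negatives = []
    for rec in receptors:
        # Only ligands never observed with this receptor are candidates.
        candidates = [lig for lig in ligands if (rec, lig) not in known]
        rng.shuffle(candidates)
        negatives.extend((rec, lig) for lig in candidates[:n_per_receptor])
    return negatives


# Hypothetical identifiers, for illustration only.
known_complexes = [("recA", "lig1"), ("recB", "lig2"), ("recC", "lig3")]
presumed_negatives = randomized_pairs(known_complexes, n_per_receptor=2)
```

Because the shuffled ligands come from real complexes, the presumed negatives share the property distribution of the positives by construction, which is the matching the findings above refer to.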
Abstract
Modern virtual high-throughput screening (VHTS) pipelines can be suboptimally validated, with no rigorous studies conclusively demonstrating that every one of their steps reliably adds increasing enrichment atop the baseline random hit rate. Moreover, the few benchmarking studies that are available focus primarily on the docking aspect of the pipelines, which is usually only at or near the beginning, and even there, authors tend to use flawed data sets that artificially inflate performance metrics. Herein, we present an alternative approach to pipeline validation and data set generation that requires no additional experimental work or expenditure, yet offers vast amounts of negative data that can be used in VHTS pipeline validation. By randomizing ligands across published experimental structures and generating structural isomers of known binders, practically unlimited amounts…
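To make "enrichment atop the baseline random hit rate" concrete, the sketch below computes the conventional enrichment factor: the hit rate in the top-ranked fraction divided by the hit rate of the whole set. The function name `enrichment_factor` and the example scores are hypothetical, and the sketch assumes higher scores mean better predicted binding (negate the scores first if lower is better, as with many docking scores). Whether the paper uses exactly this metric is not stated in the abstract.

```python
def enrichment_factor(scores, labels, top_fraction=0.1):
    """Hit rate in the top-ranked fraction relative to the overall hit rate.

    scores: predicted scores, higher = better (an assumption of this sketch).
    labels: 1 for a known binder, 0 for a negative example.
    A value of 1.0 means no improvement over random selection.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one positive example is required")

    # Number of compounds selected from the top of the ranking.
    n_top = max(1, int(round(n_total * top_fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    return (hits_in_top / n_top) / (n_actives / n_total)


# Hypothetical example: two binders among ten compounds, both ranked first.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
example_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(example_scores, example_labels, top_fraction=0.2)
```

Applying such a metric after each pipeline step, with property-matched negatives in place of easily separable decoys, is one way to check whether a step adds enrichment or merely appears to.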
Taxonomy
Topics: Computational Drug Discovery Methods · Cell Image Analysis Techniques · Chemistry and Chemical Engineering
