Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions
Vladimir Golkov, Alexander Becker, Daniel T. Plop, Daniel Čuturilo, Neda Davoudi, Jeffrey Mendenhall, Rocco Moretti, Jens Meiler, Daniel Cremers

TL;DR
This paper advocates for optimizing ROC-based cost functions in deep learning models for virtual screening, addressing challenges like class imbalance and decision thresholds, and introduces new training schemes that improve performance on drug discovery datasets.
Contribution
It introduces novel ROC-based training schemes and a logAUC cost function, enhancing deep learning model robustness and effectiveness in virtual screening tasks.
Findings
ROC optimization improves model robustness to class imbalance
New training schemes outperform standard methods on PubChem datasets
LogAUC cost function enhances early enrichment at high decision thresholds
Abstract
Computer-aided drug discovery is an essential component of modern drug development. Therein, deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features. Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets. In this work we argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance, its ability to compromise over different decision thresholds, certain freedom to influence the relative weights in this compromise, fidelity to typical benchmarking measures, and equivalence to positive/unlabeled learning. We also propose new training schemes (coherent mini-batch arrangement, and usage of…
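The abstract's claim that ROC-based objectives are robust to class imbalance can be illustrated with a small sketch. The code below is not the paper's training scheme or its logAUC cost function. It assumes a generic pairwise sigmoid surrogate for the area under the ROC curve, and the names exact_auc, soft_auc_loss, and temperature are hypothetical, chosen only for this illustration.

```python
import math

def exact_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def soft_auc_loss(pos_scores, neg_scores, temperature=1.0):
    """Smooth surrogate (an assumption, not the paper's cost function).

    Replaces the step function in the pairwise AUC with a sigmoid,
    so the loss is differentiable in the scores.
    """
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 / (1.0 + math.exp(-(p - n) / temperature))
    return 1.0 - total / (len(pos_scores) * len(neg_scores))

# Toy scores: 2 actives, 4 inactives.
pos = [2.0, 0.5]
neg = [1.0, 0.0, -1.0, -2.0]

auc = exact_auc(pos, neg)
# Replicating the inactives 50 times changes the class ratio
# but not the pairwise ranking statistics.
auc_imbalanced = exact_auc(pos, neg * 50)

loss = soft_auc_loss(pos, neg)
# With a small temperature the surrogate approaches 1 - AUC.
sharp_loss = soft_auc_loss(pos, neg, temperature=0.01)
```

Because the AUC is an average over active/inactive pairs, multiplying the number of inactives leaves it unchanged, which is one way to read the robustness argument. The double loop is quadratic in the batch size; a practical implementation would vectorize it in a deep learning framework.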
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning and Data Classification
