Accelerating high-throughput virtual screening through molecular pool-based active learning

David E. Graff; Eugene I. Shakhnovich; Connor W. Coley

arXiv:2012.07127·q-bio.QM·May 11, 2021

TL;DR

This paper demonstrates that Bayesian optimization with surrogate models can substantially reduce the computational resources needed for large-scale virtual screening in drug discovery, recovering most of the top-scoring candidates after evaluating only a small fraction of the library.

Contribution

The paper evaluates several surrogate model architectures, acquisition functions, and acquisition batch sizes, and shows that model-guided screening can sharply reduce the number of docking evaluations required for large virtual libraries.

Findings

01 87.9% of top ligands found after testing only 2.4% of the library

02 Significant reduction in computational costs achieved

03 Model-guided search accelerates virtual screening campaigns
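To put the first finding in perspective, the short calculation below compares it with a random-selection baseline. It is a rough sketch, not the paper's own analysis: the function name is hypothetical, and the definition of enrichment used here (fraction of top ligands recovered divided by the fraction expected from uniform random sampling of the same budget) is an assumption made for illustration.

```python
def enrichment_over_random(fraction_top_found: float, fraction_tested: float) -> float:
    """Ratio of top ligands recovered to the fraction expected from random sampling.

    Assumption: a uniformly random subset covering `fraction_tested` of the
    library is expected to contain the same fraction of the top ligands.
    """
    if not 0.0 < fraction_tested <= 1.0:
        raise ValueError("fraction_tested must be in (0, 1]")
    return fraction_top_found / fraction_tested


# Figures taken from finding 01 above.
fraction_top_found = 0.879  # 87.9% of top ligands recovered
fraction_tested = 0.024     # after evaluating 2.4% of the library

enrichment = enrichment_over_random(fraction_top_found, fraction_tested)
evaluations_avoided = 1.0 - fraction_tested

print(f"Enrichment over random selection: {enrichment:.1f}x")
print(f"Share of library never evaluated: {evaluations_avoided:.1%}")
```

Under that assumption, the reported result corresponds to finding top ligands several tens of times faster than random selection would, while leaving the large majority of the library unevaluated.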

Abstract

Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of $10^{8}$ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we assess various surrogate model architectures, acquisition functions, and acquisition batch sizes as applied to several protein-ligand docking datasets and observe significant reductions in computational costs, even when using a greedy…
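The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation or the molpal API: the surrogate is a closed-form ridge regression swapped in for simplicity, the "library" is a set of random feature vectors, the "docking score" is a synthetic linear function with noise (lower is better), and all function names and parameters are hypothetical. Acquisition here is purely greedy: each round evaluates the unevaluated molecules with the best predicted scores.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with a bias column (stand-in surrogate)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict_ridge(w, X):
    """Predict scores for every row of X using fitted ridge weights."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def greedy_pool_search(X, objective, init_size, batch_size, n_iters, seed=0):
    """Pool-based search: random initial batch, then greedy model-guided batches.

    `objective(i)` plays the role of the expensive evaluation (e.g. docking)
    of pool member i. Returns a dict mapping evaluated index -> score.
    """
    rng = np.random.default_rng(seed)
    scores = {}
    for i in rng.choice(X.shape[0], size=init_size, replace=False):
        scores[int(i)] = objective(int(i))

    for _ in range(n_iters):
        idx = np.fromiter(scores.keys(), dtype=int)
        y = np.array([scores[int(i)] for i in idx])
        w = fit_ridge(X[idx], y)          # retrain surrogate on all evaluated members
        pred = predict_ridge(w, X)        # predict over the whole pool
        pred[idx] = np.inf                # never re-select evaluated members
        for i in np.argsort(pred)[:batch_size]:  # greedy: lowest predicted score
            scores[int(i)] = objective(int(i))
    return scores


# Synthetic stand-in for a library and its docking scores (assumption).
rng = np.random.default_rng(42)
n_molecules, n_features, top_k = 2000, 8, 50
X = rng.normal(size=(n_molecules, n_features))
true_scores = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_molecules)
top_set = set(np.argsort(true_scores)[:top_k].tolist())

scores = greedy_pool_search(X, lambda i: float(true_scores[i]),
                            init_size=50, batch_size=50, n_iters=4)

fraction_evaluated = len(scores) / n_molecules
recall = len(top_set & set(scores)) / top_k
print(f"Evaluated {fraction_evaluated:.1%} of the pool, "
      f"recovered {recall:.1%} of the top {top_k}")
```

The synthetic objective is nearly linear, so the ridge surrogate fits it far more easily than a real docking landscape would be fit; the numbers this toy prints say nothing about the performance reported in the paper. The sketch is only meant to show the control flow: evaluate a subset, train a surrogate, score the remaining pool, and spend the next batch of evaluations on the most promising members.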

Code & Models

Repositories

coleygroup/molpal (PyTorch, official)