Accelerating Black-Box Molecular Property Optimization by Adaptively Learning Sparse Subspaces
Farshud Sorourifar, Thomas Banker, Joel A. Paulson

TL;DR
This paper introduces a Bayesian optimization (BO) approach for molecular property optimization (MPO) that represents molecules with numerical descriptors and uses a Gaussian process surrogate to adaptively learn the sparse subspace of descriptors relevant to the target property, improving sample efficiency over existing MPO methods.
Contribution
The work proposes combining numerical molecular descriptors with a Gaussian process surrogate whose prior favors sparsity, so that the model identifies the small subspace of descriptors relevant to the property, overcoming limitations of previous encoding-based methods.
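To make "sparse subspace" concrete, here is a minimal sketch under stated assumptions. It uses an automatic relevance determination (ARD) squared-exponential kernel over a descriptor vector, where each descriptor d has an inverse squared lengthscale rho[d], together with a hierarchical half-Cauchy shrinkage prior of the kind used by the sparse axis-aligned subspace (SAAS) prior. Whether the paper uses exactly this kernel, this prior, or the global scale alpha = 0.1 is an assumption here, and the function names are hypothetical. A descriptor with rho[d] = 0 drops out of the kernel entirely, so the "relevant subspace" is the set of descriptors whose rho[d] is not shrunk toward zero.

```python
import math


def ard_rbf(x, y, rho, outputscale=1.0):
    """ARD squared-exponential kernel; rho[d] is the inverse squared lengthscale of descriptor d."""
    sq = sum(r * (a - b) ** 2 for r, a, b in zip(rho, x, y))
    return outputscale * math.exp(-0.5 * sq)


def half_cauchy_logpdf(x, scale):
    """Log density of a half-Cauchy(scale) distribution on x >= 0."""
    if x < 0:
        return -math.inf
    return math.log(2.0 / (math.pi * scale * (1.0 + (x / scale) ** 2)))


def saas_log_prior(rho, tau, alpha=0.1):
    """Hierarchical sparsity prior (assumed form): tau ~ HalfCauchy(alpha), rho[d] ~ HalfCauchy(tau).

    A small global scale tau pulls most rho[d] toward zero, i.e. most descriptors are
    treated as irrelevant unless the data provide evidence otherwise.
    """
    return half_cauchy_logpdf(tau, alpha) + sum(half_cauchy_logpdf(r, tau) for r in rho)


# Descriptors 1 and 2 differ a lot, but with rho = [2, 0, 0] only descriptor 0 matters.
x, y = [0.2, 5.0, -3.0], [0.2, 0.0, 0.0]
print(ard_rbf(x, y, [2.0, 0.0, 0.0]))  # 1.0
# With a small global scale, a sparse relevance vector is far more probable a priori.
print(saas_log_prior([1.0, 0.0, 0.0, 0.0], tau=0.1) > saas_log_prior([1.0, 1.0, 1.0, 1.0], tau=0.1))  # True
```

In a full BO loop, rho and tau would be inferred from the queried data (for example by sampling or maximizing the posterior); this sketch only shows how the kernel and prior encode sparsity.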
Findings
Outperforms existing MPO methods on benchmarks and real-world problems.
Finds near-optimal molecules within 100 queries from over 100,000 candidates.
Demonstrates rapid identification of relevant subspaces for property modeling.
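The findings above describe BO over a finite candidate pool under a small query budget. A minimal sketch of such a loop follows. It assumes a fixed ARD squared-exponential kernel (the paper learns the relevant subspace, which this sketch does not do), a zero-mean GP on centered outputs, and expected improvement (EI) as the acquisition function; the authors' acquisition may differ, and all function names are hypothetical. Each iteration conditions the GP on the molecules queried so far, scores every unqueried candidate by EI, and queries the maximizer.

```python
import math
import random


def rbf(x, y, rho):
    """ARD squared-exponential kernel (unit output scale); rho[d] = inverse squared lengthscale."""
    return math.exp(-0.5 * sum(r * (a - b) ** 2 for r, a, b in zip(rho, x, y)))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def expected_improvement(mu, sigma, best):
    """EI for maximization under a Gaussian predictive N(mu, sigma^2)."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bo_over_pool(pool, objective, rho, n_init=5, budget=20, noise=1e-4, seed=0):
    """Query at most `budget` points of a finite candidate pool, choosing each next one by EI."""
    rng = random.Random(seed)
    queried = rng.sample(range(len(pool)), n_init)
    ys = [objective(pool[i]) for i in queried]
    while len(queried) < min(budget, len(pool)):
        X = [pool[i] for i in queried]
        y_mean = sum(ys) / len(ys)
        K = [[rbf(a, b, rho) + (noise if i == j else 0.0) for j, b in enumerate(X)]
             for i, a in enumerate(X)]
        L = cholesky(K)
        w = forward_solve(L, [y - y_mean for y in ys])  # L^{-1} (y - mean)
        best, seen = max(ys), set(queried)
        best_i, best_ei = None, -math.inf
        for i, x in enumerate(pool):
            if i in seen:
                continue
            v = forward_solve(L, [rbf(x, xq, rho) for xq in X])  # L^{-1} k(X, x)
            mu = y_mean + sum(a * b for a, b in zip(v, w))  # posterior mean
            sigma = math.sqrt(max(1.0 - sum(a * a for a in v), 1e-12))  # posterior std
            ei = expected_improvement(mu, sigma, best)
            if ei > best_ei:
                best_i, best_ei = i, ei
        queried.append(best_i)
        ys.append(objective(pool[best_i]))
    return queried, ys


# Toy pool of 3-descriptor "molecules"; the property depends only on descriptor 0.
rng = random.Random(1)
pool = [[rng.random() for _ in range(3)] for _ in range(60)]
f = lambda x: -(x[0] - 0.3) ** 2
queried, ys = bo_over_pool(pool, f, rho=[20.0, 0.0, 0.0], budget=15)
print(max(ys), max(f(x) for x in pool))
```

Scoring the whole pool each iteration stays cheap relative to an expensive simulation or experiment even for a pool of 100,000 candidates, though a practical version would use a numerical linear-algebra library instead of these pure-Python triangular solves.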
Abstract
Molecular property optimization (MPO) problems are inherently challenging since they are formulated over discrete, unstructured spaces and the labeling process involves expensive simulations or experiments, which fundamentally limits the amount of available data. Bayesian optimization (BO) is a powerful and popular framework for efficient optimization of noisy, black-box objective functions (e.g., measured property values) and is thus a potentially attractive framework for MPO. To apply BO to MPO problems, one must select a structured molecular representation that enables construction of a probabilistic surrogate model. Many molecular representations have been developed; however, they are all high-dimensional, which introduces important challenges in the BO process, mainly because the curse of dimensionality makes it difficult to define and perform inference over a suitable class of…
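The abstract's point about the curse of dimensionality can be illustrated with a generic, well-known effect (not an experiment from the paper): for points drawn uniformly at random, pairwise distances concentrate as the dimension grows, so a kernel that weights all dimensions equally sees every pair of inputs as roughly equally dissimilar. The sketch below uses a hypothetical helper, relative_contrast, that measures (max − min) / min of the pairwise distances. Real molecular descriptors are neither uniform nor independent, so this is only a qualitative illustration.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """(max - min) / min of pairwise Euclidean distances among uniform points in [0, 1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(pts[i], pts[j])
             for i in range(n_points) for j in range(i + 1, n_points)]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks toward zero as the number of descriptors grows.
for dim in (2, 20, 200, 2000):
    print(dim, round(relative_contrast(30, dim), 3))
```

When nearest and farthest neighbors are almost the same distance apart, an isotropic surrogate has little signal for deciding which candidates resemble the good ones, which motivates restricting the model to a small relevant subspace.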
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Mass Spectrometry Techniques and Applications
Methods: Sparse Evolutionary Training · Gaussian Process
