Bayesian Optimization in Chemical Compound Sub-Spaces using Low-Dimensional Molecular Descriptors
Yun-Wen Mao, Roman V. Krems

TL;DR
This paper introduces a Bayesian optimization framework that efficiently discovers molecules with targeted properties using low-dimensional descriptors and fewer than 2,000 data points, enabling practical small-data molecular design.
Contribution
The study presents a novel, data-efficient Bayesian optimization method with a reliable inverse mapping for discrete molecular structures, improving chemical compound optimization in low-data regimes.
Findings
Achieves a 100% success rate in entropy optimization with fewer than 1,000 evaluations.
Achieves a success rate above 80% in zero-point vibrational energy (ZPVE) optimization for molecules with more than two heavy atoms.
Demonstrates effective inverse mapping from descriptor space to valid molecular structures.
Abstract
Efficient optimization of molecules with targeted properties remains a significant challenge due to the vast size and discrete nature of chemical compound space. Conventional machine-learning-based optimization approaches typically require large datasets to construct accurate surrogate models, limiting their applicability in data-scarce settings. In this study, we present a Bayesian optimization (BO) framework that identifies optimal molecular structures with high precision using fewer than 2,000 training data points within a chemical subspace containing more than 133,000 molecules. The framework employs a low-dimensional and physics-informed molecular descriptor vector that facilitates data-efficient surrogate modelling and optimization. A key innovation of the proposed framework is a reliable inverse mapping scheme that translates optimized points in the descriptor space back into…
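The general idea described in the abstract (a surrogate model over a low-dimensional descriptor space, an acquisition step, and a mapping from the proposed descriptor point back to a valid molecule) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian process with a fixed RBF kernel, an expected-improvement acquisition maximized over random proposals, and a nearest-neighbour lookup as a stand-in for the inverse mapping, whose actual form is not described in this excerpt. The names `bayes_opt`, `nearest_candidate`, `descriptors`, and `evaluate` are hypothetical.

```python
import math
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(X_train, y_train, X_query, noise=1e-4):
    """Zero-mean GP regression: posterior mean and standard deviation."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    # The prior variance is 1 for this kernel; clip to avoid tiny negatives.
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return K_s.T @ alpha, np.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement over `best` for a maximization problem."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def nearest_candidate(point, descriptors, exclude):
    """Assumed inverse mapping: closest not-yet-evaluated molecule."""
    d2 = ((descriptors - point) ** 2).sum(axis=1)
    d2[list(exclude)] = np.inf
    return int(np.argmin(d2))


def bayes_opt(descriptors, evaluate, n_init=5, n_iter=30, n_proposals=256, seed=0):
    """Maximize evaluate(i) over candidate rows of `descriptors`.

    Requires n_init + n_iter <= len(descriptors).
    """
    rng = np.random.default_rng(seed)
    lo, hi = descriptors.min(axis=0), descriptors.max(axis=0)
    evaluated = [int(i) for i in rng.choice(len(descriptors), size=n_init, replace=False)]
    values = [float(evaluate(i)) for i in evaluated]
    for _ in range(n_iter):
        y = np.array(values)
        y_scaled = (y - y.mean()) / (y.std() + 1e-12)
        # Propose continuous points inside the bounding box of the descriptors.
        proposals = rng.uniform(lo, hi, size=(n_proposals, descriptors.shape[1]))
        mean, std = gp_posterior(descriptors[evaluated], y_scaled, proposals)
        ei = expected_improvement(mean, std, y_scaled.max())
        # Map the best continuous proposal back to a real candidate molecule.
        idx = nearest_candidate(proposals[np.argmax(ei)], descriptors, evaluated)
        evaluated.append(idx)
        values.append(float(evaluate(idx)))
    best = int(np.argmax(values))
    return evaluated[best], values[best], evaluated, values
```

Here `descriptors` would be an array with one row per molecule in the chosen subspace and `evaluate(i)` would return the target property of molecule `i`. A practical version would also fit the kernel hyperparameters rather than fixing the length scale.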
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Gaussian Processes and Bayesian Inference
