Informing Acquisition Functions via Foundation Models for Molecular Discovery
Qi Chen, Fabio Ramos, Alán Aspuru-Guzik, Florian Shkurti

TL;DR
This paper introduces a likelihood-free Bayesian Optimization (BO) approach for molecular discovery that uses priors from general large language models and chemistry foundation models to inform acquisition functions directly, without an explicit surrogate model, with the aim of improving scalability, robustness, and sample efficiency.
Contribution
It proposes a novel likelihood-free BO method using foundation models to directly inform acquisition functions, incorporating tree-structured search and clustering for improved scalability.
Findings
Significantly improves scalability to large candidate sets.
Enhances robustness and sample efficiency in molecular discovery.
Outperforms traditional surrogate-based BO in experiments.
Abstract
Bayesian Optimization (BO) is a key methodology for accelerating molecular discovery by estimating the mapping from molecules to their properties while seeking the optimal candidate. Typically, BO iteratively updates a probabilistic surrogate model of this mapping and optimizes acquisition functions derived from the model to guide molecule selection. However, its performance is limited in low-data regimes with insufficient prior knowledge and vast candidate spaces. Large language models (LLMs) and chemistry foundation models offer rich priors to enhance BO, but high-dimensional features, costly in-context learning, and the computational burden of deep Bayesian surrogates hinder their full utilization. To address these challenges, we propose a likelihood-free BO method that bypasses explicit surrogate modeling and directly leverages priors from general LLMs and chemistry-specific…
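To make "bypassing the explicit surrogate" concrete, here is a minimal sketch of one common likelihood-free formulation, in which the acquisition function is a classifier trained to separate the best observations from the rest. It is an illustration under stated assumptions, not the paper's method: the feature vectors stand in for foundation-model embeddings, a plain logistic regression replaces whatever model the authors use, and the names `fit_logistic`, `acquisition`, `select_next`, and the quantile parameter `gamma` are hypothetical.

```python
import numpy as np


def _add_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_logistic(X, y, lr=0.1, steps=2000, l2=1e-3):
    """Fit L2-regularised logistic regression by plain gradient descent."""
    Xb = _add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # predicted P(label = 1)
        grad = Xb.T @ (p - y) / len(y) + l2 * w  # mean log-loss gradient + L2
        w -= lr * grad
    return w


def acquisition(w, X):
    """Classifier probability that a candidate belongs to the 'good' set."""
    return 1.0 / (1.0 + np.exp(-_add_bias(X) @ w))


def select_next(X_obs, y_obs, X_cand, gamma=0.25):
    """Pick the candidate the classifier rates most likely to beat tau.

    Assumption: observations in the top `gamma` fraction of y_obs are
    labelled 1, the rest 0. No probabilistic surrogate of y is fitted.
    """
    tau = np.quantile(y_obs, 1.0 - gamma)
    labels = (y_obs >= tau).astype(float)
    w = fit_logistic(X_obs, labels)
    scores = acquisition(w, X_cand)
    return int(np.argmax(scores)), scores


# Toy usage with random features (hypothetical stand-ins for embeddings).
rng = np.random.default_rng(0)
X_obs = rng.normal(size=(40, 4))
y_obs = X_obs[:, 0] + 0.1 * rng.normal(size=40)  # toy property to maximise
X_cand = rng.normal(size=(200, 4))
best_idx, scores = select_next(X_obs, y_obs, X_cand)
```

In a full loop, the selected candidate would be evaluated, appended to the observations, and the classifier refit. The design point is that each iteration only needs a classifier over features, which is cheaper than maintaining a deep Bayesian surrogate.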
Taxonomy
Topics: Machine Learning in Materials Science · Machine Learning and Data Classification · Computational Drug Discovery Methods
