Informing Acquisition Functions via Foundation Models for Molecular Discovery

Qi Chen; Fabio Ramos; Alán Aspuru-Guzik; Florian Shkurti

arXiv:2512.13935 · cs.LG · December 17, 2025

Open Access

TL;DR

This paper introduces a likelihood-free Bayesian Optimization approach that leverages foundation models and large language models to improve molecular discovery by enhancing scalability, robustness, and sample efficiency without explicit surrogate models.

Contribution

It proposes a novel likelihood-free BO method using foundation models to directly inform acquisition functions, incorporating tree-structured search and clustering for improved scalability.

Findings

01. Significantly improves scalability to large candidate sets.

02. Enhances robustness and sample efficiency in molecular discovery.

03. Outperforms traditional surrogate-based BO in experiments.

Abstract

Bayesian Optimization (BO) is a key methodology for accelerating molecular discovery by estimating the mapping from molecules to their properties while seeking the optimal candidate. Typically, BO iteratively updates a probabilistic surrogate model of this mapping and optimizes acquisition functions derived from the model to guide molecule selection. However, its performance is limited in low-data regimes with insufficient prior knowledge and vast candidate spaces. Large language models (LLMs) and chemistry foundation models offer rich priors to enhance BO, but high-dimensional features, costly in-context learning, and the computational burden of deep Bayesian surrogates hinder their full utilization. To address these challenges, we propose a likelihood-free BO method that bypasses explicit surrogate modeling and directly leverages priors from general LLMs and chemistry-specific…
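To make the idea of skipping the surrogate concrete, here is a minimal sketch of one common style of likelihood-free BO, in which a classifier trained to separate the best observations from the rest serves directly as the acquisition function. This is an illustration of the general technique, not the authors' method: the random 2-D `features` are placeholders for foundation-model embeddings, `toy_property` is a made-up oracle, the logistic classifier and the `gamma` threshold are assumptions, and the tree-structured search and clustering mentioned above are not modeled.

```python
import math
import random


def predict_proba(w, x):
    """Logistic probability; the last entry of w is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, steps=500, lr=0.5):
    """Plain gradient-descent logistic regression (assumed classifier)."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def likelihood_free_bo(features, oracle, n_init=5, n_rounds=15, gamma=0.3, seed=0):
    """Classifier-as-acquisition loop; returns {candidate index: measured value}."""
    rng = random.Random(seed)
    pool = list(range(len(features)))
    rng.shuffle(pool)
    observed = {i: oracle(i) for i in pool[:n_init]}
    for _ in range(n_rounds):
        idx = list(observed)
        ranked = sorted(observed.values(), reverse=True)
        k = max(1, math.ceil(gamma * len(ranked)))
        tau = ranked[k - 1]  # values at or above tau count as "good"
        labels = [1.0 if observed[i] >= tau else 0.0 for i in idx]
        w = fit_logistic([features[i] for i in idx], labels)
        candidates = [i for i in range(len(features)) if i not in observed]
        if not candidates:
            break
        # Acquisition value = predicted probability of being in the top fraction.
        nxt = max(candidates, key=lambda i: predict_proba(w, features[i]))
        observed[nxt] = oracle(nxt)
    return observed


# Toy usage: placeholder embeddings and a hypothetical property to maximize.
_rng = random.Random(0)
features = [[_rng.random(), _rng.random()] for _ in range(200)]


def toy_property(i):
    return 2.0 * features[i][0] - features[i][1]


observed = likelihood_free_bo(features, toy_property)
best = max(observed, key=observed.get)
print(best, round(observed[best], 3))
```

No posterior over property values is ever fitted here: each round only relabels the observations against the current threshold and refits the classifier, which is the sense in which the acquisition function is learned directly.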

Taxonomy

Topics: Machine Learning in Materials Science · Machine Learning and Data Classification · Computational Drug Discovery Methods