Pre-trained knowledge elevates large language models beyond traditional chemical reaction optimizers
Robert MacKnight, Jose Emilio Regio, Jeffrey G. Ethier, Luke A. Baldwin, and Gabe Gomes

TL;DR
Pre-trained large language models explore complex, high-dimensional reaction parameter spaces more effectively than traditional Bayesian optimization, with the largest gains when high-performing conditions are scarce (<5% of the space).
Contribution
This work demonstrates that pre-trained knowledge in LLMs improves reaction optimization in chemistry, matching or outperforming Bayesian optimization in complex categorical spaces on single-objective tasks.
Findings
LLMs match or outperform Bayesian optimization on the five single-objective datasets; BO stays ahead only for explicit multi-objective trade-offs.
LLMs maintain higher exploration entropy than Bayesian methods.
Pre-trained domain knowledge enables better navigation of chemical parameter space.
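To make "exploration entropy" concrete, the sketch below computes the Shannon entropy of the categorical choices made during a campaign. This is only one simple way to quantify sampling diversity and is an assumption for illustration; the paper's actual framework is not specified in this summary. The function names and the ligand labels are hypothetical. Because the measure depends only on how often each label is chosen, it needs no notion of distance between conditions.

```python
import math
from collections import Counter

def shannon_entropy(choices):
    """Shannon entropy (in bits) of the empirical distribution over choices."""
    counts = Counter(choices)
    total = len(choices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def normalized_entropy(choices, n_options):
    """Entropy divided by its maximum, log2(n_options); 0.0 if undefined."""
    if n_options < 2 or not choices:
        return 0.0
    return shannon_entropy(choices) / math.log2(n_options)

# Hypothetical campaigns: the ligand selected in each of eight experiments.
narrow = ["L1", "L1", "L1", "L1", "L1", "L1", "L2", "L1"]  # mostly one ligand
broad = ["L1", "L2", "L3", "L4", "L1", "L2", "L3", "L4"]   # evenly spread

print(normalized_entropy(narrow, n_options=4))
print(normalized_entropy(broad, n_options=4))
```

A campaign that keeps sampling the same few options scores near 0, while one that spreads its experiments evenly across the options scores near 1.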
Abstract
Modern optimization in experimental chemistry employs algorithmic search through black-box parameter spaces. Here we demonstrate that pre-trained knowledge in large language models (LLMs) fundamentally changes this paradigm. Using six fully enumerated categorical reaction datasets (768-5,684 experiments), we benchmark LLM-guided optimization (LLM-GO) against Bayesian optimization (BO) and random sampling. Frontier LLMs consistently match or exceed BO performance across five single-objective datasets, with advantages growing as parameter complexity increases and high-performing conditions become scarce (<5% of space). BO retains superiority only for explicit multi-objective trade-offs. To understand these contrasting behaviors, we introduce a topology-agnostic information theory framework quantifying sampling diversity throughout optimization campaigns. This analysis reveals that LLMs…
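Because the datasets are fully enumerated, an optimization campaign can be replayed by looking up each proposed condition's outcome instead of running an experiment. The sketch below shows such a harness with a random-sampling baseline. It is not the authors' code: the toy dataset, its yields, and every name in it are hypothetical, and an LLM-guided or Bayesian proposer would be plugged in through the same `propose` interface.

```python
import itertools
import random

def make_toy_dataset(seed=0):
    """Hypothetical fully enumerated space: every combination has a known yield."""
    rng = random.Random(seed)
    ligands = [f"L{i}" for i in range(8)]
    bases = [f"B{i}" for i in range(4)]
    solvents = [f"S{i}" for i in range(4)]
    combos = itertools.product(ligands, bases, solvents)
    return {combo: rng.uniform(0, 100) for combo in combos}

def random_sampler(candidates, history, rng):
    """Baseline proposer: pick an untested condition uniformly at random."""
    tested = {cond for cond, _ in history}
    return rng.choice([c for c in candidates if c not in tested])

def run_campaign(dataset, propose, budget, seed=0):
    """Run one campaign; return the best-so-far yield after each experiment."""
    rng = random.Random(seed)
    candidates = sorted(dataset)
    history, best_trace = [], []
    for _ in range(budget):
        cond = propose(candidates, history, rng)
        history.append((cond, dataset[cond]))  # the "experiment" is a table lookup
        best_trace.append(max(y for _, y in history))
    return best_trace

dataset = make_toy_dataset()
trace = run_campaign(dataset, random_sampler, budget=20)
print(f"best yield after 20 experiments: {trace[-1]:.1f}")
```

Comparing the best-so-far traces of different proposers under the same budget gives a like-for-like view of how quickly each one finds high-performing conditions.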
