LICO: Large Language Models for In-Context Molecular Optimization
Tung Nguyen, Aditya Grover

TL;DR
LICO leverages large language models with specialized training to perform in-context molecular optimization, enabling effective predictions on unseen properties and achieving state-of-the-art results on benchmark tasks.
Contribution
The paper introduces LICO, a novel method that adapts large language models for black-box optimization in molecular domains through in-context training and prompting.
Findings
LICO performs competitively on the PMO benchmark.
LICO achieves state-of-the-art results on PMO-1K.
The approach generalizes to unseen molecule properties.
Abstract
Optimizing black-box functions is a fundamental problem in science and engineering. To solve this problem, many approaches learn a surrogate function that estimates the underlying objective from limited historical evaluations. Large Language Models (LLMs), with their strong pattern-matching capabilities via pretraining on vast amounts of data, stand out as a potential candidate for surrogate modeling. However, directly prompting a pretrained language model to produce predictions is not feasible in many scientific domains due to the scarcity of domain-specific data in the pretraining corpora and the challenges of articulating complex problems in natural language. In this work, we introduce LICO, a general-purpose model that extends arbitrary base LLMs for black-box optimization, with a particular application to the molecular domain. To achieve this, we equip the language model with a…
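The surrogate-modeling loop described in the abstract can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not LICO's implementation. The finite candidate pool, the toy k-nearest-neighbour surrogate standing in for the LLM, and the names `knn_surrogate` and `surrogate_optimize` are all hypothetical. At each step, the surrogate is applied to the evaluation history to score the unevaluated candidates. One oracle call is then spent on the top-scoring candidate, and the loop stops when the budget is exhausted.

```python
import random
from typing import Callable, List, Tuple

History = List[Tuple[float, float]]  # (candidate, observed objective value)


def knn_surrogate(history: History, x: float, k: int = 3) -> float:
    """Toy surrogate (hypothetical): mean objective of the k evaluated points closest to x."""
    nearest = sorted(history, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def surrogate_optimize(
    oracle: Callable[[float], float],
    pool: List[float],
    budget: int,
    n_init: int = 5,
    seed: int = 0,
) -> History:
    """Greedy surrogate-guided search over a finite pool, spending at most `budget` oracle calls."""
    if n_init < 1:
        raise ValueError("need at least one initial evaluation to fit the surrogate")
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)
    # Warm start: evaluate a few random candidates (never more than the budget).
    n_start = min(n_init, budget)
    history: History = [(x, oracle(x)) for x in remaining[:n_start]]
    remaining = remaining[n_start:]
    while len(history) < budget and remaining:
        # Rank unevaluated candidates by surrogate prediction; query the best one.
        best = max(remaining, key=lambda x: knn_surrogate(history, x))
        remaining.remove(best)
        history.append((best, oracle(best)))
    return history
```

For example, `surrogate_optimize(lambda x: -(x - 7.3) ** 2, [i * 0.1 for i in range(100)], budget=20)` returns 20 distinct `(x, f(x))` pairs. The purely greedy acquisition is a simplification. Practical methods typically balance exploration and exploitation, for instance by using the surrogate's uncertainty.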
Peer Reviews
Decision: ICLR 2025 Poster
* the "domain adaptation" trick that converts general-purpose, text-based LLMs into domain-specific in-context learners using synthetic data. * detailed ablation studies. I really liked the analysis of the effect of the ratio of "intrinsic" to "synthetic" datasets. It gives an intuition for how to design synthetic datasets for optimization tasks in other domains. * the "scaling law" chart (Fig. 3) is a good indicator of the scaling abilities of the proposed approach. Unfortunately there is a
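The reviewer singles out the ratio of "intrinsic" to "synthetic" training data. One way to picture that knob is a task sampler that fixes the fraction of intrinsic tasks in each training round. This is a hypothetical sketch. The function name, the exact-count mixing (rather than a per-draw coin flip), and the task identifiers are illustrative assumptions, not the paper's training code.

```python
import random
from typing import List, Sequence, Tuple


def mix_training_tasks(
    intrinsic: Sequence[str],
    synthetic: Sequence[str],
    intrinsic_ratio: float,
    n_tasks: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return n_tasks (source, task_id) pairs with a fixed share of intrinsic tasks (hypothetical)."""
    if not 0.0 <= intrinsic_ratio <= 1.0:
        raise ValueError("intrinsic_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    n_intrinsic = round(intrinsic_ratio * n_tasks)  # exact count, not a per-draw coin flip
    tasks = [("intrinsic", rng.choice(intrinsic)) for _ in range(n_intrinsic)]
    tasks += [("synthetic", rng.choice(synthetic)) for _ in range(n_tasks - n_intrinsic)]
    rng.shuffle(tasks)  # interleave the two sources
    return tasks
```

One could sweep `intrinsic_ratio` from 0 to 1 and measure the downstream optimization scores. That would be the kind of ablation the reviewer describes.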
The main weakness is that the SOTA claim on PMO is misleading. **The results reported in this paper are not really PMO.** There are two major differences. a) PMO has 23 tasks, not 21; *jnk3* and *gsk3* are missing. b) PMO uses a 10K budget of oracle calls (as mentioned by the authors). While a) does not make the comparison to prior art unfair, b) is critical. The main advantage of the PMO paper was that the authors performed a large-scale hyperparameter search for every method they tried, and
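The reviewer's point about 1K versus 10K oracle calls can be made concrete. A score computed from a single optimization trace depends on where the budget is cut. This is a minimal sketch that assumes a trace of oracle values in call order and uses a simplified top-k mean. It is an illustration, not PMO's official scoring code, and `top_k_mean_at_budget` is a hypothetical name.

```python
import heapq
from typing import Sequence


def top_k_mean_at_budget(trace: Sequence[float], budget: int, k: int = 10) -> float:
    """Mean of the k best oracle values among the first `budget` calls (higher is better).

    Simplified, hypothetical scoring for illustration; not PMO's official metric.
    """
    if budget <= 0 or k <= 0:
        raise ValueError("budget and k must be positive")
    seen = list(trace[:budget])  # only calls made within the budget count
    if not seen:
        raise ValueError("trace is empty")
    best = heapq.nlargest(k, seen)
    return sum(best) / len(best)
```

On a trace that keeps improving, `top_k_mean_at_budget(list(range(10000)), 1000)` gives 994.5, while the full 10K budget gives 9994.5. A method tuned for one budget is therefore not necessarily compared fairly at another.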
1. The paper studies an important task of adapting LLMs for molecular optimization tasks, which has not been studied extensively. 2. The paper presents a novel approach by integrating LLMs with specialized layers to address black-box optimization problems in the molecular domain. 3. The model achieves strong performance on the challenging PMO benchmark.
1. While this paper demonstrates the strong performance of LLMs, the analysis of their specific benefits remains limited. It would be valuable to understand which particular characteristics of LLMs contribute to the success of this molecular optimization task. For instance, how do different LLM architectures or configurations impact performance? Would domain-adaptive training on chemistry corpora further enhance results? Expanding on these points with additional explanation would strengthen the
-The overall approach shows good efficacy, as reflected in the benchmark scores. The combination of training data and modeling recipe is interesting and, as far as I know, novel for surrogate modeling (this needs cross-confirmation). -The scaling law analysis is interesting, indicating the benefit of scaling up model size for better molecule optimization outcomes. -The ablation study and result analysis are helpful: the analysis of surrogate modeling accuracy versus the GPR baseline confirms the efficacy
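The reviewers compare LICO's surrogate accuracy against a Gaussian process regression (GPR) baseline. The sketch below shows what such a baseline computes, reduced to the posterior mean. The source does not specify the baseline's configuration, so the following are all assumptions: molecules are encoded as sets of on-bit fingerprint indices, the kernel is Tanimoto similarity (a common choice for molecular fingerprints), the noise variance is fixed, and no hyperparameters are fitted. All names are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # hypothetical encoding: indices of "on" fingerprint bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior_mean(
    train_x: Sequence[Fingerprint],
    train_y: Sequence[float],
    test_x: Sequence[Fingerprint],
    noise: float = 1e-2,
) -> List[float]:
    """GP regression posterior mean with a Tanimoto kernel and fixed noise variance."""
    mean_y = sum(train_y) / len(train_y)  # constant prior mean = empirical mean
    # Kernel matrix with the noise variance already added to its diagonal.
    K = [
        [tanimoto(a, b) + (noise if i == j else 0.0) for j, b in enumerate(train_x)]
        for i, a in enumerate(train_x)
    ]
    alpha = solve(K, [y - mean_y for y in train_y])  # weights for centered targets
    return [
        mean_y + sum(tanimoto(x, xi) * ai for xi, ai in zip(train_x, alpha))
        for x in test_x
    ]
```

A molecule that shares no bits with any training point falls back to the prior mean. This is why a GP surrogate can be weak on unseen regions of chemical space. A real baseline would normally use a library GP with a Cholesky solve and fitted hyperparameters. The plain elimination here only keeps the sketch dependency-free.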
-The authors selectively report results under a different sampling budget (1K instead of the 10K used in the original PMO setting), with a stated reason. Can they also present results under other sampling budgets in the supplementary information? That would confirm the generalization of the proposed approach. -The ablations and baselines should be more comprehensive: there are several concurrent works on LLMs for molecular optimization [1,2], and the authors should also add them as baselines, if applicable, and
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods
Methods: Sparse Evolutionary Training · Balanced Selection
