LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery

Nikhil Abhyankar; Sanchit Kabra; Saaketh Desai; Chandan K. Reddy

arXiv:2510.22503·cs.LG·March 6, 2026

TL;DR

LLEMA combines large language models, evolutionary search, and chemistry-informed rules to discover materials that satisfy multiple objectives at once, with high hit rates and practical relevance.

Contribution

The paper presents LLEMA, a unified approach integrating LLMs, evolutionary strategies, and memory-based refinement for multi-objective materials discovery, outperforming existing methods.

Findings

01

LLEMA achieves higher hit rates across 14 diverse materials tasks.

02

The framework produces chemically plausible and thermodynamically stable candidates.

03

Ablation studies highlight the importance of rule-guided generation and surrogate models.

Abstract

Materials discovery requires navigating vast chemical and structural spaces while satisfying multiple, often conflicting, objectives. We present LLM-guided Evolution for MAterials discovery (LLEMA), a unified framework that couples the scientific knowledge embedded in large language models with chemistry-informed evolutionary rules and memory-based refinement. At each iteration, an LLM proposes crystallographically specified candidates under explicit property constraints; a surrogate-augmented oracle estimates physicochemical properties; and a multi-objective scorer updates success/failure memories to guide subsequent generations. Evaluated on 14 realistic tasks that span electronics, energy, coatings, optics, and aerospace, LLEMA discovers candidates that are chemically plausible, thermodynamically stable, and property-aligned, achieving higher hit rates and improved Pareto front…
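The propose, estimate, score, and update-memory loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` is a hypothetical stand-in for the LLM call, `oracle` returns made-up deterministic property values in place of a surrogate-augmented oracle, and the property names, candidate pool, and constraint ranges are invented for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Candidate:
    formula: str
    props: dict  # property estimates filled in by the oracle


@dataclass
class Memory:
    # Success/failure memories that condition later proposals.
    successes: list = field(default_factory=list)
    failures: list = field(default_factory=list)


def propose(memory, rng, n=4):
    """Hypothetical stand-in for the LLM: pick unseen formulas from a toy pool."""
    pool = ["GaN", "SiC", "ZnO", "TiO2", "AlN", "MgO"]  # assumed toy pool
    seen = {c.formula for c in memory.successes + memory.failures}
    fresh = [f for f in pool if f not in seen]
    return rng.sample(fresh, min(n, len(fresh)))


def oracle(formula):
    """Hypothetical stand-in for the oracle: fake but deterministic properties."""
    h = sum(ord(ch) for ch in formula)
    return {"band_gap": (h % 60) / 10.0, "e_hull": (h % 7) / 20.0}


def score(props, constraints):
    """Fraction of (low, high) property constraints that are satisfied."""
    ok = [lo <= props[k] <= hi for k, (lo, hi) in constraints.items()]
    return sum(ok) / len(ok)


def run(constraints, iterations=3, seed=0):
    rng = random.Random(seed)
    memory = Memory()
    for _ in range(iterations):
        for formula in propose(memory, rng):
            cand = Candidate(formula, oracle(formula))
            # A candidate counts as a hit only if every constraint holds.
            if score(cand.props, constraints) == 1.0:
                memory.successes.append(cand)
            else:
                memory.failures.append(cand)
    return memory


if __name__ == "__main__":
    constraints = {"band_gap": (1.0, 4.0), "e_hull": (0.0, 0.1)}
    mem = run(constraints)
    total = len(mem.successes) + len(mem.failures)
    print("hit rate:", len(mem.successes) / total)
```

In this sketch the memory only prevents re-proposing candidates that were already evaluated; in the framework described above, the memories are fed back to guide subsequent generations.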

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 6·Confidence 4

Strengths

1. The work is a good demonstration of employing LLMs while injecting chemistry heuristics for 'multi-objective and synthesizability-aware materials design'.
2. The diverse benchmarking tasks are well designed and align with realistic multi-constraint targets.

Weaknesses

1. No DFT validation is provided for the top novel successful candidates as a higher-fidelity stability evaluation.
2. The quality of the accepted and rejected designs in the pool depends heavily on the reliability of the surrogate models. Exploration may be limited because accumulated surrogate error can bias the system toward known, correct designs.
3. The efficiency of the proposed framework is not fully explained.

Reviewer 02·Rating 4·Confidence 3

Strengths

1. The proposed method puts the LLM inside an iterative optimization loop (LLM $\rightarrow$ screen $\rightarrow$ refine $\rightarrow$ re-prompt). That aligns with current best practices in LLM-for-science/LLM-as-optimizer work and makes the contribution intelligible from an ML perspective.
2. The framework explicitly tries to stay inside chemically plausible regions via rule-based evolution and feasibility checks, rather than treating materials generation as free-text generation. That’s an imp

Weaknesses

1. Several recent works also use pretrained LLMs to propose materials/structures and then improve them using an external scorer (e.g., [1]). The paper needs to spell out what is actually new here. Right now, the novelty can look like “a solid engineering combination” rather than a clearly new algorithmic design.
2. The paper claims to “leverage scientific knowledge embedded in LLMs,” but it does not show: (1) performance when replacing the LLM with a lighter template-based (or rule-based) gener

Reviewer 03·Rating 6·Confidence 3

Strengths

- The proposed method combines the LLM’s advantage of utilizing unstructured data sources with principled chemical knowledge guidance.
- The ablation studies are comprehensive; in particular, the analysis of memorization vs. guided exploration investigates a key question that LLM4Sci should answer.

Weaknesses

- The application scenarios are oversimplifications of materials design. For example, thermodynamic stability is the minimum requirement of chemical plausibility.
- Some common methods for constrained multi-objective optimization are not included in the benchmark tests (see Q1).
