Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials

Thorben Prein; Elton Pan; Janik Jehkul; Steffen Weinmann; Elsa A. Olivetti; Jennifer L. M. Rupp

arXiv:2506.12557·cond-mat.mtrl-sci·June 17, 2025

Open Access

TL;DR

This paper shows that off-the-shelf language models, without task-specific fine-tuning, can predict inorganic synthesis conditions and generate reaction recipes that supplement literature-mined data for synthesis planning.

Contribution

The paper introduces a hybrid approach that uses language models both to predict synthesis conditions and to generate reaction recipes, supporting inorganic synthesis planning without task-specific fine-tuning.

Findings

01

Language models achieve up to 53.8% Top-1 accuracy in precursor prediction.

02

Ensembling the language models reduces inference cost per prediction by up to 70%.

03

The approach reduces temperature-prediction error by up to 8.7%.
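
The two kinds of metric quoted in these findings, Top-k accuracy for precursor prediction and mean absolute error for temperature prediction, can be illustrated with a minimal sketch. It assumes a prediction counts as correct only when the full precursor set matches exactly; the paper's precise matching criterion is not stated on this page. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Sequence


def top_k_accuracy(
    ranked: Sequence[Sequence[FrozenSet[str]]],
    truth: Sequence[FrozenSet[str]],
    k: int,
) -> float:
    """Fraction of targets whose true precursor set is among the first k candidates.

    Assumption: exact set match between a candidate and the reference precursors.
    """
    hits = sum(1 for candidates, ref in zip(ranked, truth) if ref in candidates[:k])
    return hits / len(truth)


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference temperatures."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


# Toy, illustrative data only (not results from the paper).
ranked_candidates = [
    [frozenset({"BaCO3", "TiO2"}), frozenset({"BaO", "TiO2"})],
    [frozenset({"Li2O", "CoO"}), frozenset({"Li2CO3", "Co3O4"})],
]
reference = [
    frozenset({"BaCO3", "TiO2"}),
    frozenset({"Li2CO3", "Co3O4"}),
]

top1 = top_k_accuracy(ranked_candidates, reference, k=1)
top2 = top_k_accuracy(ranked_candidates, reference, k=2)
mae = mean_absolute_error([900.0, 1200.0], [850.0, 1300.0])  # temperatures in °C
```

In this toy case the first target is matched by its top-ranked candidate and the second only by its second-ranked candidate, so the accuracy rises when k goes from 1 to 2.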

Abstract

Inorganic synthesis planning currently relies primarily on heuristic approaches or machine-learning models trained on limited datasets, which constrains its generality. We demonstrate that language models, without task-specific fine-tuning, can recall synthesis conditions. Off-the-shelf models, such as GPT-4.1, Gemini 2.0 Flash and Llama 4 Maverick, achieve a Top-1 precursor-prediction accuracy of up to 53.8 % and a Top-5 performance of 66.1 % on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching specialized regression methods. Ensembling these language models further enhances predictive accuracy and reduces inference cost per prediction by up to 70 %. We subsequently employ language models to generate 28,548 synthetic reaction recipes, which we combine with literature-mined examples to…

Taxonomy

Topics: Machine Learning in Materials Science

Methods: Dropout · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer · GPT-4 · Sparse Evolutionary Training