Generative Chemical Language Models for Energetic Materials Discovery

Andrew Salij, R. Seaton Ullberg, Megan C. Davis, Marc J. Cawkwell, Christopher J. Snyder, Cristina Garcia Cardona, Ivana Matanovic, and Wilton J. M. Kort-Kamp

arXiv:2604.03304·physics.chem-ph·April 7, 2026

TL;DR

This paper introduces generative chemical language models that are pretrained on extensive chemical data and then fine-tuned on curated energetic materials datasets, with the aim of accelerating the design of high-performance compounds.

Contribution

It presents a transfer-learning framework applying molecular language models to energetic materials, extending beyond pharmacology and emphasizing fragment-based encodings.

Findings

01. Models can generate synthetically accessible energetic compounds.

02. Transfer learning improves model performance on specialized datasets.

03. Framework accelerates the discovery of next-generation energetic materials.

Abstract

The discovery of new energetic materials remains a pressing challenge hindered by limited availability of high-quality data. To address this, we have developed generative molecular language models that have been pretrained on extensive chemical data and then fine-tuned with curated energetic materials datasets. This transfer-learning strategy extends the chemical language model capabilities beyond the pharmacological space in which they have been predominantly developed, offering a framework applicable to other data-sparse discovery problems. Furthermore, we discuss the benefits of fragment-based molecular encodings for chemical language models, in particular in constructing synthetically accessible structures. Together, these advances provide a foundation for accelerating the design of next-generation energetic materials with demanding performance requirements.
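
The pretrain-then-fine-tune idea in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' method: a character-level bigram count model stands in for a neural chemical language model, and "fine-tuning" is approximated by adding up-weighted counts from a small specialized corpus. The corpora, the `weight` and `alpha` parameters, and all function names are hypothetical choices made for illustration; the SMILES strings are a handful of well-known molecules, not the paper's datasets.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # boundary tokens, kept distinct from SMILES characters

# Hypothetical toy corpora, NOT the datasets used in the paper.
GENERAL = ["CCO", "CCN", "CC(=O)O", "c1ccccc1", "CC(=O)Nc1ccccc1"]
ENERGETIC = [
    "C[N+](=O)[O-]",                                      # nitromethane
    "Cc1c(cc(cc1[N+](=O)[O-])[N+](=O)[O-])[N+](=O)[O-]",  # TNT
    "C1N(CN(CN1[N+](=O)[O-])[N+](=O)[O-])[N+](=O)[O-]",   # RDX
]
HELD_OUT = ["CC[N+](=O)[O-]"]                             # nitroethane


def tokens(smiles):
    """Character-level tokenization with boundary tokens."""
    return [START, *smiles, END]


def count_bigrams(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for smiles in corpus:
        seq = tokens(smiles)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def fine_tune(pretrained, specialized, weight):
    """Toy stand-in for fine-tuning: add up-weighted specialized counts."""
    merged = defaultdict(Counter)
    for source, w in ((pretrained, 1.0), (specialized, weight)):
        for prev, row in source.items():
            for nxt, c in row.items():
                merged[prev][nxt] += w * c
    return merged


def avg_nll(counts, corpus, vocab_size, alpha=0.1):
    """Average negative log-likelihood per token, with additive smoothing."""
    total, n = 0.0, 0
    for smiles in corpus:
        seq = tokens(smiles)
        for prev, nxt in zip(seq, seq[1:]):
            row = counts.get(prev, {})
            p = (row.get(nxt, 0) + alpha) / (sum(row.values()) + alpha * vocab_size)
            total -= math.log(p)
            n += 1
    return total / n


vocab = {t for s in GENERAL + ENERGETIC + HELD_OUT for t in tokens(s)}
pretrained = count_bigrams(GENERAL)
fine_tuned = fine_tune(pretrained, count_bigrams(ENERGETIC), weight=5.0)

pre_nll = avg_nll(pretrained, HELD_OUT, len(vocab))
ft_nll = avg_nll(fine_tuned, HELD_OUT, len(vocab))
print(f"held-out NLL, pretrained only: {pre_nll:.3f}")
print(f"held-out NLL, after fine-tune: {ft_nll:.3f}")
```

In this toy setting the general corpus contains no nitro groups, so the pretrained model assigns low probability to the held-out nitro compound until the specialized counts are added. A real chemical language model would replace the count table with a trained neural network and gradient-based fine-tuning, and could use fragment-level rather than character-level tokens.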
