deCIFer: Crystal Structure Prediction from Powder Diffraction Data using Autoregressive Language Models
Frederik Lizak Johansen, Ulrik Friis-Jensen, Erik Bjørnager Dam, Kirsten Marie Ørnsbjerg Jensen, Rocío Mercado, Raghavendra Selvan

TL;DR
deCIFer is an autoregressive language model that predicts crystal structures from powder X-ray diffraction (PXRD) data. Its PXRD conditioning is augmented with synthetic experimental artifacts (Gaussian noise and instrumental peak broadening) so that the model is more robust to real-world conditions.
Contribution
The paper introduces deCIFer, described as the first model to generate crystal structures directly from PXRD data using a language modeling approach. It outputs structures in CIF format and is trained on nearly 2.3 million structures.
Findings
Achieves 94% structural match rate on synthetic datasets
Incorporates experimental artifacts like noise and peak broadening
Establishes a robust baseline for future experimental scenario modeling
Abstract
Novel materials drive advancements in fields ranging from energy storage to electronics, with crystal structure characterization forming a crucial yet challenging step in materials discovery. In this work, we introduce deCIFer, an autoregressive language model designed for powder X-ray diffraction (PXRD)-conditioned crystal structure prediction (PXRD-CSP). Unlike traditional CSP methods that rely primarily on composition or symmetry constraints, deCIFer explicitly incorporates PXRD data, directly generating crystal structures in the widely adopted Crystallographic Information File (CIF) format. The model is trained on nearly 2.3 million crystal structures, with PXRD conditioning augmented by basic forms of synthetic experimental artifacts, specifically Gaussian noise and instrumental peak broadening, to reflect fundamental real-world conditions. Validated across diverse synthetic…
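The two artifacts named in the abstract, Gaussian noise and instrumental peak broadening, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `simulate_pxrd`, the `(position, intensity)` input format, the purely Gaussian peak profile, the normalization to a maximum of 1, and the clipping of negative values are all assumptions made for illustration.

```python
import math
import random


def simulate_pxrd(peaks, two_theta, fwhm=0.05, noise_std=0.0, seed=None):
    """Return a broadened, optionally noisy pattern sampled on `two_theta`.

    peaks      : list of (position_deg, intensity) pairs (hypothetical format)
    two_theta  : list of grid points in degrees
    fwhm       : full width at half maximum of the Gaussian profile, in degrees
    noise_std  : std. dev. of additive Gaussian noise (pattern is scaled to max 1)
    """
    rng = random.Random(seed)
    # Convert FWHM to the Gaussian standard deviation.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

    # Peak broadening: replace each ideal peak by a Gaussian of width sigma.
    pattern = [
        sum(inten * math.exp(-0.5 * ((x - pos) / sigma) ** 2) for pos, inten in peaks)
        for x in two_theta
    ]

    # Scale so that the strongest point equals 1 (assumed convention).
    top = max(pattern)
    if top > 0:
        pattern = [y / top for y in pattern]

    # Additive Gaussian noise, clipped at zero (assumed convention).
    return [max(0.0, y + rng.gauss(0.0, noise_std)) for y in pattern]


# Usage: two ideal peaks on a 10 to 80 degree grid with 0.01 degree steps.
grid = [10.0 + 0.01 * k for k in range(7001)]
ideal_peaks = [(30.0, 1.0), (45.0, 0.5)]
clean = simulate_pxrd(ideal_peaks, grid, fwhm=0.1)
noisy = simulate_pxrd(ideal_peaks, grid, fwhm=0.1, noise_std=0.02, seed=0)
```

Varying `fwhm` and `noise_std` per training sample is one plausible way to expose a model to a range of instrument conditions. The paper should be consulted for the actual profile function and parameter ranges.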
