AI-Driven Expansion and Application of the Alexandria Database

Théo Cavignac (1); Jonathan Schmidt (2); Pierre-Paul De Breuck (1); Antoine Loew (1); Tiago F. T. Cerqueira (3); Hai-Chen Wang (1); Anton Bochkarev (4); Yury Lysogorskiy (4); Aldo H. Romero (5); Ralf Drautz (4); Silvana Botti (1); Miguel A. L. Marques (1) ((1) Research Center Future Energy Materials; Systems of the University Alliance Ruhr; ICAMS; Ruhr University Bochum; Bochum; Germany; (2) Department of Materials; ETH Zürich; Zürich; Switzerland; (3) CFisUC; Department of Physics; University of Coimbra; Coimbra; Portugal; (4) ICAMS; Ruhr-Universität Bochum; ACEworks GmbH; Bochum; Germany; (5) Department of Physics; West Virginia University; Morgantown; USA)

arXiv:2512.09169 · cond-mat.mtrl-sci · May 4, 2026

TL;DR

This paper introduces a multi-stage AI workflow for computational materials discovery that adds 1.3 million DFT-validated compounds, 74 thousand of them new stable materials, to the Alexandria database and provides tools for further research.

Contribution

The authors develop an integrated AI-driven pipeline that combines a generative model (Matra-Genoa), a universal machine learning interatomic potential (Orb-v2), and a graph neural network for energy prediction (ALIGNN), resulting in a large, DFT-validated materials database and improved predictive models.

Findings

01

Achieved a 99% success rate in identifying compounds within 100 meV/atom of thermodynamic stability, a threefold improvement over previous approaches.

02

Expanded the Alexandria database to 5.8 million structures, including 175,000 compounds on the convex hull.

03

Released a large dataset with structures, forces, and stresses for training universal force fields.

Abstract

We present a novel multi-stage workflow for computational materials discovery that achieves a 99% success rate in identifying compounds within 100 meV/atom of thermodynamic stability, with a threefold improvement over previous approaches. By combining the Matra-Genoa generative model, Orb-v2 universal machine learning interatomic potential, and ALIGNN graph neural network for energy prediction, we generated 119 million candidate structures and added 1.3 million DFT-validated compounds to the ALEXANDRIA database, including 74 thousand new stable materials. The expanded ALEXANDRIA database now contains 5.8 million structures with 175 thousand compounds on the convex hull. Predicted structural disorder rates (37-43%) match experimental databases, unlike other recent AI-generated datasets. Analysis reveals fundamental patterns in space group distributions, coordination environments, and…
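
The stability criterion quoted in the abstract (within 100 meV/atom of thermodynamic stability) can be illustrated with a minimal sketch of a distance-to-hull check. This is not the authors' code: it assumes a hypothetical binary system A(1-x)B(x), formation energies in eV/atom measured against the two elements, and invented example numbers. The function names (`lower_hull`, `energy_above_hull`, `is_near_stable`) are illustrative only; the only value taken from the paper is the 100 meV/atom threshold.

```python
from typing import Dict, List, Tuple

# A phase is (x, e): fraction x of element B and formation energy e in eV/atom.
Phase = Tuple[float, float]

# Threshold quoted in the abstract: 100 meV/atom, expressed in eV/atom.
THRESHOLD_EV = 0.100


def lower_hull(phases: List[Phase]) -> List[Phase]:
    """Lower convex hull of reference phases (Andrew's monotone chain).

    Assumption of this sketch: the pure elements sit at x=0 and x=1 with
    zero formation energy, so they are always added as references.
    """
    # Keep only the lowest-energy phase at each composition.
    best: Dict[float, float] = {0.0: 0.0, 1.0: 0.0}
    for x, e in phases:
        best[x] = min(e, best.get(x, e))
    hull: List[Phase] = []
    for p in sorted(best.items()):
        # Drop the last vertex while it lies on or above the chord to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def energy_above_hull(x: float, e: float, hull: List[Phase]) -> float:
    """Energy of a candidate relative to the hull at composition x (eV/atom).

    A negative value means the candidate lies below the reference hull.
    """
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = (x - x1) / (x2 - x1)
            return e - (y1 + t * (y2 - y1))
    raise ValueError("composition x must lie in [0, 1]")


def is_near_stable(x: float, e: float, hull: List[Phase]) -> bool:
    """True if the candidate is within THRESHOLD_EV of the hull."""
    return energy_above_hull(x, e, hull) <= THRESHOLD_EV


if __name__ == "__main__":
    # Invented reference data: one stable compound at x=0.5 and one
    # metastable phase at x=0.25 that does not end up on the hull.
    hull = lower_hull([(0.5, -0.40), (0.25, -0.10)])
    for x, e in [(0.25, -0.15), (0.75, -0.05)]:
        print(x, e, is_near_stable(x, e, hull))
```

Real phase diagrams span many elements, so production workflows compute the hull in higher-dimensional composition space; the binary case is used here only because it keeps the geometry easy to follow.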
