AI-Driven Expansion and Application of the Alexandria Database
Théo Cavignac (1), Jonathan Schmidt (2), Pierre-Paul De Breuck (1), Antoine Loew (1), Tiago F. T. Cerqueira (3), Hai-Chen Wang (1), Anton Bochkarev (4), Yury Lysogorskiy (4), Aldo H. Romero (5), Ralf Drautz (4), Silvana Botti (1)

TL;DR
This paper introduces a multi-stage AI workflow for computational materials discovery that adds 1.3 million DFT-validated compounds to the Alexandria database, including 74 thousand new stable materials, and provides tools for further research.
Contribution
The authors develop an integrated AI-driven pipeline combining a generative model (Matra-Genoa), a universal machine learning interatomic potential (Orb-v2), and a graph neural network for energy prediction (ALIGNN), resulting in a large, DFT-validated materials database and improved predictive models.
Findings
Achieved a 99% success rate in identifying compounds within 100 meV/atom of thermodynamic stability, a threefold improvement over previous approaches.
Expanded the Alexandria database to 5.8 million structures with 175,000 stable compounds.
Released a large dataset with structures, forces, and stresses for training universal force fields.
Abstract
We present a novel multi-stage workflow for computational materials discovery that achieves a 99% success rate in identifying compounds within 100 meV/atom of thermodynamic stability, with a threefold improvement over previous approaches. By combining the Matra-Genoa generative model, Orb-v2 universal machine learning interatomic potential, and ALIGNN graph neural network for energy prediction, we generated 119 million candidate structures and added 1.3 million DFT-validated compounds to the ALEXANDRIA database, including 74 thousand new stable materials. The expanded ALEXANDRIA database now contains 5.8 million structures with 175 thousand compounds on the convex hull. Predicted structural disorder rates (37-43%) match experimental databases, unlike other recent AI-generated datasets. Analysis reveals fundamental patterns in space group distributions, coordination environments, and…
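The stability criterion quoted in the abstract (within 100 meV/atom of the convex hull) can be illustrated with a minimal sketch. It assumes a toy binary system A-B, with each candidate described by the fraction x of element B and a formation energy in eV/atom. The function names and all numerical values below are hypothetical and chosen only for illustration; this is not the authors' implementation, and real workflows build multi-component hulls from DFT energies.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (fraction x of element B, formation energy in eV/atom)

THRESHOLD_EV = 0.100  # 100 meV/atom, the cutoff quoted in the abstract


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of (x, E) points, with the pure elements fixed at E = 0."""
    lowest: Dict[float, float] = {0.0: 0.0, 1.0: 0.0}
    for x, e in points:
        # Only the lowest-energy phase at each composition can sit on the hull.
        lowest[x] = min(e, lowest.get(x, e))
    hull: List[Point] = []
    for p in sorted(lowest.items()):
        # Andrew's monotone chain: drop the last vertex while it lies on or
        # above the straight line from hull[-2] to the new point p.
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def energy_above_hull(x: float, e: float, hull: List[Point]) -> float:
    """Vertical distance (eV/atom) from a phase at (x, e) to the hull."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            e_hull = e1 + (e2 - e1) * (x - x1) / (x2 - x1)
            return e - e_hull
    raise ValueError("composition x must lie in [0, 1]")


# Hypothetical candidates: (x, formation energy in eV/atom).
candidates: List[Point] = [
    (0.25, -0.15),
    (0.40, -0.10),
    (0.50, -0.40),
    (0.50, -0.35),
    (0.75, -0.25),
]

hull = lower_hull(candidates)
near_stable = [
    (x, e) for x, e in candidates
    if energy_above_hull(x, e, hull) <= THRESHOLD_EV
]
```

In this toy data, the phase at x = 0.40 lies well above the hull and is filtered out, while the others are either on the hull or within the cutoff.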
