Taming Multi-Domain, -Fidelity Data: Towards Foundation Models for Atomistic Scale Simulations
Tomoya Shiota, Kenji Ishihara, Tuan Minh Do, Toshio Mori, Wataru Mizukami

TL;DR
This paper introduces Total Energy Alignment (TEA), a novel method for integrating diverse quantum chemical datasets to develop a universal machine learning interatomic potential that performs well across molecular and crystalline systems.
Contribution
The paper presents TEA, which enables seamless integration of heterogeneous quantum chemical datasets, and introduces MACE-Osaka24, the first open-source MLIP model trained on a unified dataset covering molecular and crystalline systems.
Findings
MACE-Osaka24 achieves high accuracy across molecular and crystalline systems.
TEA allows integration of datasets without redundant calculations.
The universal model matches or improves on the accuracy of specialized models in predicting organic reaction barriers.
Abstract
Machine learning interatomic potentials (MLIPs) are changing atomistic simulations in the field of chemistry and materials science. However, constructing a single universal MLIP that can accurately model molecular and crystalline systems remains challenging. A central obstacle is the integration of diverse datasets generated under different computational conditions. We present Total Energy Alignment (TEA), which is an approach that enables the seamless integration of heterogeneous quantum chemical datasets without redundant calculations. Using TEA, we trained MACE-Osaka24, the first open-source MLIP model based on a unified dataset covering molecular and crystalline systems. This universal model displays strong performances across diverse chemical systems, exhibiting similar or improved accuracies in predicting organic reaction barriers compared to those of specialized models, while…
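The abstract does not describe how TEA performs the alignment, so the sketch below is not TEA itself. It only illustrates the obstacle the abstract names: total energies computed under different settings sit on different absolute scales, largely through element-dependent offsets. The example uses a generic, widely used baseline (a least-squares fit of per-element reference energies, subtracted per dataset). All function names, element choices, and numbers are hypothetical toy values.

```python
import numpy as np


def fit_element_offsets(counts, energies):
    """Least-squares per-element reference energies e0 with counts @ e0 ~ energies.

    Generic composition baseline, NOT the paper's TEA procedure.
    counts: (n_structures, n_elements) atom counts per structure.
    energies: (n_structures,) total energies from one dataset.
    """
    e0, *_ = np.linalg.lstsq(counts, energies, rcond=None)
    return e0


def remove_composition_baseline(counts, energies):
    """Subtract the fitted per-element baseline from one dataset's total energies."""
    e0 = fit_element_offsets(counts, energies)
    return energies - counts @ e0


# Toy data (hypothetical): columns are atom counts for three elements.
counts = np.array(
    [[2, 0, 1],
     [4, 1, 0],
     [0, 1, 2],
     [2, 1, 1]],
    dtype=float,
)

# The same structures "computed" under two hypothetical settings that differ
# only in their per-element energy offsets.
interaction = np.array([-0.35, -0.60, -0.55, -0.48])
offsets_a = np.array([-0.50, -37.80, -75.00])
offsets_b = np.array([-0.47, -38.05, -75.20])
energies_a = counts @ offsets_a + interaction
energies_b = counts @ offsets_b + interaction

aligned_a = remove_composition_baseline(counts, energies_a)
aligned_b = remove_composition_baseline(counts, energies_b)

print("max raw difference:    ", np.max(np.abs(energies_a - energies_b)))
print("max aligned difference:", np.max(np.abs(aligned_a - aligned_b)))
```

In this toy setup the two datasets differ only by a composition-linear shift, so removing the fitted baseline puts them on a common scale. Real datasets also differ in ways a linear baseline cannot capture, which is the harder case an alignment scheme has to handle.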
Taxonomy
Topics: Machine Learning in Materials Science · Scientific Computing and Data Management · Gas Dynamics and Kinetic Theory
