From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction
Nima Shoghi, Adeesh Kolluru, John R. Kitchin, Zachary W. Ulissi, C. Lawrence Zitnick, Brandon M. Wood

TL;DR
This paper introduces a multi-domain pre-training approach for atomic property prediction models, significantly improving performance across diverse chemical datasets and setting new state-of-the-art results in many tasks.
Contribution
The authors propose Joint Multi-domain Pre-training (JMP), a supervised multi-task strategy that leverages large, diverse datasets to enhance atomic property prediction models.
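As a rough illustration of what "each dataset as its own pre-training task" can look like in a training loop, the sketch below picks which dataset supplies each batch and combines per-dataset losses into one objective. It is a minimal sketch under stated assumptions, not the authors' implementation: the temperature-scaled sampling rule, the function names (`sampling_probs`, `task_schedule`, `joint_loss`), the loss weights, and the relative dataset sizes are all hypothetical placeholders that do not come from this summary.

```python
import random

# Placeholder relative sizes for illustration only, NOT the real dataset counts.
DATASET_SIZES = {"OC20": 100, "OC22": 8, "ANI-1x": 2, "Transition-1x": 10}


def sampling_probs(sizes, temperature=2.0):
    """Temperature-scaled sampling (an assumed scheme).

    p_i is proportional to size_i ** (1 / temperature). With temperature=1
    this is plain size-proportional sampling; larger temperatures give
    small datasets a bigger share of the batches.
    """
    weights = {name: n ** (1.0 / temperature) for name, n in sizes.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def task_schedule(sizes, steps, temperature=2.0, seed=0):
    """Choose which dataset (task) supplies the batch at each training step."""
    probs = sampling_probs(sizes, temperature)
    rng = random.Random(seed)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=steps)


def joint_loss(per_task_losses, task_weights):
    """Weighted sum of per-dataset losses for one multi-task update."""
    return sum(task_weights[name] * loss for name, loss in per_task_losses.items())


if __name__ == "__main__":
    print(sampling_probs(DATASET_SIZES))
    print(task_schedule(DATASET_SIZES, steps=8))
```

In a real setup each task would typically have its own output head on a shared backbone, and `joint_loss` would operate on tensors rather than floats; those details are left out here to keep the sketch dependency-free.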
Findings
JMP improves performance by an average of 59% over training from scratch.
Matches or exceeds state-of-the-art on 34 out of 40 tasks.
Effective across multiple chemical domains and low-data scenarios.
Abstract
Foundation models have been transformational in machine learning fields such as natural language processing and computer vision. Similar success in atomic property prediction has been limited due to the challenges of training effective models across multiple chemical domains. To address this, we introduce Joint Multi-domain Pre-training (JMP), a supervised pre-training strategy that simultaneously trains on multiple datasets from different chemical domains, treating each dataset as a unique pre-training task within a multi-task framework. Our combined training dataset consists of 120M systems from OC20, OC22, ANI-1x, and Transition-1x. We evaluate performance and generalization by fine-tuning over a diverse set of downstream tasks and datasets including: QM9, rMD17, MatBench, QMOF, SPICE, and MD22. JMP demonstrates an average improvement of 59% over training from scratch, and…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Mass Spectrometry Techniques and Applications
Methods: Sparse Evolutionary Training
