AQCat25: Unlocking spin-aware, high-fidelity machine learning potentials for heterogeneous catalysis

Omar Allam; Brook Wander; SungYeon Kim; Rudi Plesch; Tyler Sours; Jia-Min Chu; Thomas Ludwig; Jiyoon Kim; Rodrigo Wang; Shivang Agarwal; Alan Rask; Alexandre Fleury; Chuhong Wang; Andrew Wildman; Thomas Mustard; Kevin Ryczko; Paul Abruzzo; AJ Nish; Aayush R. Singh

arXiv:2510.22938·cond-mat.mtrl-sci·December 16, 2025

TL;DR

This paper introduces AQCat25, a dataset of 13.5 million DFT single-point calculations for spin-aware catalysis modeling, and shows how to integrate it with the existing OC20 dataset to improve ML potentials without losing generality.

Contribution

We present AQCat25, a new dataset for spin-polarized systems, and develop joint training and conditioning strategies to enhance ML models' accuracy and generalizability in catalysis.

Findings

01 Joint training improves accuracy without catastrophic forgetting.

02 Explicit conditioning with metadata enhances model performance.

03 AQCat25 enables better treatment of spin-polarized systems.
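
To make the first two findings concrete, here is a minimal sketch of what joint training with a metadata tag could look like at the batching level. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_joint_batch`, the `spin_polarized` tag, and the `spin_fraction` mixing ratio are all hypothetical, and the page does not say which metadata the models are actually conditioned on.

```python
import random

def sample_joint_batch(general, spin_aware, batch_size, spin_fraction, rng):
    """Draw one mixed batch from two datasets (hypothetical helper).

    Each sample is tagged with a metadata flag so that a model could be
    conditioned on which dataset the sample came from.
    """
    n_spin = round(batch_size * spin_fraction)
    n_general = batch_size - n_spin
    # Samples from the general-purpose set get the tag False.
    batch = [{"structure": s, "spin_polarized": False}
             for s in rng.sample(general, n_general)]
    # Samples from the spin-aware set get the tag True.
    batch += [{"structure": s, "spin_polarized": True}
              for s in rng.sample(spin_aware, n_spin)]
    rng.shuffle(batch)  # avoid a fixed ordering of the two sources
    return batch

# Placeholder strings stand in for real atomic structures.
rng = random.Random(0)
batch = sample_joint_batch(["g1", "g2", "g3", "g4"], ["s1", "s2", "s3"],
                           batch_size=4, spin_fraction=0.5, rng=rng)
```

Because every batch contains samples from both sources, the model keeps seeing the original data while it learns from the new data, which is the intuition behind avoiding catastrophic forgetting. Fine-tuning on the new dataset alone corresponds to `spin_fraction=1.0`.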

Abstract

Large-scale datasets have enabled highly accurate machine learning interatomic potentials (MLIPs) for general-purpose heterogeneous catalysis modeling. There are, however, some limitations in what can be treated with these potentials because of gaps in the underlying training data. To extend these capabilities, we introduce AQCat25, a complementary dataset of 13.5 million density functional theory (DFT) single point calculations designed to improve the treatment of systems where spin polarization and/or higher fidelity are critical. We also investigate methodologies for integrating new datasets, such as AQCat25, with the broader Open Catalyst 2020 (OC20) dataset to create spin-aware models without sacrificing generalizability. We find that directly tuning a general model on AQCat25 leads to catastrophic forgetting of the original dataset's knowledge. Conversely, joint training…

Code & Models

Models

SandboxAQ/aqcat25-ev2 (model, 11 likes)

Datasets

SandboxAQ/aqcat25-dataset (dataset, 66 downloads)
