Model Accuracy and Data Heterogeneity Shape Uncertainty Quantification in Machine Learning Interatomic Potentials

Fei Shuang; Zixiong Wei; Kai Liu; Wei Gao; Poulumi Dey

arXiv:2508.03405 · cond-mat.mtrl-sci · August 6, 2025 · Mach. Learn. Sci. Technol.

TL;DR

This paper explores how model accuracy and data heterogeneity influence uncertainty quantification in machine learning interatomic potentials, proposing a clustering-based method to improve detection of novel atomic environments.

Contribution

The paper introduces clustering-enhanced local D-optimality, which improves uncertainty estimation and novelty detection for MLIPs trained on heterogeneous datasets.

Findings

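01. Higher model accuracy strengthens the correlation between predicted uncertainties and actual errors and improves novelty detection.

02. D-optimality yields more conservative uncertainty estimates than ensemble learning.

03. On heterogeneous datasets, clustering-enhanced local D-optimality improves novelty detection.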
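The first finding concerns how well predicted uncertainties track actual errors. A minimal sketch of that comparison for ensemble learning follows. It is a toy illustration, not the paper's setup: the models are bootstrap-resampled polynomial fits to a synthetic 1D function rather than atomic cluster expansion potentials, and the function names, ensemble size, and polynomial degree are all assumptions.

```python
import numpy as np

def fit_ensemble(x, y, n_models=8, degree=3, seed=0):
    """Fit n_models polynomials, each on a bootstrap resample of (x, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # sample with replacement
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

def predict_with_uncertainty(models, x):
    """Return the ensemble mean and the ensemble spread (standard deviation)."""
    preds = np.array([np.polyval(coeffs, x) for coeffs in models])
    return preds.mean(axis=0), preds.std(axis=0)

def pearson(a, b):
    """Pearson correlation between two 1D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic data (assumption): a smooth target with a little noise.
rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, 200)
y_train = np.sin(3.0 * x_train) + 0.05 * rng.normal(size=200)

# Test points deliberately extend beyond the training range.
x_test = np.linspace(-2.0, 2.0, 101)
models = fit_ensemble(x_train, y_train)
mean, sigma = predict_with_uncertainty(models, x_test)
error = np.abs(mean - np.sin(3.0 * x_test))

print("uncertainty-error correlation:", pearson(sigma, error))
```

Repeating the experiment with a more or less accurate model class (for example, a different `degree`) is one way to probe how accuracy affects this correlation on toy data.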

Abstract

Machine learning interatomic potentials (MLIPs) enable accurate atomistic modelling, but reliable uncertainty quantification (UQ) remains elusive. In this study, we investigate two UQ strategies, ensemble learning and D-optimality, within the atomic cluster expansion framework. It is revealed that higher model accuracy strengthens the correlation between predicted uncertainties and actual errors and improves novelty detection, with D-optimality yielding more conservative estimates. Both methods deliver well calibrated uncertainties on homogeneous training sets, yet they underpredict errors and exhibit reduced novelty sensitivity on heterogeneous datasets. To address this limitation, we introduce clustering-enhanced local D-optimality, which partitions configuration space into clusters during training and applies D-optimality within each cluster. This approach substantially improves the…
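The clustering idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the descriptor matrix `B` is synthetic, the greedy row selection is a simple stand-in for the MaxVol algorithm commonly used in D-optimality-based active learning, the small k-means routine and the rule of grading a query against its nearest cluster are assumptions, and every function and class name is hypothetical. The extrapolation grade is taken here as the largest absolute coefficient needed to express a query descriptor as a combination of the active-set rows.

```python
import numpy as np

def select_active_set(B):
    """Greedily pick m independent rows of B (n x m) by pivoted Gram-Schmidt.
    A simple stand-in for MaxVol (assumption), not the same algorithm."""
    B = np.asarray(B, dtype=float)
    if B.shape[0] < B.shape[1]:
        raise ValueError("need at least as many rows as descriptor components")
    R = B.copy()
    chosen = []
    for _ in range(B.shape[1]):
        norms = np.linalg.norm(R, axis=1)
        i = int(np.argmax(norms))
        chosen.append(i)
        q = R[i] / norms[i]
        R -= np.outer(R @ q, q)  # remove the chosen direction from every row
    return B[chosen]

def extrapolation_grade(x, A_inv):
    """Largest absolute coefficient of x in the basis of active-set rows."""
    return float(np.max(np.abs(np.asarray(x, dtype=float) @ A_inv)))

def assign(X, centers):
    """Index of the nearest center for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d, axis=1)

def kmeans(X, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = assign(X, centers)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, assign(X, centers)

class LocalDOptimality:
    """One active set per cluster; a query is graded within its nearest cluster."""
    def __init__(self, B, k):
        B = np.asarray(B, dtype=float)
        self.centers, labels = kmeans(B, k)
        self.active_sets = [select_active_set(B[labels == c]) for c in range(k)]
        self.inverses = [np.linalg.inv(A) for A in self.active_sets]

    def grade(self, x):
        x = np.asarray(x, dtype=float)
        c = int(assign(x[None, :], self.centers)[0])
        return extrapolation_grade(x, self.inverses[c])

# Synthetic heterogeneous "training set" (assumption): two separated blobs.
rng = np.random.default_rng(0)
B = np.vstack([rng.normal(size=(50, 3)), rng.normal(size=(50, 3)) + 6.0])

model = LocalDOptimality(B, k=2)
global_inv = np.linalg.inv(select_active_set(B))  # single global active set

query = np.array([6.0, 6.0, 12.0])  # displaced from the second blob
print("global grade:", extrapolation_grade(query, global_inv))
print("local grade: ", model.grade(query))
```

In this sketch a descriptor that belongs to an active set has grade 1 by construction, and larger grades indicate stronger extrapolation relative to that active set. Comparing the global and local grades for different queries gives a feel for how per-cluster active sets change novelty sensitivity on heterogeneous data.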
