Knowing when to trust machine-learned interatomic potentials
Shams Mehdi, Ilkwon Cho, Olexandr Isayev

TL;DR
This paper introduces PROBE, a post-hoc method for uncertainty quantification in machine-learned interatomic potentials (MLIPs). PROBE outperforms ensemble disagreement as a binary reliability signal and provides interpretable diagnostics.
Contribution
PROBE recasts uncertainty quantification as selective classification using frozen embeddings, offering a scalable, architecture-agnostic, and interpretable alternative to ensemble methods.
Findings
PROBE's per-prediction reliability probability monotonically tracks actual prediction error.
PROBE outperforms ensemble disagreement as a binary reliability signal.
Multi-head self-attention provides chemically interpretable importance maps.
Abstract
Prevailing machine-learned interatomic potential (MLIP) uncertainty-quantification methods rely on ensembles of independently trained backbones. These methods scale unfavorably with foundation-scale MLIPs, and their member-disagreement signals correlate weakly with per-molecule prediction error. Here we probe the frozen per-atom representations of a pretrained MLIP with a compact discriminative classifier, recasting MLIP uncertainty quantification as selective classification rather than error regression. The resulting method, PROBE (Post-hoc Reliability frOm Backbone Embeddings), produces a per-prediction reliability probability that monotonically tracks actual error without modification to the underlying model. Across large held-out evaluation sets and two structurally distinct MLIP architectures, PROBE outperforms ensemble disagreement as a binary reliability signal, which strengthens…
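The core idea in the abstract, training a small classifier on frozen backbone embeddings to predict whether a prediction is reliable, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic per-atom embeddings, mean pooling in place of the attention-based aggregation mentioned in the findings, a plain logistic-regression probe, and a hypothetical error threshold of 1.0 for labelling a prediction "reliable". All function names and parameters are invented for illustration.

```python
import numpy as np

def pool_embeddings(per_atom):
    """Mean-pool a (n_atoms, dim) array of frozen per-atom embeddings.
    Assumption: mean pooling stands in for a learned aggregation."""
    return per_atom.mean(axis=0)

def fit_reliability_probe(X, y, lr=0.5, epochs=1000, l2=1e-3):
    """Fit logistic regression by gradient descent; y == 1 means 'reliable'."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def reliability_probability(X, w, b):
    """Per-molecule probability that the backbone's prediction is reliable."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def selective_report(prob, abs_err, tau):
    """Selective classification: keep predictions with prob >= tau and
    report the fraction kept (coverage) and their mean absolute error."""
    accept = prob >= tau
    mae = abs_err[accept].mean() if accept.any() else float("nan")
    return accept.mean(), mae

# Synthetic stand-in for a frozen backbone (hypothetical data, not real MLIP output).
rng = np.random.default_rng(0)
dim, n_mol = 8, 1200
pooled, abs_err = [], []
for _ in range(n_mol):
    n_atoms = rng.integers(5, 31)
    center = rng.normal(size=dim)                      # molecule-level signal
    per_atom = center + rng.normal(size=(n_atoms, dim))
    z = pool_embeddings(per_atom)
    pooled.append(z)
    # Assumed toy error model: error grows along the first embedding dimension.
    abs_err.append(np.exp(2.0 * z[0] + 0.3 * rng.normal()))
X, abs_err = np.array(pooled), np.array(abs_err)

threshold = 1.0                                        # hypothetical error cutoff
y = (abs_err < threshold).astype(float)

# The backbone is never updated; only the probe is trained.
w, b = fit_reliability_probe(X[:800], y[:800])
prob = reliability_probability(X[800:], w, b)
coverage, accepted_mae = selective_report(prob, abs_err[800:], tau=0.5)
overall_mae = abs_err[800:].mean()
```

Comparing `accepted_mae` with `overall_mae` at different values of `tau` traces the trade-off between coverage and error that selective classification targets. The backbone itself stays untouched; only the probe's weights `w` and `b` are fitted.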
