Benchmarking Compositional Generalisation for Machine Learning Interatomic Potentials
Amir Masoud Nourollah, Irtaza Khalid, Stefano Leoni, Steven Schockaert

TL;DR
This paper introduces a benchmark to evaluate how well machine learning interatomic potentials generalise to unseen molecules, revealing current models' limitations in compositional generalisation.
Contribution
The paper proposes four tasks designed to test compositional generalisation in ML interatomic potentials and provides an empirical analysis showing that existing models find these tasks very difficult.
Findings
State-of-the-art models perform poorly on out-of-distribution molecules.
Errors on unseen molecules are often ten times higher than on training molecules.
Pre-trained foundation models still struggle with compositional generalization.
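To make the "ten times higher" comparison concrete, here is a minimal sketch of how such a gap could be measured. It assumes force mean absolute error as the metric and per-molecule averaging; both are assumptions, as are the names force_mae and generalisation_gap and the toy data. It is not the paper's evaluation protocol.

```python
import numpy as np

def force_mae(pred, ref):
    """Mean absolute error over all force components (arrays of shape [n_atoms, 3])."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))

def generalisation_gap(train_pairs, test_pairs):
    """Ratio of mean per-molecule force MAE on unseen vs. training molecules.

    Each argument is a list of (predicted_forces, reference_forces) pairs,
    one pair per molecular configuration. Hypothetical helper for illustration.
    """
    train_err = np.mean([force_mae(p, r) for p, r in train_pairs])
    test_err = np.mean([force_mae(p, r) for p, r in test_pairs])
    return float(test_err / train_err)

# Toy data (made up): a two-atom molecule with zero reference forces.
ref = np.zeros((2, 3))
train_pairs = [(ref + 0.01, ref)]  # small error on a training molecule
test_pairs = [(ref + 0.10, ref)]   # larger error on an unseen molecule

gap = generalisation_gap(train_pairs, test_pairs)
print(f"unseen/training force-MAE ratio: {gap:.1f}")
```

A ratio near 1 would indicate that errors on unseen molecules match those on training molecules; the findings above correspond to ratios around 10.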
Abstract
Machine Learning Interatomic Potentials play a fundamental role in computational chemistry and materials science, enabling applications from molecular dynamics simulations to drug design and materials discovery. While recent approaches can estimate inter-atomic forces with high precision, it remains unclear to what extent they can generalise to previously unseen molecules. Do they learn the compositional structure of chemistry, capturing how molecular fragments and their combinations determine properties, or do they primarily learn to interpolate patterns that are specific to the training examples? To address this question, we propose a benchmark consisting of four tasks that require some form of compositional generalisation. In each task, models are tested on molecules that were unseen during training, but the training data is chosen such that generalisation to the test examples should…
