Cutting Through the Noise: On-the-fly Outlier Detection for Robust Training of Machine Learning Interatomic Potentials
Terry C. W. Lam, Niamh O'Neill, Christoph Schran, Lars L. Schaaf

TL;DR
This paper presents an unsupervised, on-the-fly outlier detection method that automatically down-weights noisy data during training of machine learning interatomic potentials, improving robustness and accuracy without extra reference calculations.
Contribution
The authors introduce a novel, scalable outlier detection scheme that operates during training, reducing the need for manual filtering or multiple retraining cycles.
Findings
Prevents overfitting in ML interatomic potentials.
Achieves comparable performance to iterative refinement methods.
Reduces energy errors by a factor of three on a large dataset.
Abstract
The accuracy of machine learning interatomic potentials suffers from reference data that contains numerical noise. Often originating from unconverged or inconsistent electronic-structure calculations, this noise is challenging to identify. Existing mitigation strategies, such as manual filtering or iterative refinement of outliers, require either substantial expert effort or multiple expensive retraining cycles, making them difficult to scale to large datasets. Here, we introduce an on-the-fly outlier detection scheme that automatically down-weights noisy samples, without requiring additional reference calculations. By tracking the loss distribution via an exponential moving average, this unsupervised method identifies outliers throughout a single training run. We show that this approach prevents overfitting and matches the performance of iterative refinement baselines with significantly…
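The abstract describes the mechanism only at a high level: track the loss distribution with an exponential moving average and down-weight samples that fall outside it. The following is a minimal sketch of that general idea, not the authors' implementation. The class name `EmaOutlierWeighter`, the z-score criterion, the hard cutoff, and the parameters `decay`, `z_threshold`, and `min_weight` are all assumptions made for illustration; the paper may use a different statistic and weighting function.

```python
import math


class EmaOutlierWeighter:
    """Hypothetical helper: keeps an exponential moving average (EMA) of the
    per-sample loss mean and variance, and down-weights samples whose loss
    lies far above that running distribution."""

    def __init__(self, decay=0.9, z_threshold=3.0, min_weight=0.0):
        self.decay = decay              # EMA decay factor (assumed value)
        self.z_threshold = z_threshold  # z-score cutoff (assumed criterion)
        self.min_weight = min_weight    # weight assigned to flagged samples
        self.mean = None                # EMA of the loss mean
        self.var = 0.0                  # EMA of the loss variance

    def weights(self, losses):
        """Return one weight per loss, using statistics from earlier batches."""
        if self.mean is None:
            # No history yet: trust every sample in the first batch.
            return [1.0] * len(losses)
        std = math.sqrt(self.var)
        out = []
        for loss in losses:
            z = (loss - self.mean) / (std + 1e-12)
            out.append(self.min_weight if z > self.z_threshold else 1.0)
        return out

    def update(self, losses, weights):
        """Fold the weighted batch statistics into the running EMA."""
        total = sum(weights)
        if total == 0:
            return  # every sample was flagged; keep the old statistics
        batch_mean = sum(w * l for w, l in zip(weights, losses)) / total
        batch_var = sum(
            w * (l - batch_mean) ** 2 for w, l in zip(weights, losses)
        ) / total
        if self.mean is None:
            self.mean, self.var = batch_mean, batch_var
        else:
            d = self.decay
            self.mean = d * self.mean + (1 - d) * batch_mean
            self.var = d * self.var + (1 - d) * batch_var


def weighted_loss(losses, weights):
    """Weighted mean loss; flagged samples contribute little or nothing."""
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total if total else 0.0


# Toy usage: a few clean batches, then one batch with a noisy sample.
weighter = EmaOutlierWeighter()
for _ in range(5):
    clean = [1.0, 1.1, 0.9, 1.05]
    w = weighter.weights(clean)
    weighter.update(clean, w)

noisy = [1.0, 50.0, 0.95]
noisy_weights = weighter.weights(noisy)
noisy_loss = weighted_loss(noisy, noisy_weights)
```

In this toy run the sample with loss 50.0 is flagged and excluded from the batch loss, while the two ordinary samples keep full weight. Because flagged samples are also left out of the statistics update, a single bad batch does not inflate the running variance. That is a design choice of this sketch rather than something the abstract specifies.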
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Chemical Physics Studies
