New Metric Formulas that Include Measurement Errors in Machine Learning for Natural Sciences
Umberto Michelucci, Francesca Venturini

TL;DR
This paper introduces new formulas for evaluating machine learning models that explicitly incorporate measurement errors in the data, giving more realistic performance estimates, especially in physics and the other natural sciences.
Contribution
It derives general, model-independent formulas for common metrics that account for measurement errors, improving the reliability of model evaluation in scientific data analysis.
Findings
Formulas provide more pessimistic, realistic metric estimates.
Applicable to both regression and classification problems.
Valid for any measurement error type and data model.
Abstract
The application of machine learning to physics problems is widely found in the scientific literature. Both regression and classification problems are addressed by a large array of techniques that involve learning algorithms. Unfortunately, the measurement errors of the data used to train machine learning models are almost always neglected. This leads to estimations of the performance of the models (and thus their generalisation power) that are too optimistic, since it is always assumed that the target variables (what one wants to predict) are correct. In physics, this is a dramatic deficiency as it can lead to the belief that theories or patterns exist where, in reality, they do not. This paper addresses this deficiency by deriving formulas for commonly used metrics (both for regression and classification problems) that take into account measurement errors of target variables. The new…
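To make the idea of a "more pessimistic" metric concrete, here is a minimal sketch of a worst-case mean absolute error (MAE) when each measured target carries a known absolute uncertainty. It is not the paper's formula: it is a simple triangle-inequality bound, and the names `mae`, `mae_upper_bound`, and `delta` are hypothetical, chosen only for illustration. The assumption is that each true target lies within `delta[i]` of its measured value.

```python
def mae(y_pred, y_target):
    """Standard mean absolute error between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(y_pred, y_target)) / len(y_target)


def mae_upper_bound(y_pred, y_meas, delta):
    """Worst-case MAE against the unknown true targets.

    Assumption (illustrative, not from the paper): the true target t_i
    satisfies |t_i - y_meas[i]| <= delta[i]. Then, by the triangle inequality,
    |p_i - t_i| <= |p_i - y_meas[i]| + delta[i],
    so the true MAE is at most the measured MAE plus the mean of delta.
    """
    return mae(y_pred, y_meas) + sum(delta) / len(delta)


# Toy data, invented purely for this example.
y_pred = [1.0, 2.0, 3.0]
y_meas = [1.5, 2.0, 2.0]
delta = [0.5, 0.5, 0.5]  # absolute measurement uncertainty per target

naive = mae(y_pred, y_meas)                    # treats measured targets as exact
bound = mae_upper_bound(y_pred, y_meas, delta)  # allows for measurement error
print(naive, bound)
```

The naive value treats the measured targets as exact, while the bound shows how much worse the error could be against the true values. A bound of this kind is deliberately conservative; metrics derived from a statistical model of the measurement error would generally be tighter.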
Taxonomy
Topics: Computational Physics and Python Applications · Neural Networks and Applications
