On the good reliability of an interval-based metric to validate prediction uncertainty for machine learning regression tasks
Pascal Pernot

TL;DR
This paper proposes shifting from variance-based calibration metrics to an interval-based metric, the Prediction Interval Coverage Probability (PICP), to validate prediction uncertainty in machine learning regression. PICP is less sensitive to heavy-tailed error and uncertainty distributions and can be tested more quickly and reliably.
Contribution
It proposes PICP as a more robust alternative to variance-based calibration metrics (ZMS, NLL, RCE) for validating the average calibration of prediction uncertainty.
Findings
Sets of z-scores are well represented by Student's-t distributions
The simple 2σ rule gives accurate 95% prediction intervals for ν > 3
PICP testing can be applied to more datasets than ZMS testing
Abstract
This short study presents an opportunistic approach to a (more) reliable validation method for prediction uncertainty average calibration. Considering that variance-based calibration metrics (ZMS, NLL, RCE...) are quite sensitive to the presence of heavy tails in the uncertainty and error distributions, a shift is proposed to an interval-based metric, the Prediction Interval Coverage Probability (PICP). It is shown on a large ensemble of molecular properties datasets that (1) sets of z-scores are well represented by Student's-t(ν) distributions, ν being the number of degrees of freedom; (2) accurate estimation of 95% prediction intervals can be obtained by the simple 2σ rule for ν > 3; and (3) the resulting PICPs are more quickly and reliably tested than variance-based calibration metrics. Overall, this method makes it possible to test 20% more datasets than ZMS testing.…
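To make the two kinds of metric in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's code: the function names (`z_scores`, `picp`, `zms`, `unit_variance_student_t`) are hypothetical, PICP is taken as the fraction of errors within ±k times the predicted uncertainty (k = 2 for the 2σ rule), and ZMS is taken as the mean of squared z-scores. The demo draws synthetic heavy-tailed errors and is not related to the paper's datasets.

```python
import math
import random


def z_scores(errors, uncertainties):
    """Scale each error by its predicted standard uncertainty."""
    return [e / u for e, u in zip(errors, uncertainties)]


def picp(errors, uncertainties, k=2.0):
    """Fraction of errors falling inside the interval [-k*u, +k*u]."""
    z = z_scores(errors, uncertainties)
    return sum(abs(zi) <= k for zi in z) / len(z)


def zms(errors, uncertainties):
    """Mean of squared z-scores; close to 1 for average-calibrated uncertainties."""
    z = z_scores(errors, uncertainties)
    return sum(zi * zi for zi in z) / len(z)


def unit_variance_student_t(nu, rng):
    """Draw one Student's-t(nu) variate rescaled to unit variance (needs nu > 2)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu degrees of freedom
    t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)
    return t * math.sqrt((nu - 2.0) / nu)


if __name__ == "__main__":
    # Synthetic example only: heavy-tailed z-scores with unit variance.
    rng = random.Random(0)
    n, nu = 5000, 4.0
    u = [1.0] * n
    e = [unit_variance_student_t(nu, rng) for _ in range(n)]
    print(f"PICP(k=2) = {picp(e, u):.3f}  (target 0.95)")
    print(f"ZMS       = {zms(e, u):.3f}  (target 1)")
```

In this sketch a single large error changes PICP by at most 1/n, whereas it enters ZMS squared, which is one way to see why a variance-based statistic is more sensitive to heavy tails.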
Taxonomy
Topics: Neural Networks and Applications · Fault Detection and Control Systems
