Well-Calibrated Probabilistic Predictive Maintenance using Venn-Abers
Ulf Johansson, Tuwe Löfström, and Cecilia Sönströd

TL;DR
This paper explores the application of Venn-Abers predictors to improve probability calibration in fault detection models, especially for unbalanced datasets, enhancing decision support with valid probability intervals.
Contribution
It demonstrates how Venn-Abers calibration corrects both overconfident and underconfident decision trees, random forests, and XGBoost models, providing well-calibrated probability intervals for fault detection.
Findings
Venn-Abers improves calibration of fault detection models.
Calibrated models output valid probability intervals whose width indicates how confident each prediction is.
Calibration benefits are demonstrated across multiple machine learning models.
Abstract
When using machine learning for fault detection, a common problem is the fact that most data sets are very unbalanced, with the minority class (a fault) being the interesting one. In this paper, we investigate the usage of Venn-Abers predictors, looking specifically at the effect on the minority class predictions. A key property of Venn-Abers predictors is that they output well-calibrated probability intervals. In the experiments, we apply Venn-Abers calibration to decision trees, random forests and XGBoost models, showing how both overconfident and underconfident models are corrected. In addition, the benefit of using the valid probability intervals produced by Venn-Abers for decision support is demonstrated. When using techniques producing opaque underlying models, e.g., random forest and XGBoost, each prediction will consist of not only the label, but also a valid probability…
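The interval-producing step described in the abstract can be illustrated with a minimal sketch of an inductive Venn-Abers predictor. This is not the authors' code: the function names `isotonic_fit`, `venn_abers_interval`, and `merge_interval` are hypothetical, the toy scores and labels are made up for illustration, and the sketch assumes the common formulation in which a held-out calibration set of (score, label) pairs is fitted twice with isotonic regression, once with the test score labelled 0 and once labelled 1. The merge into a single probability, p1 / (1 - p0 + p1), is one frequently used choice and is likewise an assumption here.

```python
from collections import defaultdict


def isotonic_fit(points):
    """Non-decreasing isotonic regression via pool-adjacent-violators.

    points: iterable of (score, label) pairs with label in {0, 1}.
    Returns a dict mapping each distinct score to its fitted value.
    """
    # Pool tied scores first so every distinct score gets one fitted value.
    agg = defaultdict(lambda: [0.0, 0])
    for score, label in points:
        agg[score][0] += label
        agg[score][1] += 1

    blocks = []  # each block: [label_sum, count, scores_in_block]
    for score in sorted(agg):
        blocks.append([agg[score][0], agg[score][1], [score]])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, scores = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2].extend(scores)

    return {s: total / count for total, count, scores in blocks for s in scores}


def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Return (p0, p1), the lower and upper probability of the positive class."""
    cal = list(zip(cal_scores, cal_labels))
    p0 = isotonic_fit(cal + [(test_score, 0)])[test_score]
    p1 = isotonic_fit(cal + [(test_score, 1)])[test_score]
    return p0, p1


def merge_interval(p0, p1):
    """Collapse the interval into one probability (assumed merging rule)."""
    return p1 / (1.0 - p0 + p1)


# Toy calibration set: scores from some underlying model, 1 = fault.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
cal_labels = [0, 0, 0, 1, 0, 1, 1, 1]

p0, p1 = venn_abers_interval(cal_scores, cal_labels, test_score=0.5)
print(p0, p1, merge_interval(p0, p1))
```

In a setting like the one the abstract describes, the scores would come from a decision tree, random forest, or XGBoost model evaluated on a calibration set kept separate from the training data. A wide interval signals that the calibration data say little about that score region, which is the kind of information the abstract suggests using for decision support.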
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
