Confidence-based Estimators for Predictive Performance in Model Monitoring
Juhani Kivimäki, Jakub Białek, Jukka K. Nurminen, Wojtek Kuberski

TL;DR
This paper investigates confidence-based estimators for monitoring model performance when ground truth labels are delayed or unavailable, revealing that the naive Average Confidence method is theoretically unbiased and often competitive.
Contribution
The paper provides a theoretical analysis of the Average Confidence estimator, demonstrating its unbiasedness and consistency, and empirically compares it with more complex estimators.
Findings
AC estimator is unbiased under certain conditions
AC often outperforms complex estimators in practice
Estimator performance is case-dependent
Abstract
After a machine learning model has been deployed into production, its predictive performance needs to be monitored. Ideally, such monitoring can be carried out by comparing the model's predictions against ground truth labels. For this to be possible, the ground truth labels must be available relatively soon after inference. However, there are many use cases where ground truth labels are available only after a significant delay, or in the worst case, not at all. In such cases, directly monitoring the model's predictive performance is impossible. Recently, novel methods for estimating the predictive performance of a model when ground truth is unavailable have been developed. Many of these methods leverage model confidence or other uncertainty estimates and are experimentally compared against a naive baseline method, namely Average Confidence (AC), which estimates model accuracy as the…
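The abstract is cut off before it finishes defining AC, so the following is only a minimal sketch under stated assumptions: it takes AC to be the mean of the model's top-class predicted probability over a batch of unlabeled predictions, and it assumes the "certain conditions" for unbiasedness amount to the confidences being calibrated. The function name `average_confidence` and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_confidence(probs) -> float:
    """Estimate accuracy without labels as the mean top-class probability.

    Assumption (not confirmed by the truncated abstract): `probs` is an
    (n_samples, n_classes) array of predicted class probabilities.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[0] == 0:
        raise ValueError("probs must be a non-empty 2-D array")
    return float(probs.max(axis=1).mean())


# Simulated binary classifier that is calibrated by construction:
# each prediction is correct with probability equal to its confidence.
rng = np.random.default_rng(0)
n_samples = 20_000
confidence = rng.uniform(0.5, 1.0, size=n_samples)  # top-class confidence
correct = rng.random(n_samples) < confidence         # hidden ground truth

probs = np.column_stack([confidence, 1.0 - confidence])
estimated_accuracy = average_confidence(probs)  # needs no labels
realized_accuracy = float(correct.mean())       # needs labels

print(f"AC estimate:       {estimated_accuracy:.4f}")
print(f"Realized accuracy: {realized_accuracy:.4f}")
```

In this simulation the two numbers should agree closely because calibration holds by design. With a miscalibrated model, the same estimate would be expected to drift away from the realized accuracy, which is consistent with the finding that estimator performance is case-dependent.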
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Sparse Evolutionary Training
