Confidence-based Estimators for Predictive Performance in Model Monitoring

Juhani Kivimäki; Jakub Białek; Jukka K. Nurminen; Wojtek Kuberski

arXiv:2407.08649 · cs.LG · February 13, 2025

Open Access

TL;DR

This paper investigates confidence-based estimators for monitoring model performance when ground truth labels are delayed or unavailable. It shows that the naive Average Confidence (AC) baseline is unbiased under certain conditions and is often competitive with more complex estimators.

Contribution

The paper provides a theoretical analysis of the Average Confidence estimator, demonstrating its unbiasedness and consistency, and empirically compares it with more complex estimators.
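As a rough illustration of why an unbiasedness result of this kind is plausible, the sketch below works through the expectation of AC under one standard assumption: that the model is perfectly calibrated, i.e. $\Pr(Y = \hat{y}(X) \mid c(X)) = c(X)$. The notation ($c$, $\hat{p}_k$, $\hat{y}$, $n$) is introduced here for illustration only, and this summary does not confirm that the paper states its conditions in exactly this form.

```latex
% Assumed notation (not taken from the paper):
%   \hat{p}_k(x) : predicted probability of class k for input x
%   \hat{y}(x)   : predicted class, \arg\max_k \hat{p}_k(x)
%   c(x)         : confidence of the prediction, \max_k \hat{p}_k(x)
\begin{align*}
\mathrm{AC} &= \frac{1}{n}\sum_{i=1}^{n} c(x_i), \\
\mathbb{E}[\mathrm{AC}] &= \mathbb{E}\big[c(X)\big]
  = \mathbb{E}\big[\Pr\big(Y = \hat{y}(X) \mid c(X)\big)\big]
  = \Pr\big(Y = \hat{y}(X)\big).
\end{align*}
```

The second equality uses the calibration assumption and the third is the law of total expectation, so under this assumption the expected value of AC equals the model's accuracy. If the monitored inputs are additionally i.i.d., the law of large numbers gives convergence of AC to that accuracy as $n$ grows.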

Findings

01 The AC estimator is unbiased under certain conditions

02 AC often outperforms complex estimators in practice

03 Estimator performance is case-dependent

Abstract

After a machine learning model has been deployed into production, its predictive performance needs to be monitored. Ideally, such monitoring can be carried out by comparing the model's predictions against ground truth labels. For this to be possible, the ground truth labels must be available relatively soon after inference. However, there are many use cases where ground truth labels are available only after a significant delay, or in the worst case, not at all. In such cases, directly monitoring the model's predictive performance is impossible. Recently, novel methods for estimating the predictive performance of a model when ground truth is unavailable have been developed. Many of these methods leverage model confidence or other uncertainty estimates and are experimentally compared against a naive baseline method, namely Average Confidence (AC), which estimates model accuracy as the…
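To make the baseline concrete, here is a minimal sketch of Average Confidence as it is commonly defined: the mean, over a batch of unlabeled predictions, of the highest predicted class probability. The abstract above is cut off before its own definition, so this formulation is an assumption, and the function name `average_confidence` and the sample batch are hypothetical.

```python
from typing import Sequence


def average_confidence(probabilities: Sequence[Sequence[float]]) -> float:
    """Estimate accuracy as the mean of the top predicted probability.

    Assumption: each row holds one prediction's class probabilities.
    This is the common definition of Average Confidence, not a
    definition quoted from the paper.
    """
    if not probabilities:
        raise ValueError("need at least one prediction")
    # The confidence of a prediction is the probability of its top class.
    confidences = [max(row) for row in probabilities]
    return sum(confidences) / len(confidences)


# Hypothetical batch of three binary predictions, no labels required.
batch = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
estimated_accuracy = average_confidence(batch)
print(round(estimated_accuracy, 3))  # 0.767
```

No ground truth labels enter the computation, which is what makes the estimate usable while labels are delayed. Its quality depends entirely on how well the model's confidences are calibrated on the monitored data.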

Taxonomy

Topics: Fault Detection and Control Systems

Methods: Sparse Evolutionary Training