Monitoring the calibration of probability forecasts with an application to concept drift detection involving image classification

Christopher T. Franck; Anne R. Driscoll; Zoe Szajnfarber; William H. Woodall

arXiv:2510.25573 · stat.ML · October 30, 2025

TL;DR

This paper introduces a cumulative sum-based method with dynamic limits for monitoring the calibration of probability forecasts in image classification, enabling early detection of model miscalibration and concept drift without needing access to the model internals.
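The sketch below shows the kind of cumulative sum statistic described above. It assumes the standard risk-adjusted Bernoulli CUSUM form, which may differ from the paper's exact formulation. Each forecast probability p_t is treated as the in-control probability of the event. For example, p_t could be the classifier's stated probability that its prediction is correct. The chart then accumulates evidence that the true odds have been multiplied by a hypothetical factor `odds_ratio`. Only forecasts and observed outcomes enter the calculation, which is why this style of monitoring needs no access to model internals. The function name `ra_bernoulli_cusum` and the `odds_ratio` and `threshold` values are illustrative assumptions. The fixed threshold is a simplified stand-in for the dynamic limits the paper proposes.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def ra_bernoulli_cusum(
    probs: Sequence[float],
    outcomes: Sequence[int],
    odds_ratio: float = 0.5,
    threshold: float = 4.0,
) -> Tuple[List[float], Optional[int]]:
    """Risk-adjusted Bernoulli CUSUM on forecast probabilities (illustrative sketch).

    probs[t]    -- forecast probability that event t occurs (e.g. prediction is correct)
    outcomes[t] -- 1 if the event occurred, else 0
    odds_ratio  -- hypothetical out-of-control odds multiplier; < 1 targets forecasts
                   that are too high (overconfidence), > 1 forecasts that are too low
    threshold   -- fixed control limit h (a simplification; not dynamic limits)
    Returns the CUSUM path and the index of the first signal (None if no signal).
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    path: List[float] = []
    s = 0.0
    first_signal: Optional[int] = None
    for t, (p, y) in enumerate(zip(probs, outcomes)):
        if not 0.0 <= p <= 1.0:
            raise ValueError("forecast probabilities must lie in [0, 1]")
        # Log-likelihood ratio of H1 (odds scaled by odds_ratio) vs H0 (forecast calibrated)
        w = y * math.log(odds_ratio) - math.log(1.0 - p + odds_ratio * p)
        s = max(0.0, s + w)  # reset at zero so evidence against H0 is not buried
        path.append(s)
        if first_signal is None and s > threshold:
            first_signal = t
    return path, first_signal


# Usage: a classifier that always reports 90% confidence is correct 90% of the
# time for 200 images, then only 60% of the time after a simulated drift.
random.seed(0)
probs = [0.9] * 400
outcomes = [int(random.random() < 0.9) for _ in range(200)] + [
    int(random.random() < 0.6) for _ in range(200)
]
path, alarm_at = ra_bernoulli_cusum(probs, outcomes)
print("first signal at observation:", alarm_at)
```

Setting `odds_ratio` below 1 aims the chart at the accuracy loss typical of concept drift. A two-sided scheme would run a second chart with a value above 1. The threshold here is arbitrary. In practice it would be tuned, for example by simulation, to a target in-control false-alarm rate.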

Contribution

The paper presents a novel, model-agnostic approach for real-time calibration monitoring that applies across a range of operational settings and concept drift scenarios.

Findings

01. Effective detection of miscalibration in image classifiers
02. Applicability to real-time operational monitoring
03. Operation without access to model internals

Abstract

Machine learning approaches for image classification have led to impressive advances in that field. For example, convolutional neural networks are able to achieve remarkable image classification accuracy across a wide range of applications in industry, defense, and other areas. While these machine learning models boast impressive accuracy, a related concern is how to assess and maintain calibration in the predictions these models make. A classification model is said to be well calibrated if its predicted probabilities correspond with the rates at which events actually occur. While there are many available methods to assess machine learning calibration and recalibrate faulty predictions, less effort has been spent on developing approaches that continually monitor predictive models for potential loss of calibration as time passes. We propose a cumulative sum-based approach with dynamic limits that…
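The calibration definition in the abstract can be checked offline with a binned reliability comparison. The idea is to group forecasts by predicted probability and compare each group's mean forecast with the observed frequency of the event. The sketch below also computes the expected calibration error (ECE), a common single-number summary. This is one of the standard static diagnostics, not the continual monitoring method the paper proposes. The function names and the default of ten equal-width bins are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def reliability_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean forecast, observed event rate, count) for each non-empty bin."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # equal-width bins; p == 1.0 joins the top bin
        sums[b] += p
        hits[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Count-weighted mean gap between forecast and observed rate across bins."""
    n = len(probs)
    return sum(
        count / n * abs(mean_p - rate)
        for mean_p, rate, count in reliability_table(probs, outcomes, n_bins)
    )


# Usage: a model that says 90% but is right only 6 times out of 10 is overconfident.
print(expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4))
```

A well-calibrated forecaster yields bins whose mean forecast and observed rate roughly agree. A single ECE value computed once, however, says nothing about when calibration was lost, which is the gap a sequential chart addresses.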
