Performance Estimation in Binary Classification Using Calibrated Confidence

Juhani Kivimäki, Jakub Białek, Wojtek Kuberski, Jukka K. Nurminen

arXiv:2505.05295 · cs.LG · March 10, 2026

TL;DR

This paper introduces CBPE, a novel method for estimating binary classification metrics such as accuracy, precision, recall, and F1 score without ground truth labels, using calibrated confidence scores and probabilistic modeling.

Contribution

CBPE is the first method that can estimate any binary classification metric defined using the confusion matrix from calibrated confidence scores alone, with theoretical guarantees and confidence intervals for the estimates.
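
The core idea can be illustrated with a minimal plug-in sketch. It assumes each score is a calibrated estimate of P(y = 1 | x) and that predictions come from thresholding the score at a hypothetical cutoff of 0.5. Under calibration, a positive prediction made with score p is a true positive with probability p, so summing scores over predicted positives gives the expected TP count, and likewise for FP, FN, and TN. The function names are illustrative, not the paper's code, and plugging expected counts into ratio metrics is a simplification that need not match the paper's estimator.

```python
from dataclasses import dataclass


@dataclass
class ExpectedConfusion:
    """Expected confusion-matrix counts (floats, not integers)."""
    tp: float
    fp: float
    fn: float
    tn: float


def expected_confusion_matrix(scores, threshold=0.5):
    """Sum calibrated scores into expected TP/FP/FN/TN counts.

    Assumes each score is a calibrated P(y = 1 | x); `threshold` is a
    hypothetical decision cutoff chosen for illustration.
    """
    tp = fp = fn = tn = 0.0
    for p in scores:
        if p >= threshold:   # predicted positive
            tp += p          # probability the positive call is correct
            fp += 1.0 - p    # probability the positive call is wrong
        else:                # predicted negative
            fn += p          # probability a true positive was missed
            tn += 1.0 - p    # probability the negative call is correct
    return ExpectedConfusion(tp, fp, fn, tn)


def plug_in_metrics(cm):
    """Plug expected counts into the usual confusion-matrix formulas."""
    n = cm.tp + cm.fp + cm.fn + cm.tn

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    return {
        "accuracy": ratio(cm.tp + cm.tn, n),
        "precision": ratio(cm.tp, cm.tp + cm.fp),
        "recall": ratio(cm.tp, cm.tp + cm.fn),
        "f1": ratio(2 * cm.tp, 2 * cm.tp + cm.fp + cm.fn),
    }


cm = expected_confusion_matrix([0.9, 0.8, 0.3, 0.1])
metrics = plug_in_metrics(cm)
print(cm)
print(metrics)
```

For accuracy, the plug-in value equals the expected accuracy because accuracy is linear in the counts. With fixed predictions, the precision denominator (the number of predicted positives) is also fixed, so the same holds for precision. Recall and F1 have denominators that depend on the unknown labels, so here they are only ratios of expectations, which is one reason a full method has to treat the counts as random variables.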

Findings

01

CBPE accurately estimates metrics without ground truth labels.

02

The method provides valid confidence intervals for the estimates.

03

CBPE outperforms existing label-free performance estimation approaches.
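
Confidence intervals for such estimates can be approximated under the same calibration assumption, plus an added assumption that labels are independent given the scores, by simulating many plausible label sets and recomputing the metric on each. This Monte Carlo sketch is a swapped-in technique for illustration only. The paper may instead derive the metric's distribution analytically (under independence, for example, the TP count among predicted positives follows a Poisson binomial distribution). `sample_metric`, `percentile_interval`, and the example scores are hypothetical.

```python
import random


def sample_metric(scores, metric, threshold=0.5, n_samples=2000, seed=0):
    """Monte Carlo draws of a metric under the calibration assumption.

    Predictions are fixed by thresholding; each unknown label is simulated
    independently as y ~ Bernoulli(p), where p is the calibrated score.
    """
    rng = random.Random(seed)
    preds = [p >= threshold for p in scores]
    draws = []
    for _ in range(n_samples):
        tp = fp = fn = tn = 0
        for p, pred in zip(scores, preds):
            y = rng.random() < p  # simulated ground-truth label
            if pred and y:
                tp += 1
            elif pred:
                fp += 1
            elif y:
                fn += 1
            else:
                tn += 1
        value = metric(tp, fp, fn, tn)
        if value is not None:  # skip draws where the metric is undefined
            draws.append(value)
    return draws


def percentile_interval(draws, alpha=0.05):
    """Simple empirical (1 - alpha) interval read off the sorted draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    return ordered[int(alpha / 2 * last)], ordered[int((1 - alpha / 2) * last)]


def precision(tp, fp, fn, tn):
    return tp / (tp + fp) if tp + fp else None


def recall(tp, fp, fn, tn):
    return tp / (tp + fn) if tp + fn else None


scores = [0.95, 0.85, 0.7, 0.6, 0.4, 0.2, 0.1]
prec_draws = sample_metric(scores, precision)
low, high = percentile_interval(prec_draws)
print(f"precision 95% interval: [{low:.3f}, {high:.3f}]")

rec_draws = sample_metric(scores, recall)
print("recall 95% interval:", percentile_interval(rec_draws))
```

Because the predictions are fixed, the mean of the precision draws should approach the plug-in precision (here the mean score of the predicted positives, 0.775). Recall's denominator varies with the simulated labels, so its spread reflects uncertainty in both the numerator and the denominator.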

Abstract

Model monitoring is a critical component of the machine learning lifecycle, safeguarding against undetected drops in the model's performance after deployment. Traditionally, performance monitoring has required access to ground truth labels, which are not always readily available. This can result in unacceptable latency or render performance monitoring altogether impossible. Recently, methods designed to estimate the accuracy of classifier models without access to labels have shown promising results. However, there are various other metrics that might be more suitable for assessing model performance in many cases. Until now, none of these important metrics has received similar interest from the scientific community. In this work, we address this gap by presenting CBPE, a novel method that can estimate any binary classification metric defined using the confusion matrix. In particular, we…