Practical estimation of the optimal classification error with soft labels and calibration

Ryota Ushio; Takashi Ishida; Masashi Sugiyama

arXiv:2505.20761·cs.LG·May 13, 2026

TL;DR

This paper introduces a practical, theoretically supported method for estimating the optimal (Bayes) error of binary classification from soft labels. It addresses the bias decay of the hard-label-based estimator, estimation from corrupted soft labels through calibration, and privacy concerns.

Contribution

It extends previous work on soft-label-based Bayes error estimation by analyzing the bias of the hard-label-based estimator, handling corrupted soft labels, and showing that isotonic calibration yields a consistent estimator under weaker assumptions.

Findings

01 The decay rate of the hard-label-based estimator's bias adapts to how well the two class-conditional distributions are separated.

02 Calibrated soft labels alone are insufficient for accurate estimation.

03 Isotonic calibration yields a consistent estimator under weaker assumptions.
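The third finding can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `pava` and `calibrated_bayes_error`, the cubic corruption, and the synthetic data are all assumptions made for illustration. The sketch sorts instances by their corrupted soft labels, fits a non-decreasing function to the hard labels with the pool-adjacent-violators algorithm, and plugs the fitted values into the average of min(p, 1 - p).

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def calibrated_bayes_error(corrupted_soft_labels, hard_labels):
    """Plug-in estimate after isotonic calibration (hypothetical helper)."""
    order = np.argsort(corrupted_soft_labels, kind="stable")
    fitted = pava(np.asarray(hard_labels, dtype=float)[order])
    return float(np.mean(np.minimum(fitted, 1.0 - fitted)))

# Synthetic example (assumed setup): clean soft labels eta, one hard label
# per instance, and an order-preserving corruption of eta.
rng = np.random.default_rng(0)
eta = rng.uniform(0.0, 1.0, size=5000)
hard = rng.binomial(1, eta)
corrupted = eta ** 3

print("clean soft labels:    ", np.mean(np.minimum(eta, 1.0 - eta)))
print("corrupted, uncorrected:", np.mean(np.minimum(corrupted, 1.0 - corrupted)))
print("isotonic-calibrated:  ", calibrated_bayes_error(corrupted, hard))
```

Only the ordering of the corrupted soft labels is used, so the sketch needs no input instances. It relies on the assumed corruption preserving that ordering.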

Abstract

While the performance of machine learning systems has experienced significant improvement in recent years, relatively little attention has been paid to the fundamental question: to what extent can we improve our models? This paper provides a means of answering this question in the setting of binary classification, which is practical and theoretically supported. We extend a previous work that utilizes soft labels for estimating the Bayes error, the optimal error rate, in two important ways. First, we theoretically investigate the properties of the bias of the hard-label-based estimator discussed in the original work. We reveal that the decay rate of the bias is adaptive to how well the two class-conditional distributions are separated, and it can decay significantly faster than the previous result suggested as the number of hard labels per instance grows. Second, we tackle a more…
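To make the first extension concrete, here is a minimal sketch of the two estimators the abstract contrasts. It is written under assumptions and is not taken from the paper's repository. The binary Bayes error equals the expectation of min(η(x), 1 - η(x)), where η(x) is the posterior probability of the positive class, so averaging min(p, 1 - p) over clean soft labels estimates it. The hard-label variant replaces each soft label with the fraction of positives among m hard labels for that instance. The function names and the uniform synthetic data are hypothetical.

```python
import numpy as np

def bayes_error_from_soft_labels(soft_labels):
    """Average of min(p, 1 - p) over soft labels p = P(y = 1 | x)."""
    p = np.asarray(soft_labels, dtype=float)
    return float(np.mean(np.minimum(p, 1.0 - p)))

def bayes_error_from_hard_labels(positive_counts, m):
    """Same plug-in estimate, with p approximated by the fraction of
    positive labels among m hard labels per instance."""
    p_hat = np.asarray(positive_counts, dtype=float) / m
    return bayes_error_from_soft_labels(p_hat)

# Synthetic illustration (assumed data): draw clean soft labels, then
# simulate m hard labels per instance for several values of m.
rng = np.random.default_rng(0)
eta = rng.uniform(0.0, 1.0, size=10000)

print("soft labels:", bayes_error_from_soft_labels(eta))
for m in (1, 5, 50):
    counts = rng.binomial(m, eta)
    print(f"m = {m:2d}:", bayes_error_from_hard_labels(counts, m))
```

Because min(p, 1 - p) is concave, the hard-label estimate is biased downward; with m = 1 every empirical soft label is 0 or 1 and the estimate is exactly zero. Increasing m shrinks this bias, which is the quantity whose decay rate the paper analyzes.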

Code & Models

Repositories

RyotaUshio/bayes-error-estimation (GitHub)

Videos

Practical estimation of the optimal classification error with soft labels and calibration (SlidesLive)