A Perfectly Truthful Calibration Measure

Jason Hartline; Lunjia Hu; Yifan Wu

arXiv:2508.13100·cs.LG·May 6, 2026

TL;DR

This paper introduces ATB, a simple and perfectly truthful calibration measure for probabilistic predictions. ATB is efficient to compute and yields a linear-time calibration testing algorithm.

Contribution

The authors design the first perfectly truthful calibration measure in the batch setting and provide a general recipe for constructing such measures.

Findings

01. ATB is quadratically related to the existing calibration measures smCal and distCal.

02. ATB enables the first linear-time calibration testing algorithm.

03. The paper introduces a general recipe for constructing truthful calibration measures.

Abstract

Calibration requires that predictions are conditionally unbiased and, therefore, reliably interpretable as probabilities. A calibration measure quantifies how far a predictor is from perfect calibration. As introduced by Haghtalab et al. (2024), a calibration measure is truthful if it is minimized in expectation when a predictor outputs the ground-truth probabilities. Predicting the true probabilities guarantees perfect calibration, but in reality, when calibration is evaluated on a random sample, all known calibration measures incentivize predictors to lie in order to appear more calibrated. Such lack of truthfulness motivated Haghtalab et al. (2024) and Qiao and Zhao (2025) to construct approximately truthful calibration measures in the sequential prediction setting, but no perfectly truthful calibration measure was known to exist even in the more basic batch setting. We design a…
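The claim that finite-sample calibration measures can reward lying can be checked by hand in a small case. The sketch below is an illustration only and is not taken from the paper: it assumes the standard binned expected calibration error (ECE) as the measure, a predictor that outputs the same value q on every sample, and outcomes drawn i.i.d. from Bernoulli(p). Under those assumptions the ECE has a single bin and equals |q - mean(y)|, so its expectation can be computed exactly. The function name `expected_constant_ece` is hypothetical.

```python
from math import comb


def expected_constant_ece(q: float, p: float, n: int) -> float:
    """Exact E|q - mean(y)| for y_1..y_n i.i.d. Bernoulli(p).

    Assumption: every prediction equals q, so binned ECE has one bin
    and reduces to |q - empirical frequency of ones|.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * abs(q - k / n)
        for k in range(n + 1)
    )


p_true = 0.7  # ground-truth probability (illustrative value)
n = 1         # evaluate on a single sample

truthful = expected_constant_ece(p_true, p_true, n)  # predict the truth
lying = expected_constant_ece(1.0, p_true, n)        # overclaim certainty

print(f"truthful: {truthful:.2f}, lying: {lying:.2f}")
```

With one sample and p = 0.7, the truthful prediction has expected error of about 0.42, while predicting 1.0 has expected error of about 0.30. The dishonest predictor therefore looks better calibrated in expectation, which is the kind of incentive a truthful measure is meant to remove.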
