Calibration without Ground Truth
Yuqing Kong, Mingyu Song, Yizhou Wang, Yifan Wu

TL;DR
This paper introduces a label-free post-processing method that improves a strong model's calibration and performance by leveraging a weaker but better-calibrated reference model. It requires no ground-truth labels and guarantees worst-case loss reduction.
Contribution
It presents a novel calibration framework that guarantees performance improvement without labels. The framework rests on a characterization of when a strong model and a reference model are mutually calibrated, together with an efficient Bregman projection algorithm.
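Why a Bregman projection can come with a worst-case guarantee follows from standard background on proper losses. The statement below is textbook material, not the paper's specific theorem. Write a proper loss through its convex generator φ (Savage representation), let q be the unknown true outcome distribution, and assume, for illustration, that q lies in some closed convex set C of plausible ground truths. Under the usual regularity conditions (φ differentiable and strictly convex):

```latex
\ell(p, y) = -\bigl[\varphi(p) + \langle \nabla\varphi(p),\, e_y - p \rangle\bigr]

\mathbb{E}_{Y\sim q}[\ell(p,Y)] - \mathbb{E}_{Y\sim q}[\ell(q,Y)]
  = \varphi(q) - \varphi(p) - \langle \nabla\varphi(p),\, q - p \rangle
  = D_\varphi(q, p)

p^\star = \arg\min_{c \in C} D_\varphi(c, p)
  \;\Longrightarrow\;
  D_\varphi(q, p^\star) \le D_\varphi(q, p) - D_\varphi(p^\star, p)
  \quad \text{for all } q \in C
```

Replacing p by its projection p* therefore lowers expected loss by at least D_φ(p*, p) for every candidate truth in C, and strictly so whenever p lies outside C. For log loss D_φ is the KL divergence; for the Brier score it is the squared Euclidean distance.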
Findings
Significantly reduces calibration errors and proper losses in large language models.
Achieves performance comparable to supervised methods without using labels.
Provides theoretical guarantees for worst-case loss reduction.
Abstract
Villalobos et al. [2024] predict that publicly available human text will be exhausted within the next decade. Thus, improving models without access to ground-truth labels becomes increasingly important. We propose a label-free post-processing framework that improves a strong but miscalibrated model using a weaker yet better-calibrated reference. Our framework guarantees a strict performance improvement under any proper loss. Our approach is based on a characterization of when strict improvement is possible: when the strong and reference models are not mutually calibrated. We formalize this condition, connect it to arbitrage and no-trade results from economics, and develop an efficient Bregman projection algorithm that guarantees worst-case loss reduction without labels. Experiments on representative LLMs across varying scales demonstrate that our label-free method significantly reduces…
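The label-free guarantee can be illustrated with a toy sketch. This is not the authors' algorithm, and it does not reproduce their mutual-calibration condition. It uses the Brier score, whose expected-loss gap is the squared Euclidean distance between forecast and truth, so a Euclidean projection onto any convex set C containing the true probabilities cannot increase expected loss for any truth in C. The set C = {q ∈ [0,1]^n : mean(q) = m}, with m standing in for information taken from a reference model, is a hypothetical assumption. So are the helper names `project_mean_box` and `expected_brier` and the synthetic data.

```python
import numpy as np


def project_mean_box(p, m, tol=1e-12, max_iter=200):
    """Euclidean projection of p onto the hypothetical set
    C = {y in [0,1]^n : mean(y) = m}.

    KKT conditions give y_i = clip(p_i + lam, 0, 1) for one scalar shift lam.
    mean(clip(p + lam, 0, 1)) is nondecreasing in lam, so bisection finds it.
    """
    p = np.asarray(p, dtype=float)
    if not 0.0 <= m <= 1.0:
        raise ValueError("target mean must lie in [0, 1]")
    lo, hi = -1.0, 1.0  # shifts of -1 / +1 push every entry to 0 / 1
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if np.clip(p + lam, 0.0, 1.0).mean() < m:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    return np.clip(p + 0.5 * (lo + hi), 0.0, 1.0)


def expected_brier(pred, q):
    """Expected Brier loss sum_i E_{Y_i ~ Bernoulli(q_i)}[(pred_i - Y_i)^2]."""
    return float(np.sum(q * (1.0 - pred) ** 2 + (1.0 - q) * pred ** 2))


rng = np.random.default_rng(0)
n, m = 50, 0.3                            # m: hypothetical reference-derived constraint
strong = rng.uniform(0.0, 1.0, size=n)    # synthetic "strong model" probabilities
proj = project_mean_box(strong, m)        # label-free post-processed forecasts

# Any q in C is a candidate ground truth; record how much projection helps on each.
gains = []
for _ in range(1000):
    q = project_mean_box(rng.uniform(0.0, 1.0, size=n), m)  # a random point of C
    gains.append(expected_brier(strong, q) - expected_brier(proj, q))

bound = float(np.sum((proj - strong) ** 2))  # Pythagorean lower bound on each gain
print(f"smallest sampled loss reduction: {min(gains):.4f}")
print(f"lower bound ||proj - strong||^2: {bound:.4f}")
```

The bisection is chosen because the projection onto a box intersected with a hyperplane reduces to finding a single scalar shift. Swapping in log loss would replace the Euclidean projection with a KL (I-)projection, which needs a different solver.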
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Text and Document Classification Technologies
