Discovery of Hidden Miscalibration Regimes

Katarzyna Kobalczyk; Mihaela van der Schaar

arXiv:2605.13484·cs.LG·May 14, 2026

TL;DR

This paper introduces a method for finding and analyzing hidden, input-dependent calibration errors. It shows that models are often miscalibrated on specific kinds of inputs in ways that global calibration metrics miss.

Contribution

The paper proposes a diagnostic framework that learns a calibration-aware representation of the input space and uses it to discover and correct local miscalibration regimes, without relying on predefined data slices.

Findings

01 Input-dependent calibration heterogeneity is common across LLMs.

02 Discovered miscalibration fields enable effective local confidence correction.

03 The approach improves calibration in systematically miscalibrated regions.

Abstract

Calibration is commonly evaluated by comparing model confidence with its empirical correctness, implicitly treating reliability as a function of the confidence score alone. However, this view can hide substantial structure: models may be systematically overconfident on some kinds of inputs and underconfident on others, causing global reliability diagnostics to obscure localised calibration failures. To address this, we formulate the problem of discovering hidden miscalibration regimes without assuming access to predefined data slices. We define the corresponding miscalibration field and propose a diagnostic framework for estimating it. Our approach learns a calibration-aware representation of the input space and estimates signed local miscalibration by kernel smoothing in the learned geometry. Across four real-world LLM benchmarks and twelve LLMs, we find that input-dependent…
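
The kernel-smoothing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a representation `Z` is already available (the paper learns one; here it is a synthetic one-dimensional stand-in), uses a Gaussian kernel with a hand-picked `bandwidth`, and takes the signed residual to be confidence minus correctness. The function name `signed_local_miscalibration` and all data are hypothetical.

```python
import numpy as np


def signed_local_miscalibration(Z, conf, correct, bandwidth=0.5):
    """Nadaraya-Watson estimate of E[conf - correct | z] at every row of Z.

    Positive values suggest local overconfidence, negative values local
    underconfidence. The Gaussian kernel and bandwidth are assumptions.
    """
    Z = np.asarray(Z, dtype=float)
    resid = np.asarray(conf, dtype=float) - np.asarray(correct, dtype=float)
    # Pairwise squared Euclidean distances in the representation space.
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    # Kernel-weighted average of the signed residuals around each point.
    return (weights @ resid) / weights.sum(axis=1)


# Synthetic illustration: two regimes with the same stated confidence.
rng = np.random.default_rng(0)
n = 400
Z = np.concatenate([
    rng.normal(-2.0, 0.3, size=(n // 2, 1)),  # regime A
    rng.normal(2.0, 0.3, size=(n // 2, 1)),   # regime B
])
conf = np.full(n, 0.75)                        # model always reports 0.75
true_acc = np.where(Z[:, 0] < 0, 0.55, 0.95)   # A is overconfident, B underconfident
correct = (rng.random(n) < true_acc).astype(float)

field = signed_local_miscalibration(Z, conf, correct)
global_gap = float(np.mean(conf - correct))    # what a global diagnostic sees
left_gap = float(field[: n // 2].mean())       # average field value in regime A
right_gap = float(field[n // 2:].mean())       # average field value in regime B
print(global_gap, left_gap, right_gap)
```

In this toy setup the two regimes roughly cancel, so the global gap stays near zero while the smoothed field is clearly positive in one regime and clearly negative in the other, which is the kind of structure a confidence-only reliability diagnostic would hide.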
