Testable and Actionable Calibration for Full Swap Regret
Konstantina Bairaktari, Lunjia Hu, Huy L. Nguyen, Jonathan Ullman

TL;DR
This paper introduces SCDL, a new calibration measure for AI predictions that is both fully actionable and testable, addressing key limitations of existing measures.
Contribution
The paper proposes SCDL, a calibration measure that is simultaneously actionable and testable, with proven theoretical properties and empirical validation.
Findings
SCDL is fully actionable without weakening calibration requirements.
SCDL can be tested with nearly optimal estimation error.
Experiments show SCDL outperforms existing calibration measures in practice.
Abstract
AI-generated predictions increasingly inform decision-making in critical tasks, and therefore must be trustworthy. One widely used measure of trustworthiness is calibration, which requires that the predictions match the true frequencies and can be treated like real probabilities of a given outcome. However, defining calibration is subtle, and designing good measures of calibration error has been an active topic of recent research. The first goal is to find calibration measures that are actionable, meaning they can inform decision makers about their utility loss when predictions are treated as true probabilities, which is known as swap regret. The second goal is to find calibration measures that are testable, meaning that calibration error can be measured from a small sample of predictions and outcomes. Although these are very basic requirements, there is no existing calibration measure…
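To make the notion of swap regret in the abstract concrete, here is a minimal Python sketch for binary outcomes. It is not SCDL and not the paper's algorithm. It assumes one common formalization: a decision maker picks the action that maximizes expected utility as if each prediction were the true probability, and swap regret is the average utility they could have gained by instead playing the best fixed action in hindsight for each distinct prediction value. The names `best_response`, `empirical_swap_regret`, and `match_utility` are hypothetical, chosen only for this illustration, and the paper's exact definition may differ.

```python
from collections import defaultdict


def best_response(p, utility, actions):
    """Action with the highest expected utility if the outcome were Bernoulli(p)."""
    return max(actions, key=lambda a: p * utility(a, 1) + (1 - p) * utility(a, 0))


def empirical_swap_regret(preds, outcomes, utility, actions):
    """Average utility lost by trusting the predictions (assumed formalization).

    For each distinct prediction value, compare the utility of the action
    chosen by trusting the prediction with the best fixed action in hindsight
    on the outcomes that actually followed that prediction.
    """
    # Group observed outcomes by the prediction value that preceded them.
    groups = defaultdict(list)
    for p, y in zip(preds, outcomes):
        groups[p].append(y)

    total_gap = 0.0
    for p, ys in groups.items():
        trusted_action = best_response(p, utility, actions)
        achieved = sum(utility(trusted_action, y) for y in ys)
        best_in_hindsight = max(sum(utility(a, y) for y in ys) for a in actions)
        total_gap += best_in_hindsight - achieved
    return total_gap / len(preds)


def match_utility(action, outcome):
    """Hypothetical utility: 1 if the action matches the outcome, else 0."""
    return 1.0 if action == outcome else 0.0


# Miscalibrated: predicts 0.8 but the outcome is 1 only a quarter of the time.
miscalibrated = empirical_swap_regret([0.8] * 4, [0, 0, 0, 1], match_utility, [0, 1])

# Calibrated on this sample: predicts 0.75 and the outcome is 1 three times in four.
calibrated = empirical_swap_regret([0.75] * 4, [1, 1, 1, 0], match_utility, [0, 1])
```

In the first case, trusting the prediction means betting on outcome 1 and being right once in four rounds, while always betting on 0 would have been right three times, so the sketch reports a regret of 0.5. In the second case the trusted action is already the best one in hindsight and the regret is 0. Grouping by exact prediction value only makes sense when predictions take few distinct values; real-valued predictions would need some form of bucketing, which this sketch does not attempt.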
