When Models Know When They Do Not Know: Calibration, Cascading, and Cleaning
Chenjie Hao, Weyl Lu, Yuko Ishiwaka, Zengyi Li, Weier Wan, Yubei Chen

TL;DR
This paper introduces a universal, training-free calibration method for models to recognize their ignorance, enabling cascading and data cleaning techniques that improve efficiency and reliability across vision and language tasks.
Contribution
It proposes a simple calibration approach applicable to both vision and language models, facilitating model cascading and data cleaning without additional training.
Findings
Within a single model, higher confidence corresponds to higher accuracy.
Models calibrated on the validation set remain calibrated on a held-out test set.
Cascading models improves efficiency and performance.
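The first two findings can be checked on any set of predictions by binning them by confidence and comparing each bin's mean confidence with its accuracy. The sketch below is a generic reliability check with an expected calibration error (ECE) summary, not the paper's own procedure; the function names, the equal-width binning, and the bin count are assumptions made for illustration.

```python
def reliability_bins(confidences, correct, n_bins=5):
    """Group predictions into equal-width confidence bins.

    Returns one (bin_index, count, mean_confidence, accuracy) row per
    non-empty bin. `confidences` are floats in [0, 1]; `correct` are bools.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        rows.append((i, len(items), mean_conf, accuracy))
    return rows


def expected_calibration_error(confidences, correct, n_bins=5):
    """Count-weighted average gap between mean confidence and accuracy."""
    total = len(confidences)
    rows = reliability_bins(confidences, correct, n_bins)
    return sum(n / total * abs(mean_conf - acc) for _, n, mean_conf, acc in rows)
```

Running the same check on a validation split and on a held-out test split, and comparing the per-bin rows, is one way to see whether calibration carries over between the two.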
Abstract
When a model knows when it does not know, many possibilities emerge. The first question is how to enable a model to recognize that it does not know. A promising approach is to use confidence, computed from the model's internal signals, to reflect its ignorance. Prior work in specific domains has shown that calibration can provide reliable confidence estimates. In this work, we propose a simple, effective, and universal training-free method that applies to both vision and language models, performing model calibration, cascading, and data cleaning to better exploit a model's ability to recognize when it does not know. We first highlight two key empirical observations: higher confidence corresponds to higher accuracy within a single model, and models calibrated on the validation set remain calibrated on a held-out test set. These findings empirically establish the reliability and…
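The abstract does not spell out the cascading procedure, so the following is only a minimal sketch of the general idea of confidence-based cascading: a cheaper model answers when its confidence clears a threshold, and otherwise the input is passed to a larger model. The names `pick_threshold` and `cascade_predict`, the rule for choosing the threshold, and the `(label, confidence)` model interface are all hypothetical and are not taken from the paper.

```python
def pick_threshold(val_confidences, val_correct, target_accuracy):
    """Return the smallest threshold whose accepted validation predictions
    reach `target_accuracy`, or None if no threshold does.

    Assumed selection rule for illustration only.
    """
    for t in sorted(set(val_confidences)):
        accepted = [ok for c, ok in zip(val_confidences, val_correct) if c >= t]
        if accepted and sum(accepted) / len(accepted) >= target_accuracy:
            return t
    return None


def cascade_predict(x, small_model, large_model, threshold):
    """Use the small model when it is confident enough, else defer.

    Each model is a callable returning (label, confidence).
    Returns (label, confidence, name_of_model_used).
    """
    label, conf = small_model(x)
    if conf >= threshold:
        return label, conf, "small"
    label, conf = large_model(x)
    return label, conf, "large"
```

With a well-calibrated small model, a higher threshold sends more inputs to the large model, trading compute for accuracy; a lower one does the opposite.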
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
