Calibration Meets Explanation: A Simple and Effective Approach for Model Confidence Estimates
Dongfang Li, Baotian Hu, Qingcai Chen

TL;DR
This paper introduces CME, a method that uses model explanations to improve confidence calibration in language models, demonstrating consistent performance gains across multiple datasets and settings.
Contribution
The paper presents a novel explanation-based calibration method, CME, that enhances confidence estimates by leveraging feature attributions, a concept not previously explored for this purpose.
Findings
CME improves calibration across six datasets and two pre-trained language models, in both in-domain and out-of-domain settings.
Combining CME with temperature scaling further reduces calibration errors.
Model explanations can effectively aid in confidence calibration.
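For readers unfamiliar with the terms in the findings above, the following is a minimal sketch of how calibration error is commonly measured (expected calibration error, ECE) and how temperature scaling rescales logits. It does not implement CME. The synthetic data, the bin count, the grid of candidate temperatures, and every function name are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature scaling: divide logits by T before the softmax.
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    # ECE: bin predictions by confidence, then take the bin-weighted
    # average of |accuracy - mean confidence| over the bins.
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    # Pick the temperature that minimizes negative log-likelihood
    # on held-out data (simple grid search, an illustrative choice).
    def nll(t):
        p = softmax(logits, t)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return float(min(grid, key=nll))

# Synthetic, deliberately overconfident binary classifier (hypothetical data).
rng = np.random.default_rng(0)
n = 4000
labels = rng.integers(0, 2, size=n)
base = rng.normal(size=(n, 2))
base[np.arange(n), labels] += 1.0   # signal toward the true class
logits = 4.0 * base                 # inflate logits to make them overconfident

val, test = slice(0, n // 2), slice(n // 2, n)
T = fit_temperature(logits[val], labels[val])
ece_before = expected_calibration_error(softmax(logits[test]), labels[test])
ece_after = expected_calibration_error(softmax(logits[test], T), labels[test])
print(f"T={T:.1f}  ECE before={ece_before:.3f}  ECE after={ece_after:.3f}")
```

Temperature scaling changes only the confidence values, not the predicted class, which is why it can be applied on top of another calibration method such as CME, as the second finding describes.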
Abstract
Calibration strengthens the trustworthiness of black-box models by producing more accurate confidence estimates on given examples. However, little is known about whether model explanations can help confidence calibration. Intuitively, humans look at important feature attributions and decide whether the model is trustworthy. Similarly, the explanations can tell us when the model may or may not know. Inspired by this, we propose a method named CME that leverages model explanations to make the model less confident with non-inductive attributions. The idea is that when the model is not highly confident, it is difficult to identify strong indications of any class, and the tokens accordingly do not have high attribution scores for any class, and vice versa. We conduct extensive experiments on six datasets with two popular pre-trained language models in the in-domain and out-of-domain settings.…
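The intuition in the abstract, that low confidence goes together with low token attribution scores, can be illustrated with a toy example. The sketch below is not CME and does not use a pre-trained language model. It assumes a hypothetical linear bag-of-words classifier, for which the weight-times-input attribution of each token is exact, and all weights, vocabulary entries, and names are invented for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical linear classifier: rows are classes, columns are tokens.
vocab = ["great", "terrible", "the", "movie"]
W = np.array([
    [ 2.0, -2.0, 0.0,  0.1],   # class 0 (positive)
    [-2.0,  2.0, 0.0, -0.1],   # class 1 (negative)
])

def explain(tokens):
    # Bag-of-words counts for the input.
    x = np.array([tokens.count(t) for t in vocab], dtype=float)
    # Weight-times-input attribution of each token toward each class.
    attributions = W * x
    # For a linear model, each logit is the sum of its attributions.
    logits = attributions.sum(axis=1)
    confidence = softmax(logits).max()
    strongest = np.abs(attributions).max()
    return confidence, strongest

conf_strong, attr_strong = explain(["the", "movie", "great"])
conf_weak, attr_weak = explain(["the", "movie"])
print(f"strong cue: confidence={conf_strong:.2f}, max |attribution|={attr_strong:.2f}")
print(f"weak cue:   confidence={conf_weak:.2f}, max |attribution|={attr_weak:.2f}")
```

In this toy setting the input without a strongly indicative token yields both a lower confidence and a lower maximum attribution. How CME turns that relationship into a calibration signal for language models is not specified in this summary.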
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Healthcare
