TL;DR
This paper explores the relationship between model calibration and interpretability in machine learning, demonstrating how calibration affects interpretation and proposing a simple method to enhance interpretability by calibrating models.
Contribution
It establishes a link between calibration and interpretability, showing how calibration influences interpretation outcomes and introducing a practical calibration approach to improve interpretability.
Findings
Interpretations are sensitive to model calibration.
Calibrating models can improve the reliability of interpretations.
Calibration impacts the confidence scores used in interpretability methods.
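The last finding can be made concrete with a small sketch. The summary does not say which calibration method the paper uses, so the example below assumes temperature scaling, a common post-hoc calibration technique, purely as an illustration. The logits, the temperature value, and the helper names `softmax` and `confidence_gradient` are all hypothetical. The sketch shows that rescaling logits leaves the predicted class unchanged but alters both the confidence score and its gradient, which is the quantity many gradient-based interpretation methods build on.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_gradient(logits, temperature=1.0):
    """Gradient of the top-class probability with respect to each logit.

    For p = softmax(z / T): d p_k / d z_j = p_k * (delta_kj - p_j) / T.
    """
    p = softmax(logits, temperature)
    k = max(range(len(p)), key=p.__getitem__)  # index of the predicted class
    return [p[k] * ((1.0 if j == k else 0.0) - p[j]) / temperature
            for j in range(len(p))]

# Hypothetical logits for a single sample (assumption, not taken from the paper).
logits = [4.0, 1.0, 0.5]

# T = 1.0 is the uncalibrated model; T = 2.5 is an assumed calibration temperature.
for T in (1.0, 2.5):
    probs = softmax(logits, T)
    grads = confidence_gradient(logits, T)
    print(f"T={T}: probs={[round(x, 3) for x in probs]} "
          f"grads={[round(g, 3) for g in grads]}")
```

In practice the temperature would be fitted on a held-out validation set rather than chosen by hand; it is fixed here only to keep the sketch short.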
Abstract
Trustworthy machine learning motivates a large body of work in the ML community aimed at improving the acceptance and adoption of ML. Its main aspects are fairness, uncertainty, robustness, explainability, and formal guarantees. Each of these domains attracts the interest of the ML community, as shown by the number of related publications. However, few works tackle the interconnections between these fields. In this paper we show a first link between uncertainty and explainability by studying the relation between calibration and interpretation. Because calibrating a model changes the way it scores samples, and interpretation approaches often rely on these scores, it seems safe to assume that the confidence calibration of a model interacts with our ability to interpret that model. In this paper, we show, in the context of networks trained on…
