Getting a CLUE: A Method for Explaining Uncertainty Estimates
Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández-Lobato

TL;DR
This paper introduces CLUE, a novel method for interpreting uncertainty estimates in probabilistic models like Bayesian Neural Networks, helping users understand which input changes affect model confidence.
Contribution
The paper proposes CLUE, a new approach for explaining uncertainty in differentiable probabilistic models, with a validation framework, ablation studies, and user evaluation.
Findings
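CLUE outperforms the baselines it is compared against in the paper's experiments.
It helps practitioners identify which input patterns are responsible for predictive uncertainty.
The method is validated through a novel framework for evaluating counterfactual explanations of uncertainty, a series of ablation experiments, and a user study.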
Abstract
Both uncertainty estimation and interpretability are important factors for trustworthy machine learning systems. However, there is little work at the intersection of these two areas. We address this gap by proposing a novel method for interpreting uncertainty estimates from differentiable probabilistic models, like Bayesian Neural Networks (BNNs). Our method, Counterfactual Latent Uncertainty Explanations (CLUE), indicates how to change an input, while keeping it on the data manifold, such that a BNN becomes more confident about the input's prediction. We validate CLUE through 1) a novel framework for evaluating counterfactual explanations of uncertainty, 2) a series of ablation experiments, and 3) a user study. Our experiments show that CLUE outperforms baselines and enables practitioners to better understand which input patterns are responsible for predictive uncertainty.
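The abstract describes the idea in words only, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the counterfactual is found by searching the latent space of a generative model (which keeps the decoded input on the data manifold) for a point that minimises predictive entropy plus an L1 penalty on the distance from the original input. The linear `decode` function, the three-member logistic `ENSEMBLE` standing in for a BNN, the finite-difference gradient, and all parameter values are hypothetical choices made for illustration.

```python
import math

# Hypothetical stand-in for a trained decoder: latent z (2-D) -> input x (2-D).
def decode(z):
    return [1.0 * z[0] + 0.5 * z[1], -0.5 * z[0] + 1.0 * z[1]]

# Hypothetical stand-in for a BNN: a tiny ensemble of logistic classifiers,
# each given as (weights, bias). Disagreement between members creates uncertainty.
ENSEMBLE = [([2.0, 0.5], 0.0), ([1.5, -0.5], 0.2), ([2.5, 1.0], -0.2)]

def predictive_entropy(x):
    """Entropy (in nats) of the ensemble-averaged class-1 probability."""
    probs = []
    for w, b in ENSEMBLE:
        logit = w[0] * x[0] + w[1] * x[1] + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    p = sum(probs) / len(probs)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def objective(z, x0, lam):
    """Assumed objective: uncertainty of the decoded input + L1 distance to x0."""
    x = decode(z)
    distance = sum(abs(a - b) for a, b in zip(x, x0))
    return predictive_entropy(x) + lam * distance

def numerical_grad(f, z, eps=1e-5):
    """Central finite differences; a real system would use autodiff instead."""
    grad = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        grad.append((f(z_plus) - f(z_minus)) / (2.0 * eps))
    return grad

def find_counterfactual(z0, x0, lam=0.02, lr=0.2, steps=500):
    """Gradient descent on the latent code, starting from the input's own code."""
    z = list(z0)
    for _ in range(steps):
        g = numerical_grad(lambda v: objective(v, x0, lam), z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z, decode(z)

# Usage: start from an input the ensemble is unsure about.
z_start = [0.1, 0.0]
x_start = decode(z_start)
z_cf, x_cf = find_counterfactual(z_start, x_start)
print("entropy before:", predictive_entropy(x_start))
print("entropy after: ", predictive_entropy(x_cf))
print("suggested change to input:", [c - s for c, s in zip(x_cf, x_start)])
```

The difference between the counterfactual and the original input is what would be shown to a user as the explanation. The weight `lam` trades off how far the counterfactual may move against how much uncertainty it removes.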
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Scientific Computing and Data Management · Adversarial Robustness in Machine Learning
Methods: Interpretability
