Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju

TL;DR
This paper introduces a Bayesian framework for local explanations that provides uncertainty estimates, improving the reliability, consistency, and efficiency of explanation methods like LIME and KernelSHAP in high-stakes settings.
Contribution
It develops Bayesian versions of LIME and KernelSHAP that output credible intervals, addressing instability and hyper-parameter tuning issues in existing methods.
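As a rough illustration of what "a local explanation with credible intervals" can look like, the sketch below fits a weighted Bayesian linear regression to perturbations around one input and reports a 95% credible interval per feature. It is not the authors' implementation: the function name `bayes_local_explanation`, the Gaussian perturbation scheme, the exponential proximity kernel, the conjugate prior with hyper-parameters `n0` and `sigma0_sq`, and the toy `black_box` model are all assumptions chosen to keep the example short.

```python
import numpy as np

def bayes_local_explanation(predict_fn, x, n_samples=500, kernel_width=0.75,
                            n0=1.0, sigma0_sq=1.0, n_draws=4000, seed=0):
    """Hypothetical helper: posterior mean and 95% credible interval
    for local linear feature importances around the point x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # Perturb the input with unit Gaussian noise (an assumed sampling scheme).
    Z = x + rng.normal(size=(n_samples, d))
    y = predict_fn(Z)

    # Proximity weights: perturbations closer to x count more (assumed kernel).
    dist_sq = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-dist_sq / (kernel_width ** 2 * d))

    # Design matrix: intercept plus features centered on x.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])

    # Conjugate posterior under the assumed prior phi | sigma^2 ~ N(0, sigma^2 I).
    V = np.linalg.inv(A.T @ (A * w[:, None]) + np.eye(d + 1))
    phi_hat = V @ A.T @ (w * y)
    resid = y - A @ phi_hat
    nu = n0 + n_samples
    s_sq = (n0 * sigma0_sq + resid @ (w * resid) + phi_hat @ phi_hat) / nu

    # Draw from the joint posterior: sigma^2 is scaled inverse chi-square,
    # then phi | sigma^2 is Gaussian with covariance sigma^2 * V.
    sigma_sq = nu * s_sq / rng.chisquare(nu, size=n_draws)
    L = np.linalg.cholesky(V)
    noise = rng.standard_normal((n_draws, d + 1)) @ L.T
    draws = phi_hat + np.sqrt(sigma_sq)[:, None] * noise

    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
    # Drop the intercept; return per-feature importance and interval bounds.
    return phi_hat[1:], lower[1:], upper[1:]

# Toy black box (assumed for illustration): a linear model with known weights.
def black_box(Z):
    return 2.0 * Z[:, 0] - 1.0 * Z[:, 1] + 0.0 * Z[:, 2]

x0 = np.array([0.5, -1.0, 2.0])
mean, lo, hi = bayes_local_explanation(black_box, x0, n_samples=2000)
for j in range(x0.shape[0]):
    print(f"feature {j}: {mean[j]:+.3f}  95% CI [{lo[j]:+.3f}, {hi[j]:+.3f}]")
```

In this sketch the interval width shrinks as `n_samples` grows, which is the kind of signal a user could inspect to judge whether an explanation has been computed from enough perturbations.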
Findings
Explanations include credible intervals for feature importance.
The framework improves stability and consistency of explanations.
A theoretical analysis of the framework is used to make perturbation sampling more efficient.
Abstract
As black box explanations are increasingly being employed to establish model credibility in high-stakes settings, it is important to ensure that these explanations are accurate and reliable. However, prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability. In addition, these methods are also computationally inefficient, and require significant hyper-parameter tuning. In this paper, we address the aforementioned challenges by developing a novel Bayesian framework for generating local explanations along with their associated uncertainty. We instantiate this framework to obtain Bayesian versions of LIME and KernelSHAP which output credible intervals for the feature importances, capturing the associated uncertainty. The resulting explanations not only enable us to…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
Methods: Local Interpretable Model-Agnostic Explanations
