Evaluating the Ability of Explanations to Disambiguate Models in a Rashomon Set
Kaivalya Rawal, Eoin Delaney, Zihao Fu, Sandra Wachter, Chris Russell

TL;DR
This paper applies AXE, a method for evaluating feature-importance explanations of AI models that the authors proposed in earlier work. It shows that AXE detects false explanations and helps distinguish the behavior of similarly accurate models within a Rashomon set, supporting the choice of a model for deployment.
Contribution
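The paper builds on AXE, an explanation evaluation method grounded in three principles introduced in "Evaluating Model Explanations without Ground Truth". It shows that AXE can detect adversarial explanations and avoids the limitations of evaluation strategies that compare explanations against ideal ground-truth explanations.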
Findings
AXE detects false explanations with a 100% success rate.
Evaluation metrics based on ground truth can obscure behavioral differences between models in a Rashomon set.
AXE can identify when protected attributes influence model predictions.
Abstract
Explainable artificial intelligence (XAI) is concerned with producing explanations indicating the inner workings of models. For a Rashomon set of similarly performing models, explanations provide a way of disambiguating the behavior of individual models, helping select models for deployment. However, explanations themselves can vary depending on the explainer used, and need to be evaluated. In the paper "Evaluating Model Explanations without Ground Truth", we proposed three principles of explanation evaluation and a new method "AXE" to evaluate the quality of feature-importance explanations. We go on to illustrate how evaluation metrics that rely on comparing model explanations against ideal ground truth explanations obscure behavioral differences within a Rashomon set. Explanation evaluation aligned with our proposed principles would highlight these differences instead, helping select…
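To make the idea of a Rashomon set concrete, here is a minimal Python sketch. It assumes the common definition of a Rashomon set as all candidate models whose loss lies within a tolerance epsilon of the best observed loss. The functions `rashomon_set` and `top_k_features`, the model and feature names, and all numbers are hypothetical toy values chosen for illustration. This is not the AXE method and does not score explanation quality; it only shows how two models with nearly identical loss can still receive different feature-importance explanations.

```python
from typing import Dict, List


def rashomon_set(losses: Dict[str, float], epsilon: float) -> List[str]:
    """Return the names of models whose loss is within epsilon of the best loss."""
    best = min(losses.values())
    return sorted(name for name, loss in losses.items() if loss <= best + epsilon)


def top_k_features(attributions: Dict[str, float], k: int) -> List[str]:
    """Rank features by absolute attribution and keep the k largest."""
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    return ranked[:k]


# Hypothetical validation losses for three candidate models (toy numbers).
losses = {"model_a": 0.210, "model_b": 0.214, "model_c": 0.300}

# Hypothetical mean feature-importance scores from some explainer (toy numbers).
explanations = {
    "model_a": {"income": 0.50, "age": 0.30, "zip_code": 0.05},
    "model_b": {"income": 0.10, "age": 0.20, "zip_code": 0.60},
    "model_c": {"income": 0.40, "age": 0.40, "zip_code": 0.10},
}

# Only models within epsilon of the best loss belong to the Rashomon set.
members = rashomon_set(losses, epsilon=0.01)
for name in members:
    print(name, top_k_features(explanations[name], k=2))
```

With these toy values, `model_a` and `model_b` both fall inside the set, yet their explanations rank different features first. That kind of behavioral difference is what explanations are meant to surface when choosing among similarly performing models.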
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
