TL;DR
This paper introduces AXE, a novel framework for evaluating local feature importance explanations without needing ground-truth explanations, addressing limitations of existing methods and enabling fair comparison of explanations.
Contribution
The paper proposes AXE, a ground-truth agnostic framework for evaluating explanations that relies on neither model sensitivity nor ideal ground-truth explanations.
Findings
AXE provides an independent measure of explanation quality.
AXE can detect explanation fairwashing.
AXE correlates well with baseline methods.
Abstract
There can be many competing and contradictory explanations for a single model prediction, making it difficult to select which one to use. Current explanation evaluation frameworks measure quality by comparing against ideal "ground-truth" explanations, or by verifying model sensitivity to important inputs. We outline the limitations of these approaches, and propose three desirable principles to ground the future development of explanation evaluation strategies for local feature importance explanations. We propose a ground-truth Agnostic eXplanation Evaluation framework (AXE) for evaluating and comparing model explanations that satisfies these principles. Unlike prior approaches, AXE does not require access to ideal ground-truth explanations for comparison, or rely on model sensitivity, and so provides an independent measure of explanation quality. We verify AXE by comparing with baselines,…
