Do Explanations Explain? Model Knows Best
Ashkan Khakzar, Pedram Khorsandi, Rozhin Nobahari, Nassir Navab

TL;DR
This paper introduces an empirical framework that uses the neural network under study to evaluate the reliability of explanation (feature attribution) methods, testing in controlled experiments whether each method conforms to specific axioms.
Contribution
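The paper proposes a model-based framework for evaluating explanation methods: the network itself generates input features that impose a particular behavior on its output, which enables systematic, axiomatic assessment of whether an explanation can be trusted.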
Findings
Different explanation methods often highlight different features.
The framework can reveal properties and limitations of explanation methods.
It provides a toolset for evaluating current and future explanation techniques.
Abstract
It is a mystery which input features contribute to a neural network's output. Various explanation (feature attribution) methods are proposed in the literature to shed light on the problem. One peculiar observation is that these explanations (attributions) point to different features as being important. The phenomenon raises the question, which explanation to trust? We propose a framework for evaluating the explanations using the neural network model itself. The framework leverages the network to generate input features that impose a particular behavior on the output. Using the generated features, we devise controlled experimental setups to evaluate whether an explanation method conforms to an axiom. Thus we propose an empirical framework for axiomatic evaluation of explanation methods. We evaluate well-known and promising explanation solutions using the proposed framework. The framework…
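The idea in the abstract, generating input features that impose a known behavior on the output and then checking an attribution method against them, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical toy model (a single tanh unit with hand-picked weights `W`), assumes the imposed behavior is "the generated patch leaves the output unchanged", and compares two simple attribution rules (gradient times input, and absolute-gradient saliency) at the level of the whole patch. All names (`PATCH`, `generate_null_patch`, `patch_attribution`) are invented for illustration.

```python
import numpy as np

# Hypothetical toy "network": one tanh unit over six input features.
W = np.array([0.5, -0.25, 0.75, 1.0, -2.0, 2.0])
PATCH = slice(3, 6)  # the features we are allowed to generate


def model(x):
    return np.tanh(W @ x)


def input_gradient(x):
    # d tanh(W.x) / dx = (1 - tanh^2(W.x)) * W
    return (1.0 - np.tanh(W @ x) ** 2) * W


def generate_null_patch(x, steps=300, lr=0.05):
    """Optimize the patch so the output matches the output with the patch zeroed.

    Assumption: "no change in output" is the behavior we impose.
    """
    x = x.copy()
    reference = x.copy()
    reference[PATCH] = 0.0
    target = model(reference)
    for _ in range(steps):
        residual = model(x) - target
        grad = 2.0 * residual * input_gradient(x)  # gradient of residual**2
        x[PATCH] -= lr * grad[PATCH]               # only the patch is updated
    return x, target


def patch_attribution(attribution):
    # Net attribution assigned to the generated patch as a whole.
    return float(np.sum(attribution[PATCH]))


x0 = np.array([1.0, 1.0, 0.5, 1.0, 1.0, 1.0])
x_null, target = generate_null_patch(x0)

grad = input_gradient(x_null)
grad_times_input = grad * x_null
saliency = np.abs(grad)

output_gap = abs(model(x_null) - target)
gxi_patch = patch_attribution(grad_times_input)
saliency_patch = patch_attribution(saliency)

print("output gap:", output_gap)
print("gradient*input on patch:", gxi_patch)
print("saliency on patch:", saliency_patch)
```

In this toy setup the generated patch is non-zero yet has no net effect on the output, so a method that still assigns it substantial attribution (here, absolute-gradient saliency) fails the check, while gradient times input assigns it a net attribution near zero. The check is made on the patch as a whole; individual features inside the patch can still receive non-zero gradient-times-input values that cancel.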
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
