Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods
Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

TL;DR
This paper critically examines post-hoc explanation methods for neural networks, revealing their limitations and proposing a verification framework based on a real-world neural network architecture to assess explanation reliability.
Contribution
It introduces a verification framework for explanation methods from the feature-selection perspective, addressing current validation shortcomings and exposing explainer failure modes.
Findings
Current explainers often fail under realistic neural network scenarios.
Different explanation perspectives lead to fundamentally different explanations.
The proposed framework provides guarantees on explanation validity.
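The second finding above can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a hypothetical three-feature binary model, uses exact Shapley values as one common instance of the feature-additivity perspective, and uses a brute-force search for a smallest sufficient subset as one simple instance of the feature-selection perspective. Replacing dropped features with a baseline of 0 is also an assumption made for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations


def model(x):
    """Hypothetical toy model (an assumption): fires if (x0 AND x1) OR x2."""
    return int((x[0] and x[1]) or x[2])


def masked_output(x, kept, baseline=0):
    """Model output when every feature outside `kept` is set to the baseline."""
    masked = [xi if i in kept else baseline for i, xi in enumerate(x)]
    return model(masked)


def shapley_values(x):
    """Feature-additive view: exact Shapley values via all feature orderings."""
    n = len(x)
    orders = list(permutations(range(n)))
    phi = [Fraction(0)] * n
    for order in orders:
        kept = set()
        for i in order:
            before = masked_output(x, kept)
            kept.add(i)
            # Marginal contribution of feature i, averaged over orderings.
            phi[i] += Fraction(masked_output(x, kept) - before, len(orders))
    return phi


def minimal_sufficient_subset(x):
    """Feature-selection view: a smallest subset that preserves the prediction."""
    target = model(x)
    n = len(x)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if masked_output(x, set(subset)) == target:
                return set(subset)


x = [1, 1, 1]
additive = shapley_values(x)
selected = minimal_sufficient_subset(x)
print("Shapley values:", additive)
print("Minimal sufficient subset:", selected)
```

On this instance the Shapley values are 1/6, 1/6, and 2/3, so the additive view credits all three features, while the selection view returns only feature 2, since that feature alone is enough to reproduce the prediction. Neither answer is wrong; they answer different questions, which is why a direct comparison between the two kinds of explainer can mislead.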
Abstract
For AI systems to garner widespread public acceptance, we must develop methods capable of explaining the decisions of black-box models such as neural networks. In this work, we identify two issues with current explanatory methods. First, we show that two prevalent perspectives on explanations, feature-additivity and feature-selection, lead to fundamentally different instance-wise explanations. In the literature, explainers from different perspectives are currently being directly compared, despite their distinct explanation goals. The second issue is that current post-hoc explainers are either validated under simplistic scenarios (on simple models such as linear regression, or on models trained on synthetic datasets), or, when applied to real-world neural networks, validated under the assumption that the learned models behave reasonably. However, neural…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
