Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

Oana-Maria Camburu; Eleonora Giunchiglia; Jakob Foerster; Thomas Lukasiewicz; Phil Blunsom

arXiv:1910.02065 · cs.CL · December 6, 2019 · 36 cites

TL;DR

This paper critically examines post-hoc explanation methods for neural networks, identifies two issues in how they are currently compared and validated, and proposes a verification framework, built on a neural network trained on a real-world task, to assess explainer reliability.

Contribution

It introduces a verification framework for explanation methods from the feature-selection perspective, addressing current validation shortcomings and exposing explainer failure modes.

Findings

01 Current explainers often fail under realistic neural network scenarios.

02 Different explanation perspectives lead to fundamentally different explanations.

03 The proposed framework provides guarantees on explanation validity.

Abstract

For AI systems to garner widespread public acceptance, we must develop methods capable of explaining the decisions of black-box models such as neural networks. In this work, we identify two issues of current explanatory methods. First, we show that two prevalent perspectives on explanations (feature-additivity and feature-selection) lead to fundamentally different instance-wise explanations. In the literature, explainers from different perspectives are currently being directly compared, despite their distinct explanation goals. The second issue is that current post-hoc explainers are either validated under simplistic scenarios (on simple models such as linear regression, or on models trained on synthetic datasets), or, when applied to real-world neural networks, explainers are commonly validated under the assumption that the learned models behave reasonably. However, neural…
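
The first issue in the abstract, that the two perspectives can explain the same prediction differently, can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy `model`, the all-zeros baseline used to mask features, and the helper names `shapley_values` and `minimal_sufficient_subsets`. Exact Shapley values stand in for the feature-additivity perspective, and a brute-force search for the smallest feature subsets that preserve the prediction stands in for the feature-selection perspective.

```python
from itertools import combinations, permutations


def model(x):
    # Hypothetical toy model: outputs 1.0 if either of the first two features is on.
    return float(max(x[0], x[1]))


def masked(x, keep, baseline=0):
    # Keep the features whose index is in `keep`; replace the rest with the baseline.
    return tuple(v if i in keep else baseline for i, v in enumerate(x))


def shapley_values(f, x):
    # Feature-additivity view: average marginal contribution of each feature
    # over all orderings (exact, so only feasible for a handful of features).
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        present = set()
        for i in order:
            before = f(masked(x, present))
            present.add(i)
            phi[i] += f(masked(x, present)) - before
    return [p / len(orders) for p in phi]


def minimal_sufficient_subsets(f, x):
    # Feature-selection view: smallest subsets of features that, on their own,
    # reproduce the model's output on the full instance.
    target = f(x)
    n = len(x)
    for size in range(n + 1):
        hits = [set(s) for s in combinations(range(n), size)
                if f(masked(x, set(s))) == target]
        if hits:
            return hits
    return []


x = (1, 1, 0)
additive = shapley_values(model, x)               # credit split across features 0 and 1
selective = minimal_sufficient_subsets(model, x)  # either feature alone suffices
print(additive, selective)
```

On this instance the additive attribution splits the credit equally between the first two features, while the selection view reports that either one alone is enough to reproduce the prediction. Neither answer is wrong; they answer different questions, which is why comparing explainers across perspectives directly can mislead.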

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare