On the Robustness of Interpretability Methods

David Alvarez-Melis; Tommi S. Jaakkola

arXiv:1806.08049 · cs.LG · June 22, 2018 · 76 cites

TL;DR

This paper emphasizes the importance of robustness in interpretability methods, introduces metrics to measure it, reveals current methods' shortcomings, and suggests ways to improve robustness in explanations.

Contribution

The paper introduces new metrics for robustness, evaluates existing interpretability methods, and proposes techniques to enhance their robustness.

Findings

01. Current interpretability methods lack robustness according to the proposed metrics.

02. Robustness of explanations can be quantitatively assessed with new metrics.

03. Proposed methods can improve the robustness of existing interpretability approaches.

Abstract

We argue that robustness of explanations (i.e., that similar inputs should give rise to similar explanations) is a key desideratum for interpretability. We introduce metrics to quantify robustness and demonstrate that current methods do not perform well according to these metrics. Finally, we propose ways that robustness can be enforced on existing interpretability approaches.
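
The abstract does not spell out the metrics, so the following is only an illustrative sketch of how "similar inputs should give rise to similar explanations" might be quantified. It assumes a sampling-based estimate of a local Lipschitz-style ratio: the largest observed change in the explanation divided by the change in the input, over random points near x. The function names, the uniform box sampling, the radius, and both toy explainers are hypothetical choices made for this example, not the authors' implementation.

```python
import math
import random


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def local_robustness(explain, x, radius=0.1, n_samples=200, seed=0):
    """Estimate max ||explain(x) - explain(x')|| / ||x - x'|| near x.

    Assumption: neighbours x' are drawn uniformly from a box of half-width
    `radius` around x. Random sampling only gives a lower bound on the true
    maximum; lower values mean a more stable explanation.
    """
    rng = random.Random(seed)
    base = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        x_near = [xi + rng.uniform(-radius, radius) for xi in x]
        dx = l2(x, x_near)
        if dx == 0.0:
            continue  # skip the degenerate case of an identical sample
        worst = max(worst, l2(base, explain(x_near)) / dx)
    return worst


def smooth_explainer(x):
    """Toy attribution: gradient of f(x) = sum(x_i^2), which is 2x."""
    return [2.0 * xi for xi in x]


def brittle_explainer(x):
    """Toy attribution: sign of each feature, which flips abruptly at zero."""
    return [1.0 if xi >= 0 else -1.0 for xi in x]


if __name__ == "__main__":
    x = [0.01, 0.5]
    print("smooth :", local_robustness(smooth_explainer, x))
    print("brittle:", local_robustness(brittle_explainer, x))
```

In this toy setup the smooth explainer's ratio stays at 2, while the brittle one is far larger because a tiny perturbation of the first feature flips its attribution. That is the kind of instability a robustness metric is meant to expose.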

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Fault Detection and Control Systems

Methods: Interpretability