A Benchmark for Interpretability Methods in Deep Neural Networks

Sara Hooker; Dumitru Erhan; Pieter-Jan Kindermans; Been Kim

arXiv:1806.10758 · cs.LG · November 6, 2019 · 379 citations

TL;DR

This paper introduces an empirical benchmark for measuring the accuracy of feature importance methods in deep neural networks and finds that many popular methods perform no better than a random assignment of feature importance; only certain ensemble approaches improve on that baseline.

Contribution

It provides a systematic benchmark for interpretability methods, ROAR (RemOve And Retrain), and shows that two specific ensemble techniques, VarGrad and SmoothGrad-Squared, are the approaches that outperform a random baseline.

Findings

01

Many popular interpretability methods are no better than a random assignment of feature importance.

02

Certain ensemble methods, VarGrad and SmoothGrad-Squared, outperform the random baseline and the other approaches tested.

03

Some ensemble approaches do no better than the underlying method while carrying a far higher computational cost.
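To make the two successful ensembles concrete, here is a minimal Python sketch, not the paper's code. It assumes a hypothetical toy model f(x) = x0^2 + 3*x1 whose gradient is known in closed form, uses the plain gradient as the base estimator, and evaluates it on num_samples copies of the input perturbed with Gaussian noise of standard deviation sigma. SmoothGrad averages the noisy gradients, SmoothGrad-Squared averages their squares, and VarGrad takes their variance. The function names, noise level, and sample count are illustrative assumptions.

```python
import random
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def noisy_gradients(grad_fn: GradFn, x: Vector, num_samples: int,
                    sigma: float, rng: random.Random) -> List[Vector]:
    """Evaluate grad_fn on num_samples copies of x perturbed by N(0, sigma^2) noise."""
    samples = []
    for _ in range(num_samples):
        noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
        samples.append(grad_fn(noisy_x))
    return samples


def smoothgrad(grads: List[Vector]) -> Vector:
    """SmoothGrad: per-feature mean of the noisy gradients."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]


def smoothgrad_squared(grads: List[Vector]) -> Vector:
    """SmoothGrad-Squared: per-feature mean of the squared noisy gradients."""
    n = len(grads)
    return [sum(g * g for g in col) / n for col in zip(*grads)]


def vargrad(grads: List[Vector]) -> Vector:
    """VarGrad: per-feature population variance of the noisy gradients."""
    n = len(grads)
    means = smoothgrad(grads)
    return [sum((g - m) ** 2 for g in col) / n
            for col, m in zip(zip(*grads), means)]


def toy_grad(x: Vector) -> Vector:
    """Hypothetical model f(x) = x0^2 + 3*x1 (x2 unused); returns its exact gradient."""
    return [2.0 * x[0], 3.0, 0.0]


grads = noisy_gradients(toy_grad, [1.0, -2.0, 0.5], num_samples=50,
                        sigma=0.15, rng=random.Random(0))
sg, sg_sq, var = smoothgrad(grads), smoothgrad_squared(grads), vargrad(grads)
print("SmoothGrad:        ", sg)
print("SmoothGrad-Squared:", sg_sq)
print("VarGrad:           ", var)
```

Because VarGrad here uses the population variance, SmoothGrad-Squared equals the square of SmoothGrad plus VarGrad for every feature. Each ensemble also calls the base estimator num_samples times per input instead of once, which is where the extra computational cost of ensembling comes from.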

Abstract

We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are not better than a random designation of feature importance. Only certain ensemble-based approaches, VarGrad and SmoothGrad-Squared, outperform such a random assignment of importance. The manner of ensembling remains critical: we show that some approaches do no better than the underlying method but carry a far higher computational burden.
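The benchmark behind these results is ROAR (RemOve And Retrain): rank input features with an estimator, replace the top-ranked fraction with an uninformative value in both the training and test data, retrain, and measure how far test accuracy drops. A better estimator should cause a larger drop than a random ranking. Retraining matters because, without it, an accuracy drop could come from the shift in input distribution rather than from the removed information. The following is a toy sketch under loud assumptions: short feature vectors stand in for images, a hypothetical nearest-centroid classifier stands in for retraining a deep network, the fill value is the per-feature training mean, and `oracle_importance` is a hypothetical estimator that knows feature 0 is the only informative one.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]               # (features, label)
ImportanceFn = Callable[[Vector], Vector]  # one importance score per feature


def remove_top_fraction(x: Vector, importance: Vector, fraction: float,
                        fill: Vector) -> Vector:
    """Replace the `fraction` highest-scoring features of x with the fill values."""
    k = int(round(fraction * len(x)))
    # sorted() stays stable with reverse=True, so tied scores keep index order.
    order = sorted(range(len(x)), key=lambda i: importance[i], reverse=True)
    removed = set(order[:k])
    return [fill[i] if i in removed else xi for i, xi in enumerate(x)]


def train_nearest_centroid(data: List[Example]) -> Dict[int, Vector]:
    """Hypothetical stand-in for retraining: store one mean vector per class."""
    by_label: Dict[int, List[Vector]] = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_label.items()}


def accuracy(centroids: Dict[int, Vector], data: List[Example]) -> float:
    """Fraction of examples whose nearest centroid (squared L2) has the right label."""
    def predict(x: Vector) -> int:
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return sum(predict(x) == y for x, y in data) / len(data)


def roar_accuracy(train: List[Example], test: List[Example],
                  importance_fn: ImportanceFn, fraction: float) -> float:
    """Degrade train and test with the same estimator, retrain, and evaluate."""
    fill = [mean(col) for col in zip(*(x for x, _ in train))]  # per-feature train mean

    def degrade(data: List[Example]) -> List[Example]:
        return [(remove_top_fraction(x, importance_fn(x), fraction, fill), y)
                for x, y in data]

    model = train_nearest_centroid(degrade(train))
    return accuracy(model, degrade(test))


def make_toy_data(n: int, rng: random.Random) -> List[Example]:
    """Hypothetical data: feature 0 carries the label, features 1-5 are noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [4.0 * y + rng.gauss(0.0, 0.5)] + [rng.gauss(0.0, 1.0) for _ in range(5)]
        data.append((x, y))
    return data


rng = random.Random(0)
train, test = make_toy_data(400, rng), make_toy_data(400, rng)


def oracle_importance(x: Vector) -> Vector:
    """Hypothetical perfect estimator: only feature 0 matters."""
    return [1.0] + [0.0] * (len(x) - 1)


def random_importance(x: Vector) -> Vector:
    """Random baseline: an arbitrary score for every feature."""
    return [rng.random() for _ in x]


acc_none = roar_accuracy(train, test, random_importance, 0.0)
acc_random = roar_accuracy(train, test, random_importance, 0.5)
acc_oracle = roar_accuracy(train, test, oracle_importance, 0.5)
print(f"no removal: {acc_none:.3f}  random 50%: {acc_random:.3f}  oracle 50%: {acc_oracle:.3f}")
```

On this toy data, removing half the features by the oracle ranking should hurt accuracy more than removing half at random, because the retrained classifier can still exploit feature 0 in the examples where the random ranking happened to keep it. An estimator whose degradation curve is no steeper than the random one is, in the benchmark's terms, no better than random.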

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques

Methods: Interpretability