Benchmarking Attribution Methods with Relative Feature Importance

Mengjiao Yang; Been Kim

arXiv:1907.09701 · cs.LG · November 6, 2019 · 86 cites

TL;DR

This paper introduces BAM, a framework for quantitatively evaluating feature attribution methods using datasets with known feature importance, revealing that some methods produce false positive explanations.

Contribution

The paper presents a novel benchmarking framework with datasets, models, and metrics for evaluating attribution methods against known feature importance.

Findings

01 Certain attribution methods are prone to false positives

02 The framework enables quantitative comparison of attribution methods

03 Open source resources facilitate future research

Abstract

Interpretability is an important area of research for safe deployment of machine learning systems. One particular type of interpretability method attributes model decisions to input features. Despite active development, quantitative evaluation of feature attribution methods remains difficult due to the lack of ground truth: we do not know which input features are in fact important to a model. In this work, we propose a framework for Benchmarking Attribution Methods (BAM) with a priori knowledge of relative feature importance. BAM includes 1) a carefully crafted dataset and models trained with known relative feature importance and 2) three complementary metrics to quantitatively evaluate attribution methods by comparing feature attributions between pairs of models and pairs of inputs. Our evaluation on several widely-used attribution methods suggests that certain methods are more likely…
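The idea of comparing attributions between a pair of models can be illustrated with a small sketch. It is not the paper's metric definitions, which the truncated abstract does not give. It assumes attribution maps are 2-D lists of floats, that a binary mask marks the region whose relative importance is known, and that model A was trained so that this region matters more than it does for model B. The names `region_attribution` and `model_pair_contrast` are hypothetical.

```python
from statistics import mean


def region_attribution(attr_map, mask):
    """Average attribution over the pixels where the mask is 1 (hypothetical helper)."""
    values = [
        a
        for attr_row, mask_row in zip(attr_map, mask)
        for a, m in zip(attr_row, mask_row)
        if m
    ]
    # An empty region contributes no attribution.
    return mean(values) if values else 0.0


def model_pair_contrast(attr_maps_a, attr_maps_b, masks):
    """Mean difference in region attribution between model A and model B.

    Assumption: both models are explained on the same inputs, so the three
    lists are aligned index by index.
    """
    diffs = [
        region_attribution(a, m) - region_attribution(b, m)
        for a, b, m in zip(attr_maps_a, attr_maps_b, masks)
    ]
    return mean(diffs)


if __name__ == "__main__":
    # One toy 2x2 input; the known region is the top-left pixel.
    masks = [[[1, 0], [0, 0]]]
    attr_a = [[[0.75, 0.0], [0.0, 0.0]]]   # attribution map from model A
    attr_b = [[[0.25, 0.5], [0.0, 0.0]]]   # attribution map from model B
    print(model_pair_contrast(attr_a, attr_b, masks))
```

Under these assumptions, an attribution method that tracks the known relative importance should give a clearly positive contrast, while a value near zero would suggest the method assigns similar importance to the region for both models.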

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Benchmarking Attribution Methods (BAM) · Interpretability