Interpreting Interpretations: Organizing Attribution Methods by Criteria

Zifan Wang; Piotr Mardziel; Anupam Datta; Matt Fredrikson

arXiv:2002.07985 · cs.AI · April 7, 2020 · 5 cites

TL;DR

This paper introduces a framework for evaluating attribution methods in deep learning by incorporating logical concepts like necessity, sufficiency, and proportionality, enabling more nuanced interpretation of model explanations.

Contribution

It extends existing attribution analysis by defining metrics for necessity, sufficiency, and proportionality, allowing comparison and interpretation of different attribution methods.

Findings

01. Some attribution methods better capture necessity.

02. Other methods are more suited for sufficiency.

03. No single method excels in all interpretability criteria.
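
To make the necessity/sufficiency distinction concrete, here is a minimal sketch of one way such criteria could be scored by ablation. It is an illustration under stated assumptions, not the metrics defined in the paper: the function names (`ablation_curve`, `necessity_score`, `sufficiency_score`), the toy linear model, the all-zero baseline, and the averaging rule are all hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def ablation_curve(model: Model, x: Sequence[float], baseline: Sequence[float],
                   order: Sequence[int], start_from_input: bool) -> List[float]:
    """Model outputs as features are swapped one at a time, following `order`.

    start_from_input=True  : start at x and overwrite features with baseline
                             values (removing features, necessity-style).
    start_from_input=False : start at the baseline and restore features from x
                             (adding features, sufficiency-style).
    """
    current = list(x) if start_from_input else list(baseline)
    source = baseline if start_from_input else x
    outputs = [model(current)]
    for i in order:
        current[i] = source[i]
        outputs.append(model(current))
    return outputs


def _ranked(attribution: Sequence[float]) -> List[int]:
    # Feature indices sorted from highest to lowest attribution score.
    return sorted(range(len(attribution)), key=lambda i: attribution[i], reverse=True)


def necessity_score(model: Model, x, baseline, attribution) -> float:
    """Average output drop when the top-attributed features are removed first."""
    curve = ablation_curve(model, x, baseline, _ranked(attribution), True)
    return sum(curve[0] - y for y in curve[1:]) / len(attribution)


def sufficiency_score(model: Model, x, baseline, attribution) -> float:
    """Average output gain when the top-attributed features are added first."""
    curve = ablation_curve(model, x, baseline, _ranked(attribution), False)
    return sum(y - curve[0] for y in curve[1:]) / len(attribution)


# Hypothetical toy setup: a linear model with an all-zero baseline.
def model(v: Sequence[float]) -> float:
    return 3.0 * v[0] + 1.0 * v[1] + 0.5 * v[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
good = [3.0, 1.0, 0.5]  # ranks features by their true contribution
bad = [0.5, 1.0, 3.0]   # ranks features in the reverse order

print("necessity  :", necessity_score(model, x, baseline, good),
      necessity_score(model, x, baseline, bad))
print("sufficiency:", sufficiency_score(model, x, baseline, good),
      sufficiency_score(model, x, baseline, bad))
```

For this linear toy model the two scores coincide, so an attribution that ranks well on one ranks well on the other. With a non-linear model (for example, one where either of two features alone is enough to trigger the output), the removal and addition curves can diverge, which is the kind of gap the findings above describe.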

Abstract

Motivated by distinct, though related, criteria, a growing number of attribution methods have been developed to interpret deep learning. While each relies on the interpretability of the concept of "importance" and our ability to visualize patterns, explanations produced by the methods often differ. As a result, input attribution for vision models fails to provide any level of human understanding of model behaviour. In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality. We define metrics to represent these concepts as quantitative aspects of an attribution. This allows us to compare attributions produced by different methods and interpret them in novel ways: to what extent does this attribution…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: Interpretability