Interpreting Interpretations: Organizing Attribution Methods by Criteria
Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson

TL;DR
This paper introduces a framework for evaluating attribution methods in deep learning. It incorporates the logical concepts of necessity and sufficiency, along with the concept of proportionality, to enable a more nuanced interpretation of model explanations.
Contribution
It extends existing attribution analysis by defining metrics for necessity, sufficiency, and proportionality, allowing comparison and interpretation of different attribution methods.
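The necessity and sufficiency ideas can be made concrete with a deletion/insertion-style sketch. This is a minimal illustration under stated assumptions, not the paper's metric definitions. It assumes that "removing" a feature means replacing it with a baseline value (here 0.0), that features are ranked by their attribution scores, and that each curve is summarized by its normalized area. The function names `necessity_curve`, `sufficiency_curve`, and `curve_area` are hypothetical, and proportionality is not covered.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: maps a feature vector to a class score.
Model = Callable[[Sequence[float]], float]


def _ranked(attribution: Sequence[float]) -> List[int]:
    # Feature indices ordered from highest to lowest attribution.
    return sorted(range(len(attribution)), key=lambda i: attribution[i], reverse=True)


def necessity_curve(model: Model, x: Sequence[float],
                    attribution: Sequence[float], baseline: float = 0.0) -> List[float]:
    """Scores after removing the top-k attributed features, k = 0..n.

    Assumption: a steep drop means the top-ranked features were necessary.
    """
    current = list(x)
    scores = [model(current)]
    for i in _ranked(attribution):
        current[i] = baseline  # assumed removal: overwrite with baseline
        scores.append(model(current))
    return scores


def sufficiency_curve(model: Model, x: Sequence[float],
                      attribution: Sequence[float], baseline: float = 0.0) -> List[float]:
    """Scores when only the top-k attributed features are kept, k = 0..n.

    Assumption: a fast rise means the top-ranked features were sufficient.
    """
    current = [baseline] * len(x)
    scores = [model(current)]
    for i in _ranked(attribution):
        current[i] = x[i]  # restore the original feature value
        scores.append(model(current))
    return scores


def curve_area(scores: Sequence[float]) -> float:
    # Trapezoidal area with the k axis normalized to [0, 1].
    n = len(scores) - 1
    if n == 0:
        return float(scores[0])
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(n)) / n
```

For a linear model with weights [3, 2, 1] and input [1, 1, 1], the attribution [3, 2, 1] yields the necessity curve [6, 3, 1, 0] and the sufficiency curve [0, 3, 5, 6]. Reversing the ranking gives a larger necessity area and a smaller sufficiency area, which is the behaviour a faithful attribution should show under these assumptions.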
Findings
Some attribution methods better capture necessity.
Other methods are more suited for sufficiency.
No single method is always the most appropriate in terms of both necessity and sufficiency.
Abstract
Motivated by distinct, though related, criteria, a growing number of attribution methods have been developed to interpret deep learning. While each relies on the interpretability of the concept of "importance" and our ability to visualize patterns, explanations produced by the methods often differ. As a result, input attribution for vision models fails to provide any level of human understanding of model behaviour. In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality. We define metrics to represent these concepts as quantitative aspects of an attribution. This allows us to compare attributions produced by different methods and interpret them in novel ways: to what extent does this attribution…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Interpretability
