Axiomatic Attribution for Deep Networks

Mukund Sundararajan; Ankur Taly; Qiqi Yan

arXiv:1703.01365 · cs.LG · June 14, 2017 · 2.6k cites

TL;DR

This paper introduces Integrated Gradients, an attribution method for deep networks that satisfies two axioms, Sensitivity and Implementation Invariance. It needs only a few calls to the standard gradient operator and is demonstrated on image, text, and chemistry models.

Contribution

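The paper identifies two fundamental axioms for attribution methods, Sensitivity and Implementation Invariance, shows that most known attribution methods do not satisfy them, and uses them to design Integrated Gradients, a method that requires no modification to the original network.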

Findings

01

Integrated Gradients satisfies Sensitivity and Implementation Invariance.

02

Applied to image, text, and chemistry models, the method helped debug networks and extract rules from them.

03

It enables better user engagement with neural network models.

Abstract

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
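
To make the "few calls to the standard gradient operator" concrete, here is a minimal sketch in plain Python. It is not the authors' code. The toy model, its weights, the all-zero baseline, the step count, and the finite-difference `gradient` helper (standing in for a framework's gradient operator) are all assumptions chosen for illustration. The sketch averages gradients at points along the straight line from the baseline to the input and scales the average by the input-minus-baseline difference. The paper also states a completeness property (attributions sum to the difference between the model output at the input and at the baseline), which the last line checks numerically.

```python
import math

def model(x):
    """Toy differentiable model (hypothetical): sigmoid of a weighted sum."""
    w = [2.0, -1.0, 0.5]  # illustrative weights, not from the paper
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def gradient(f, x, eps=1e-5):
    """Central-difference stand-in for a framework's gradient operator."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

def integrated_gradients(f, x, baseline, steps=50):
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = gradient(f, point)
        total = [t + gi for t, gi in zip(total, g)]
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumed all-zero baseline
attributions = integrated_gradients(model, x, baseline)

# Completeness check: the attributions should sum to f(x) - f(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
```

With a real network, `gradient` would be replaced by the framework's own automatic differentiation, and the number of steps would be tuned until the completeness gap is acceptably small.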

Code & Models

Models

🤗 cffl/bert-base-styleclassification-subjective-neutral
model · 373 dl · ♡ 9

Videos

[Quiz] Interpretable ML, VQ-VAE w/o Quantization / infinite codebook, Pearson’s, PointClouds · YouTube

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning