Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, Anshul Kundaje

TL;DR
This paper introduces DeepLIFT, a method for interpreting neural networks that compares each neuron's activation to a reference activation and assigns importance scores from the difference, with reported advantages over gradient-based approaches.
Contribution
DeepLIFT offers an efficient way to compute feature importance in neural networks by propagating activation differences, improving interpretability over existing gradient-based methods.
Findings
DeepLIFT shows significant advantages over gradient-based methods for importance scoring.
It is applied to models trained on natural images and on genomic data.
Its importance scores are reported to be more stable and meaningful than gradient-based scores.
Abstract
Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version. Original abstract follows:
The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.
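The idea of scoring inputs by their difference from a reference can be illustrated with a small sketch. This is not the authors' implementation: the function name `deeplift_sketch`, the one-hidden-layer ReLU network, the choice of reference input, and the fallback to the ordinary gradient when a neuron's pre-activation barely differs from its reference are all assumptions made for illustration. Each ReLU passes importance backward in proportion to the ratio of its activation difference to its input difference, and each linear layer passes it through its weights.

```python
def relu(v):
    return max(v, 0.0)

def affine(W, b, x):
    # Compute W @ x + b for a list-of-rows weight matrix.
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def deeplift_sketch(x, x_ref, W1, b1, w2, b2, eps=1e-9):
    """Illustrative difference-from-reference scores for the toy network
    out = w2 . relu(W1 @ x + b1) + b2 (hypothetical setup, not the paper's code)."""
    # Forward pass on the actual input and on the reference input.
    z, z_ref = affine(W1, b1, x), affine(W1, b1, x_ref)
    h, h_ref = [relu(v) for v in z], [relu(v) for v in z_ref]
    out = sum(w * v for w, v in zip(w2, h)) + b2
    out_ref = sum(w * v for w, v in zip(w2, h_ref)) + b2

    # Multiplier through each ReLU: activation difference over input difference.
    m_relu = []
    for zj, zrj, hj, hrj in zip(z, z_ref, h, h_ref):
        dz = zj - zrj
        if abs(dz) > eps:
            m_relu.append((hj - hrj) / dz)
        else:
            # Assumed fallback: use the plain ReLU gradient when dz is ~0.
            m_relu.append(1.0 if zj > 0 else 0.0)

    # Chain the multipliers back to the inputs and scale by the input difference.
    contributions = []
    for i in range(len(x)):
        m_i = sum(w2[j] * m_relu[j] * W1[j][i] for j in range(len(w2)))
        contributions.append(m_i * (x[i] - x_ref[i]))
    return contributions, out - out_ref

# Example with made-up weights and an all-zeros reference input.
contribs, delta_out = deeplift_sketch(
    x=[1.0, 2.0], x_ref=[0.0, 0.0],
    W1=[[1.0, -2.0], [0.5, 1.0]], b1=[0.0, -1.0],
    w2=[2.0, -1.0], b2=0.1,
)
# The per-input contributions add up to the change in output versus the reference.
assert abs(sum(contribs) - delta_out) < 1e-9
```

In this toy setting the scores sum exactly to the difference between the output and the reference output, which is one way such scores differ from raw gradients: a ReLU that is inactive at the actual input but active at the reference has zero gradient yet can still receive a nonzero multiplier.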
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
Methods: Interpretability
