Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution
Huiqi Deng, Na Zou, Weifu Chen, Guocan Feng, Mengnan Du, Xia Hu

TL;DR
This paper introduces MIP-IN, a mutual information-based back-propagation method that provides faithful, quantitative interpretations of neural networks by learning to invert the network's computations.
Contribution
It proposes a novel mutual information preserving framework and a recursive inverse network to produce more faithful and quantitative neural network interpretations.
Findings
MIP-IN preserves mutual information between input and output.
The inverted source signals satisfy completeness and minimality.
Interpretations are empirically validated as effective and faithful.
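To make the quantity in these findings concrete, here is a minimal sketch of mutual information for a small discrete joint distribution. It is not the paper's MIP-IN procedure, which this summary does not detail. The function name `mutual_information` and the two toy distributions are hypothetical, chosen only for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a discrete joint distribution.

    joint[i][j] is P(X=i, Y=j); the entries are assumed to sum to 1.
    """
    # Marginals: sum rows for P(X), sum columns for P(Y).
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


# Toy case (assumption): Y is an exact copy of a fair binary X.
copy_joint = [[0.5, 0.0], [0.0, 0.5]]
# Toy case (assumption): X and Y are independent fair bits.
indep_joint = [[0.25, 0.25], [0.25, 0.25]]

mi_copy = mutual_information(copy_joint)
mi_indep = mutual_information(indep_joint)
```

In the first toy case all of the information in X is retained in Y, while in the second none is. A mutual-information-preserving signal, in the sense the findings describe, would sit at the first extreme with respect to the network output.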
Abstract
Back-propagation-based visualizations have been proposed to interpret deep neural networks (DNNs), and some of them produce interpretations with good visual quality. However, there are doubts about whether these intuitive visualizations are actually related to the network's decisions. Recent studies have confirmed this suspicion by verifying that almost all of these modified back-propagation visualizations are not faithful to the model's decision-making process. Moreover, these visualizations produce vague "relative importance scores", in which a low value does not guarantee that a feature is independent of the final prediction. Hence, it is highly desirable to develop a novel back-propagation framework that guarantees theoretical faithfulness and produces a quantitative attribution score with a clear interpretation. To achieve this goal, we resort to mutual information theory to generate the interpretations, studying…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Cell Image Analysis Techniques · Adversarial Robustness in Machine Learning
