Information-Theoretic Visual Explanation for Black-Box Classifiers
Jihun Yi, Eunji Kim, Siwon Kim, Sungroh Yoon

TL;DR
This paper introduces an information-theoretic approach to explaining black-box classifier predictions. It generates attribution maps that quantify pixel importance both independently of class and for specific classes, and it improves on existing methods under a quantitative correctness metric.
Contribution
It proposes a method that uses information gain and point-wise mutual information to produce attribution maps for black-box classifiers, with higher correctness than existing methods as measured by a quantitative metric.
Findings
Improves the correctness of attribution maps over existing methods, as measured by a quantitative metric
Provides both class-independent (IG map) and class-specific (PMI map) explanations
Analyzes an ImageNet classifier in detail using the proposed method
Abstract
In this work, we attempt to explain the prediction of any black-box classifier from an information-theoretic perspective. For each input feature, we compare the classifier outputs with and without that feature using two information-theoretic metrics. Accordingly, we obtain two attribution maps: an information gain (IG) map and a point-wise mutual information (PMI) map. The IG map provides a class-independent answer to "How informative is each pixel?", and the PMI map offers a class-specific explanation of "How much does each pixel support a specific class?" Compared to existing methods, our method improves the correctness of the attribution maps in terms of a quantitative metric. We also provide a detailed analysis of an ImageNet classifier using the proposed method, and the code is available online.
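To make the "with and without that feature" comparison concrete, here is a minimal sketch of how the two scores could be computed for a single feature. It assumes the textbook definitions: PMI for class c is log p(c | x) - log p(c | x without the feature), and information gain is the expectation of PMI under p(. | x), which equals a KL divergence. These definitions, the function name pmi_and_ig, and the eps smoothing term are assumptions made for illustration; the abstract does not spell out the exact formulas, nor how the "without" output is obtained (for example, by averaging classifier outputs over plausible replacement values for the feature).

```python
import numpy as np


def pmi_and_ig(p_with, p_without, eps=1e-12):
    """Score one feature from two class-probability vectors.

    p_with    : classifier output for the full input, shape (num_classes,)
    p_without : classifier output with the feature removed or marginalized,
                shape (num_classes,). How this is estimated is an assumption
                left to the caller.
    Returns (pmi, ig): a per-class PMI vector and a scalar information gain.
    """
    p_with = np.asarray(p_with, dtype=float)
    p_without = np.asarray(p_without, dtype=float)

    # Class-specific score: positive when the feature raises the
    # probability of that class, negative when it lowers it.
    pmi = np.log(p_with + eps) - np.log(p_without + eps)

    # Class-independent score: expected PMI under p_with, i.e. the
    # KL divergence KL(p_with || p_without) up to the eps smoothing.
    ig = float(np.sum(p_with * pmi))
    return pmi, ig


# Hypothetical two-class example: the feature shifts mass toward class 0.
pmi, ig = pmi_and_ig([0.9, 0.1], [0.5, 0.5])
```

Repeating this for every pixel or patch would give one IG value per location (the class-independent map) and one PMI value per location and class (the class-specific map). In this sketch, a feature whose removal leaves the output unchanged scores approximately zero on both.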
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Cell Image Analysis Techniques · Adversarial Robustness in Machine Learning
