Explaining Representation by Mutual Information
Lifeng Gu

TL;DR
This paper introduces a mutual information-based method for interpreting neural network representations by decomposing them into three components (total, decision-related, and redundant information), providing a theoretically grounded interpretability framework that can be integrated into existing architectures.
Contribution
It presents a novel MI-based framework that decomposes representations into total, decision-related, and redundant information, offering a complete interpretability approach surpassing existing methods.
Findings
Effective visualization of representation components on CNNs and Transformers
Outperforms partial explanation methods like Grad-CAM in capturing input-representation relationships
Applicable to image classification and few-shot learning tasks
Abstract
As interpretability gains attention in machine learning, there is a growing need for reliable models that fully explain representation content. We propose a mutual information (MI)-based method that decomposes neural network representations into three exhaustive components: total mutual information, decision-related information, and redundant information. This theoretically complete framework captures the entire input-representation relationship, surpassing partial explanations like those from Grad-CAM. Using two lightweight modules integrated into architectures such as CNNs and Transformers, we estimate these components and demonstrate their interpretive power through visualizations on ResNet and a prototype network applied to image classification and few-shot learning tasks. Our approach is distinguished by three key features: 1. Rooted in mutual information theory, it delivers a…
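The three-way decomposition named in the abstract can be illustrated on a toy discrete example. The sketch below is not the paper's estimator (the paper uses learned modules whose details are not given here). It assumes one plausible reading: for a Markov chain Y -> X -> Z (label, input, representation), the chain rule of mutual information gives I(X;Z) = I(Z;Y) + I(X;Z|Y), and the three terms are taken to stand for total, decision-related, and redundant information respectively. That mapping, the toy distributions, and all function and variable names are assumptions made for illustration.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def marginal(joint, keep):
    """Marginalise a joint over tuples down to the positions in `keep`."""
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + p
    return out

# Hypothetical toy distributions (assumed, not from the paper).
p_y = {0: 0.5, 1: 0.5}                              # label
p_x_given_y = {0: {0: 0.6, 1: 0.3, 2: 0.1},         # input given label
               1: {1: 0.1, 2: 0.3, 3: 0.6}}
p_z_given_x = {0: {0: 0.9, 1: 0.1},                 # representation given input
               1: {0: 0.8, 1: 0.2},
               2: {0: 0.2, 1: 0.8},
               3: {0: 0.1, 1: 0.9}}

# Joint p(x, y, z) = p(y) p(x|y) p(z|x), i.e. the Markov chain Y -> X -> Z.
joint_xyz = {}
for y, py in p_y.items():
    for x, pxy in p_x_given_y[y].items():
        for z, pzx in p_z_given_x[x].items():
            joint_xyz[(x, y, z)] = py * pxy * pzx

# Assumed mapping of the three components:
total = mutual_information(marginal(joint_xyz, (0, 2)))      # I(X;Z)
decision = mutual_information(marginal(joint_xyz, (2, 1)))   # I(Z;Y)

# I(X;Z|Y) = sum_y p(y) * I(X;Z | Y=y)
redundant = 0.0
for y, py in p_y.items():
    cond = {(x, z): p / py
            for (x, yy, z), p in joint_xyz.items() if yy == y}
    redundant += py * mutual_information(cond)

print(f"total={total:.4f} decision={decision:.4f} redundant={redundant:.4f}")
print("chain rule holds:", abs(total - (decision + redundant)) < 1e-9)
```

On real networks the joint distribution is not available in closed form, so these quantities would have to be estimated, for example with auxiliary networks; the exact computation here only shows how the three terms relate under the stated assumption.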
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
