InfoDisent: Explainability of Image Classification Models by Information Disentanglement
Łukasz Struski, Dawid Rymarczyk, Jacek Tabor

TL;DR
InfoDisent is a hybrid explainability method that disentangles information in image classification models into interpretable concepts, combining post-hoc and self-explainable approaches, and demonstrating effectiveness across multiple datasets and architectures.
Contribution
The paper introduces InfoDisent, a novel method based on the information bottleneck principle that generalizes concept-level explanations to diverse models and datasets, including ImageNet.
Findings
Effective disentanglement of concepts in pretrained models.
Successful application to ViTs and convolutional networks.
Generalization to large-scale datasets like ImageNet.
Abstract
In this work, we introduce InfoDisent, a hybrid approach to explainability based on the information bottleneck principle. InfoDisent enables the disentanglement of information in the final layer of any pretrained model into atomic concepts, which can be interpreted as prototypical parts. This approach merges the flexibility of post-hoc methods with the concept-level modeling capabilities of self-explainable neural networks, such as ProtoPNets. We demonstrate the effectiveness of InfoDisent through computational experiments and user studies across various datasets using modern backbones such as ViTs and convolutional networks. Notably, InfoDisent generalizes the prototypical parts approach to novel domains (ImageNet).
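The abstract does not spell out the mechanism, so the following is only a toy sketch of what "disentangling final-layer information into atomic concepts" could look like, not the authors' implementation. It assumes (hypothetically) that a frozen backbone yields one feature vector per image location, that an orthogonal transform re-mixes the channels, and that each resulting channel is summarized by its single strongest location, which then plays the role of a prototypical part. The function names `rotate_channels`, `channel_peaks`, and `classify`, the 2-channel rotation, and the toy numbers are all illustrative assumptions.

```python
import math


def rotate_channels(features, theta):
    """Apply a 2x2 rotation (an orthogonal matrix) to every location's
    2-channel feature vector. Hypothetical stand-in for a learned
    orthogonal re-mixing of final-layer channels."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y] for x, y in features]


def channel_peaks(features):
    """For each channel, return (location index, activation) of the
    strongest response. The location is what one would show to a user
    as the image part that represents the channel's concept."""
    n_channels = len(features[0])
    peaks = []
    for ch in range(n_channels):
        values = [f[ch] for f in features]
        idx = max(range(len(values)), key=values.__getitem__)
        peaks.append((idx, values[idx]))
    return peaks


def classify(peaks, weights):
    """Linear readout over the pooled channel activations.
    Returns (predicted class index, list of class scores)."""
    pooled = [value for _, value in peaks]
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy input: 3 image locations, 2 channels each (assumed frozen features).
toy_features = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
# theta = 0 gives the identity rotation, so channels are left unchanged.
mixed = rotate_channels(toy_features, 0.0)
peaks = channel_peaks(mixed)
# One weight row per class; here each class listens to one channel.
predicted, scores = classify(peaks, [[1.0, 0.0], [0.0, 1.0]])
```

In this sketch the backbone is never modified, which mirrors the post-hoc side of the hybrid approach, while the per-channel peak locations give the concept-level view associated with prototypical-parts models. A real system would learn the orthogonal transform and readout weights rather than fix them by hand.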
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
