InfoDisent: Explainability of Image Classification Models by Information Disentanglement

Łukasz Struski; Dawid Rymarczyk; Jacek Tabor

arXiv:2409.10329·cs.CV·March 7, 2025

Open Access

TL;DR

InfoDisent is a hybrid explainability method that disentangles information in image classification models into interpretable concepts, combining post-hoc and self-explainable approaches, and demonstrating effectiveness across multiple datasets and architectures.

Contribution

The paper introduces InfoDisent, a novel information-bottleneck-based method that generalizes concept-level explanations to diverse models and datasets, including ImageNet.

Findings

1. Effective disentanglement of concepts in pretrained models.

2. Successful application to ViTs and convolutional networks.

3. Generalization to large-scale datasets like ImageNet.

Abstract

In this work, we introduce InfoDisent, a hybrid approach to explainability based on the information bottleneck principle. InfoDisent enables the disentanglement of information in the final layer of any pretrained model into atomic concepts, which can be interpreted as prototypical parts. This approach merges the flexibility of post-hoc methods with the concept-level modeling capabilities of self-explainable neural networks, such as ProtoPNets. We demonstrate the effectiveness of InfoDisent through computational experiments and user studies across various datasets using modern backbones such as ViTs and convolutional networks. Notably, InfoDisent generalizes the prototypical parts approach to novel domains (ImageNet).
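
The abstract gives no implementation detail, so the following is only an illustrative sketch of what "disentangling the final layer into atomic concepts" could look like, not the authors' method. It assumes (hypothetically) that a frozen backbone yields a feature map of shape (C, H, W), that channels are mixed by an orthogonal matrix, that each resulting channel is reduced to its single strongest spatial location (a candidate "prototypical part"), and that classes are scored with non-negative weights. The names `random_orthogonal`, `concept_bottleneck`, and `classify` are invented for this example.

```python
import numpy as np


def random_orthogonal(channels, seed=0):
    """Return a random orthogonal (C, C) matrix via QR decomposition.

    Assumption: a stand-in for a learned channel-mixing transform.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((channels, channels)))
    return q


def concept_bottleneck(feature_map, rotation):
    """Reduce each mixed channel to one activation and one location.

    feature_map: (C, H, W) array from a frozen backbone (assumed).
    rotation:    (C, C) orthogonal matrix applied across channels.
    Returns (activations of shape (C,), locations of shape (C, 2)).
    """
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    mixed = rotation @ flat  # mix channels, keep spatial layout
    best = mixed.argmax(axis=1)  # strongest location per channel
    activations = mixed[np.arange(c), best]
    rows, cols = np.unravel_index(best, (h, w))
    return activations, np.stack([rows, cols], axis=1)


def classify(activations, class_weights):
    """Score classes with non-negative weights (assumption), so each
    concept can only add evidence for a class, never subtract it."""
    logits = np.abs(class_weights) @ activations
    return int(logits.argmax()), logits
```

In this sketch, the `locations` returned for a channel indicate which image region drove that channel's activation, which is the kind of evidence a prototypical-part explanation would display; whether InfoDisent computes it this way is not stated in the abstract.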

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)