Concepts from Representations: Post-hoc Concept Bottleneck Models via Sparse Decomposition of Visual Representations
Shizhan Gong, Xiaofan Zhang, Qi Dou

TL;DR
This paper presents PCBM-ReD, a novel method that retrofits interpretability onto pretrained models by decomposing visual representations into human-understandable concepts, improving accuracy and interpretability in image classification.
Contribution
Introduces PCBM-ReD, a new pipeline that automatically extracts and filters visual concepts from pretrained models using multimodal language models and representation decomposition.
Findings
Achieves state-of-the-art accuracy across 11 image classification tasks.
Narrows the performance gap with end-to-end models.
Demonstrates improved interpretability of model reasoning.
Abstract
Deep learning has achieved remarkable success in image recognition, yet its inherent opacity poses challenges for deployment in critical domains. Concept-based interpretations aim to address this by explaining model reasoning through human-understandable concepts. However, existing post-hoc methods and ante-hoc concept bottleneck models (CBMs) suffer from limitations such as unreliable concept relevance, non-visual or labor-intensive concept definitions, and model- or data-agnostic assumptions. This paper introduces Post-hoc Concept Bottleneck Model via Representation Decomposition (PCBM-ReD), a novel pipeline that retrofits interpretability onto pretrained opaque models. PCBM-ReD automatically extracts visual concepts from a pretrained encoder, employs multimodal large language models (MLLMs) to label and filter concepts based on visual identifiability and task relevance, and…
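The truncated abstract does not say how the representation decomposition is computed. As a generic illustration of what a sparse decomposition of encoder features into concept activations can look like, the sketch below assumes a fixed concept dictionary `D` (one unit-norm direction per concept) and solves a non-negative lasso problem with ISTA. The function name `sparse_codes`, the penalty `lam`, and the synthetic data are all hypothetical; this is not the authors' algorithm.

```python
import numpy as np

def sparse_codes(Z, D, lam=0.05, n_iter=200):
    """Non-negative ISTA for min_A 0.5*||Z - A @ D||_F^2 + lam*||A||_1, A >= 0.

    Z: (n, d) image representations; D: (k, d) concept directions (assumed given).
    Returns A: (n, k) sparse concept activations.
    """
    # Step size 1/L, where L is the largest eigenvalue of D @ D.T.
    L = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((Z.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - Z) @ D.T                    # gradient of the squared error
        A = np.maximum(A - (grad + lam) / L, 0.0)   # non-negative soft threshold
    return A

# Synthetic check (hypothetical data): 8 concepts in a 32-dim feature space,
# each of 20 samples built from exactly 2 active concepts.
rng = np.random.default_rng(0)
D = rng.normal(size=(8, 32))
D /= np.linalg.norm(D, axis=1, keepdims=True)
A_true = np.zeros((20, 8))
for i in range(20):
    active = rng.choice(8, size=2, replace=False)
    A_true[i, active] = rng.uniform(0.5, 1.5, size=2)
Z = A_true @ D

A = sparse_codes(Z, D)
rel_err = np.linalg.norm(A @ D - Z) / np.linalg.norm(Z)
```

In a concept-bottleneck setting, a linear classifier would then be fit on `A` rather than on `Z`, so each prediction can be read as a weighted sum of a few active concepts.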
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
