Concept-wise Attention for Fine-grained Concept Bottleneck Models

Minghong Zhong; Guoshuai Zou; Kanghao Chen; Dexia Chen; Ruixuan Wang

arXiv:2604.15748 · cs.CV · April 21, 2026

TL;DR

The paper introduces CoAt-CBM, a novel framework that enhances fine-grained concept alignment and interpretability in concept bottleneck models using learnable visual queries and contrastive optimization.

Contribution

The paper proposes learnable concept-wise visual queries and a contrastive loss to improve concept alignment and interpretability in CBMs.

Findings

1. CoAt-CBM outperforms state-of-the-art methods in the reported experiments.

2. The approach achieves adaptive fine-grained image-concept alignment.

3. It enhances the interpretability of concept predictions.

Abstract

Recently, impressive performance has been achieved by Concept Bottleneck Models (CBMs) that use the image-text alignment learned by a large pre-trained vision-language model (e.g., CLIP). However, two key limitations remain in concept modeling. First, existing methods often suffer from pre-training biases, which manifest as granularity misalignment or reliance on structural priors. Second, fine-tuning with Binary Cross-Entropy (BCE) loss treats each concept independently and therefore ignores mutual exclusivity among concepts, leading to suboptimal alignment. To address these limitations, we propose Concept-wise Attention for Fine-grained Concept Bottleneck Models (CoAt-CBM), a novel framework that achieves adaptive fine-grained image-concept alignment and high interpretability. Specifically, CoAt-CBM employs learnable concept-wise visual queries to adaptively obtain fine-grained concept-wise…
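
The abstract's remark that BCE treats each concept independently can be illustrated with a small sketch. This is not the paper's loss: the excerpt above does not specify the contrastive objective, so a plain softmax cross-entropy over concept logits stands in for it, and the logits, targets, and function names are made up for illustration. The sketch compares how each loss reacts when a competing concept's score rises.

```python
import math

def bce_per_concept(logits, targets):
    """Independent binary cross-entropy term for each concept."""
    losses = []
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid of this concept's logit only
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return losses

def softmax_ce(logits, target_index):
    """Softmax cross-entropy: the target concept competes with all others."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target_index]

# Hypothetical image-concept similarity logits for three concepts.
base = [2.0, 0.0, -1.0]
raised = [2.0, 1.5, -1.0]  # a competing concept's score goes up
targets = [1, 0, 0]        # concept 0 is the one present in the image

# BCE term for the present concept: depends only on its own logit.
bce_target_base = bce_per_concept(base, targets)[0]
bce_target_raised = bce_per_concept(raised, targets)[0]

# Softmax loss for the present concept: depends on every concept's logit.
ce_base = softmax_ce(base, 0)
ce_raised = softmax_ce(raised, 0)

print("BCE term for concept 0:", bce_target_base, bce_target_raised)
print("Softmax CE for concept 0:", ce_base, ce_raised)
```

In this toy setting the BCE term for the present concept is unchanged when a rival concept's logit rises, whereas the softmax loss increases. That coupling between concepts is the kind of behavior a contrastive objective can provide and per-concept BCE cannot.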
