# Local–Global Aware Concept Bottleneck Models for Interpretable Image Classification

**Authors:** Ci Liu, Zijie Lin, Chen Tang

PMC · DOI: 10.3390/s26061833 · Sensors (Basel, Switzerland) · 2026-03-14

## TL;DR

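This paper introduces LGA-CBM, a CLIP-based concept bottleneck model for image classification that refines concept scores through a training-free pipeline combining local (region-level) and global visual evidence, improving both accuracy and interpretability.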

## Contribution

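LGA-CBM refines initial CLIP-derived concept scores without additional training, using three components: DMCSR, which uses attention weights to strengthen region–concept alignment; L2GCR, which harmonizes local and global concept activations; and SCCM, which uses Grounding DINO to disambiguate similar concepts. A sparse linear layer then maps the refined concepts to class labels.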

## Key findings

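- Across six benchmark datasets, the authors report that LGA-CBM consistently achieves state-of-the-art performance in both accuracy and interpretability.
- The sparse final layer classifies with minimal concept usage, and the resulting explanations align closely with human cognition.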

## Abstract

Concept Bottleneck Models facilitate interpretable image classification by predicting human-understandable concepts prior to class labels. However, when constructed upon CLIP, they exhibit unreliable concept scores stemming from CLIP’s global representation bias and insufficient region-level sensitivity, which severely constrain their effectiveness in sensor-driven applications like remote sensing and medical imaging where localized visual evidence is critical. To mitigate this, we propose the Local–Global Aware Concept Bottleneck Model (LGA-CBM), which improves concept prediction through a training-free refinement pipeline. Building on initial CLIP-derived concept scores, LGA-CBM incorporates three key components: a Dual Masking Guided Concept Score Refinement (DMCSR) module that exploits attention weights to strengthen region–concept alignment; a Local-to-Global Concept Reidentification (L2GCR) strategy to harmonize local and global activations; and a Similar Concepts Correction Mechanism (SCCM) integrating Grounding DINO for fine-grained disambiguation. A sparse linear layer then maps the refined concepts to class labels, enabling highly interpretable classification with minimal concept usage. Experiments across six benchmark datasets demonstrate that LGA-CBM consistently achieves state-of-the-art performance in both accuracy and interpretability, producing explanations that align closely with human cognition.
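
The general concept-bottleneck idea described in the abstract (score concepts first, then classify from those scores with a sparse linear layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are toy vectors standing in for CLIP image and text features, the concept names, weights, and the helper functions `concept_scores` and `classify` are hypothetical, and the DMCSR, L2GCR, and SCCM refinement steps are omitted entirely.

```python
import numpy as np

def concept_scores(image_emb, concept_embs):
    """Cosine similarity between one image embedding and each concept embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    concept_embs = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    return concept_embs @ image_emb

def classify(scores, weights, bias):
    """Map concept scores to class logits with a linear layer.

    Returns the predicted class index and each concept's contribution
    (weight * score) to that class, which serves as the explanation.
    """
    logits = weights @ scores + bias
    pred = int(np.argmax(logits))
    contributions = weights[pred] * scores
    return pred, contributions

# Toy stand-ins (assumptions): three concepts, two classes, 3-d embeddings.
concepts = ["striped fur", "long beak", "webbed feet"]
concept_embs = np.eye(3)                    # placeholder for text embeddings
image_emb = np.array([0.1, 0.9, 0.4])       # placeholder for an image embedding

# Sparse weights: each class relies on only a few concepts (zeros elsewhere).
weights = np.array([
    [2.0, 0.0, 0.0],   # class 0 uses "striped fur" only
    [0.0, 1.5, 1.0],   # class 1 uses "long beak" and "webbed feet"
])
bias = np.zeros(2)

scores = concept_scores(image_emb, concept_embs)
pred, contrib = classify(scores, weights, bias)

# Report the concepts that actually contributed to the predicted class.
for name, c in sorted(zip(concepts, contrib), key=lambda t: -t[1]):
    if c != 0.0:
        print(f"class {pred}: {name} contributes {c:.3f}")
```

Because the weight matrix is sparse, the explanation for a prediction reduces to a short list of nonzero concept contributions, which is the property the abstract refers to as classification with minimal concept usage.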

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030245/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030245/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030245/full.md

---
Source: https://tomesphere.com/paper/PMC13030245