Improving Concept Alignment in Vision-Language Concept Bottleneck Models

Nithish Muthuchamy Selvaraj; Xiaobao Guo; Adams Wai-Kin Kong; Alex Kot

arXiv:2405.01825 · cs.CV · August 27, 2024 · 1 citation

TL;DR

This paper improves the alignment of human-defined concepts with visual inputs in vision-language models, enhancing interpretability and trustworthiness through a novel semi-supervised learning approach and class-level interventions.

Contribution

The paper introduces a Contrastive Semi-Supervised learning method that uses a limited amount of labeled data to better align concepts with visual inputs in CLIP models, improving interpretability.

Findings

01. Significant improvement in concept accuracy (+29.95)

02. Enhanced classification accuracy (+3.84)

03. Requires fewer human-annotated labels

Abstract

Concept Bottleneck Models (CBM) map images to human-interpretable concepts before making class predictions. Recent approaches automate CBM construction by prompting Large Language Models (LLMs) to generate text concepts and employing Vision Language Models (VLMs) to score these concepts for CBM training. However, it is desired to build CBMs with concepts defined by human experts rather than LLM-generated ones to make them more trustworthy. In this work, we closely examine the faithfulness of VLM concept scores for such expert-defined concepts in domains like fine-grained bird species and animal classification. Our investigations reveal that VLMs like CLIP often struggle to correctly associate a concept with the corresponding visual input, despite achieving a high classification performance. This misalignment renders the resulting models difficult to interpret and less reliable. To…
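
As a rough illustration of the pipeline the abstract describes, the sketch below scores expert-defined concepts by cosine similarity between image and concept-text embeddings, then feeds those scores to a linear classifier. It is not the authors' implementation: the random arrays stand in for real VLM embeddings, and the names `concept_scores`, `predict_classes`, and `concept_accuracy`, as well as the zero threshold used to binarize scores, are assumptions made for this example only.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def concept_scores(image_emb, concept_emb):
    """Cosine similarity between every image and every concept text embedding."""
    return l2_normalize(image_emb) @ l2_normalize(concept_emb).T

def predict_classes(scores, weights, bias):
    """Linear layer on top of the concept bottleneck: concept scores -> class."""
    logits = scores @ weights + bias
    return logits.argmax(axis=1)

def concept_accuracy(scores, concept_labels, threshold=0.0):
    """Fraction of (image, concept) pairs where the thresholded score matches
    a binary expert annotation. The threshold is an assumption of this sketch."""
    predicted = (scores > threshold).astype(int)
    return float((predicted == concept_labels).mean())

# Hypothetical stand-ins for VLM outputs: 4 images, 5 concepts, 8-dim embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))
concept_emb = rng.normal(size=(5, 8))
weights = rng.normal(size=(5, 3))   # concept -> class weights for 3 classes
bias = np.zeros(3)

scores = concept_scores(image_emb, concept_emb)
preds = predict_classes(scores, weights, bias)
```

Keeping concept accuracy separate from class accuracy mirrors the point made in the abstract: a model can classify well while its concept scores disagree with expert annotations, so the two need to be measured independently.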

Code & Models

Repositories

nms05/improving-concept-alignment-in-vision-language-concept-bottleneck-models
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training