Pre-trained Vision-Language Models Learn Discoverable Visual Concepts

Yuan Zang; Tian Yun; Hao Tan; Trung Bui; Chen Sun

arXiv:2404.12652 · cs.CV · January 15, 2025 · 1 citation

TL;DR

Pre-trained vision-language models learn visual concepts such as color and texture alongside object categories. These concepts can be extracted through the models' vision-language interface with text-based concept prompts, which supports interpretability and reasoning.

Contribution

The paper introduces a new framework for identifying and ranking the visual concepts learned by VLMs. It is motivated by earlier works whose differing strategies for defining and evaluating concepts led to conflicting conclusions.

Findings

01 VLMs learn diverse visual concepts that describe objects accurately.

02 The proposed Concept Discovery and Learning (CDL) framework discovers concepts by ranking them with mutual information.

03 Quantitative and human evaluations confirm the quality of the discovered concepts.
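To make the mutual-information idea in finding 02 concrete, here is a minimal sketch that ranks candidate concepts by how much a binary "concept present" indicator tells us about the object category. It uses the textbook definition of mutual information between two discrete variables. This page does not state how the paper computes it, so the function names, the binary indicator, and the toy data below are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(concept_present, labels):
    """Mutual information (in bits) between a discrete concept indicator
    and category labels, estimated from empirical counts."""
    n = len(labels)
    joint = Counter(zip(concept_present, labels))
    count_c = Counter(concept_present)
    count_y = Counter(labels)
    mi = 0.0
    for (c, y), count in joint.items():
        p_cy = count / n
        p_c = count_c[c] / n
        p_y = count_y[y] / n
        mi += p_cy * math.log2(p_cy / (p_c * p_y))
    return mi


def rank_concepts(concept_table, labels):
    """Return (concept, score) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(column, labels)
        for name, column in concept_table.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up annotations (hypothetical, not data from the paper).
labels = ["durian", "durian", "lemon", "lemon"]
concept_table = {
    "spiky": [1, 1, 0, 0],  # perfectly separates the two categories
    "round": [1, 0, 1, 0],  # independent of the category
}

ranking = rank_concepts(concept_table, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy setup "spiky" scores 1 bit and "round" scores 0 bits, so a ranking of this kind favors concepts that discriminate between categories over concepts that appear regardless of category.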

Abstract

Do vision-language models (VLMs) pre-trained to caption an image of a "durian" learn visual concepts such as "brown" (color) and "spiky" (texture) at the same time? We aim to answer this question as visual concepts learned "for free" would enable wide applications such as neuro-symbolic reasoning or human-interpretable object classification. We assume that the visual concepts, if captured by pre-trained VLMs, can be extracted by their vision-language interface with text-based concept prompts. We observe that recent works prompting VLMs with concepts often differ in their strategies to define and evaluate the visual concepts, leading to conflicting conclusions. We propose a new concept definition strategy based on two observations: First, certain concept prompts include shortcuts that recognize correct concepts for wrong reasons; Second, multimodal information (e.g. visual…
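The abstract's assumption that concepts can be read out through the vision-language interface can be sketched as scoring an image against text-based concept prompts. The sketch below assumes a CLIP-style dual encoder in which image and text are embedded into a shared space and compared by cosine similarity. The embeddings are hypothetical toy vectors standing in for real encoder outputs, and the function names are illustrative rather than taken from the paper or its repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_concepts(image_embedding, prompt_embeddings):
    """Score each concept prompt by its similarity to the image embedding."""
    return {
        concept: cosine_similarity(image_embedding, embedding)
        for concept, embedding in prompt_embeddings.items()
    }


# Hypothetical embeddings; a real pipeline would obtain these from the
# image encoder and text encoder of a pre-trained VLM.
image_embedding = [0.9, 0.1, 0.4]
prompt_embeddings = {
    "spiky": [1.0, 0.0, 0.5],
    "smooth": [0.0, 1.0, 0.1],
}

scores = score_concepts(image_embedding, prompt_embeddings)
best_concept = max(scores, key=scores.get)
print(best_concept, scores)
```

A higher score only indicates that the prompt is close to the image in the shared space. As the abstract notes, some prompts contain shortcuts that recognize the correct concept for the wrong reasons, so raw similarity alone is not evidence that the concept was learned.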

Code & Models

Repositories

brown-palm/concept-discovery-and-learning
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications