Does VLM Classification Benefit from LLM Description Semantics?

Pingchuan Ma; Lennart Rietdorf; Dmytro Kotovenko; Vincent Tao Hu; Björn Ommer

arXiv:2412.11917 · cs.CV · December 20, 2024

TL;DR

This paper investigates whether Large Language Model-generated descriptions genuinely enhance Vision-Language Model classification by analyzing their semantic contribution, proposing an evaluation scenario and a training-free selection method to improve accuracy and explainability.

Contribution

It introduces an evaluation scenario that distinguishes genuine semantic benefits of descriptions from a semantics-agnostic ensembling (noise augmentation) effect, and proposes a training-free method for selecting discriminative descriptions for VLM classification.
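The page does not state the selection rule itself. As a purely hypothetical illustration of what a training-free selection step can look like (nothing is learned; only precomputed text embeddings are compared), the sketch below keeps, for each class, the `k` candidate descriptions with the largest margin between similarity to their own class centroid and similarity to the closest rival class centroid. The function names, the margin score, and the toy vectors are assumptions made for illustration; this is not the DisCLIP algorithm.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def select_discriminative(desc_embs: Dict[str, List[Vector]], k: int) -> Dict[str, List[int]]:
    """HYPOTHETICAL heuristic (not the paper's method): for each class, rank its
    description embeddings by (similarity to own class centroid) minus
    (max similarity to any other class centroid) and keep the top-k indices.
    Requires at least two classes."""
    cents = {label: centroid(descs) for label, descs in desc_embs.items()}
    selected: Dict[str, List[int]] = {}
    for label, descs in desc_embs.items():
        rivals = [c for other, c in cents.items() if other != label]
        # Margin score: how much closer a description is to its own class than to the nearest rival.
        ranked = sorted(
            range(len(descs)),
            key=lambda i: cosine(descs[i], cents[label]) - max(cosine(descs[i], c) for c in rivals),
            reverse=True,
        )
        selected[label] = sorted(ranked[:k])
    return selected


if __name__ == "__main__":
    # Toy 2-d "text embeddings"; real ones would come from a VLM text encoder.
    toy = {"cat": [[1.0, 0.0], [0.6, 0.8]], "dog": [[0.0, 1.0], [0.1, 1.0]]}
    print(select_discriminative(toy, k=1))
```

Because each own-class centroid includes the description being scored, this margin is slightly optimistic; a leave-one-out centroid would be a natural variant of the same idea.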

Findings

1. Descriptions with genuine semantics improve classification accuracy.
2. The proposed method outperforms baseline approaches across seven datasets.
3. The paper provides insights into the explainability of description-based classification.

Abstract

Accurately describing images with text is a foundation of explainable AI. Vision-Language Models (VLMs) like CLIP have recently addressed this by aligning images and texts in a shared embedding space, expressing semantic similarities between vision and language embeddings. VLM classification can be improved with descriptions generated by Large Language Models (LLMs). However, it is difficult to determine the contribution of actual description semantics, as the performance gain may also stem from a semantic-agnostic ensembling effect, where multiple modified text prompts act as a noisy test-time augmentation for the original one. We propose an alternative evaluation scenario to decide if a performance boost of LLM-generated descriptions is caused by such a noise augmentation effect or rather by genuine description semantics. The proposed scenario avoids noisy test-time augmentation and…
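For context, description-based VLM classification is commonly done by embedding several text prompts per class (for example the class name combined with LLM-written descriptions), scoring each class by the average image-text cosine similarity over its prompts, and predicting the highest-scoring class. The minimal sketch below assumes the embeddings were already produced by some CLIP-style image and text encoder and uses made-up toy vectors; the function names are hypothetical. The averaging step is identical whether the prompts carry real semantics or are noisy variants of the class name, which is why a raw accuracy gain alone cannot separate the two effects the abstract describes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify_by_descriptions(
    image_emb: Vector, desc_embs: Dict[str, List[Vector]]
) -> Tuple[str, Dict[str, float]]:
    """Score each class by the mean cosine similarity between the image
    embedding and that class's prompt embeddings; return the argmax class
    together with all class scores."""
    scores = {
        label: sum(cosine(image_emb, d) for d in descs) / len(descs)
        for label, descs in desc_embs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    image = [1.0, 0.0, 0.0]  # toy image embedding
    prompts = {  # toy prompt embeddings, two per class
        "cat": [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]],
        "dog": [[0.0, 1.0, 0.0], [0.6, 0.8, 0.0]],
    }
    label, scores = classify_by_descriptions(image, prompts)
    print(label, scores)
```

Swapping the "dog" or "cat" prompt lists for embeddings of random-word prompts would run through exactly the same code path, which is the ambiguity the proposed evaluation scenario is meant to remove.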

Code & Models

Repositories

compvis/disclip (PyTorch, official)

Videos

Does VLM Classification Benefit from LLM Description Semantics? · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Advanced Computational Techniques and Applications

Methods: Contrastive Language-Image Pre-training (CLIP)