Enhancing Visual Classification using Comparative Descriptors

Hankyeol Lee; Gawon Seo; Wonseok Choi; Geunyoung Jung; Kyungwoo Song; Jiyoung Jung

arXiv:2411.05357·cs.CV·November 12, 2024

TL;DR

This paper introduces comparative descriptors that highlight differences between similar classes, improving the accuracy of vision-language models like CLIP in visual classification tasks, especially where classes are subtly different.

Contribution

We propose a novel method of using comparative descriptors to better differentiate similar classes, enhancing zero-shot classification performance of vision-language models.
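Descriptor-based zero-shot classification is commonly done by embedding each class's descriptors with the CLIP text encoder and scoring an image by its average cosine similarity to them. The sketch below assumes that averaging scheme and precomputed embeddings, with toy vectors standing in for real CLIP features. It is not taken from the hk1ee/comparative-clip repository, and the helper names (`class_scores`, `predict`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    a_n, b_n = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def class_scores(image_emb: Vector, descriptor_embs: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Score each class as the mean cosine similarity between the image embedding
    and the embeddings of that class's descriptors.

    Assumption: all embeddings were precomputed by a CLIP-like encoder.
    """
    return {
        label: sum(cosine(image_emb, d) for d in descs) / len(descs)
        for label, descs in descriptor_embs.items()
    }


def predict(image_emb: Vector, descriptor_embs: Dict[str, List[Vector]]) -> str:
    """Return the class whose descriptors match the image best on average."""
    scores = class_scores(image_emb, descriptor_embs)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 3-d "embeddings" for two visually similar classes (hypothetical data).
    descriptors = {
        "crow": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]],
        "raven": [[0.1, 1.0, 0.0], [0.0, 0.9, 0.3]],
    }
    print(predict([0.95, 0.05, 0.1], descriptors))  # crow
```

Averaging over descriptors, rather than using only the class name, lets several textual cues vote for a class; comparative descriptors would change which texts go into each class's list, not the scoring rule sketched here.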

Findings

01

Improved top-1 and top-5 accuracy in classification tasks.

02

Descriptors focusing on class differences outperform generic descriptors.

03

Filtering descriptors based on proximity in CLIP space boosts robustness.
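The third finding refers to filtering descriptors by their proximity in CLIP space. The rule used below is an assumption chosen for illustration, not necessarily the paper's criterion: a descriptor is kept only if its text embedding is strictly closer, by cosine similarity, to the target class embedding than to the embedding of every similar class it is compared against. The function name `filter_descriptors` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filter_descriptors(
    target: str,
    descriptors: List[Tuple[str, Vector]],
    class_embs: Dict[str, Vector],
    similar: List[str],
) -> List[str]:
    """Keep descriptors that lie closer to the target class than to every similar class.

    Hypothetical criterion for illustration; the paper's exact rule may differ.
    """
    kept = []
    for text, emb in descriptors:
        own = cosine(emb, class_embs[target])
        # Drop a descriptor that is at least as close to a confusable class.
        if all(own > cosine(emb, class_embs[other]) for other in similar):
            kept.append(text)
    return kept


if __name__ == "__main__":
    # Toy 2-d embeddings (hypothetical data).
    class_embs = {"crow": [1.0, 0.0], "raven": [0.0, 1.0]}
    crow_descs = [
        ("smaller, slimmer bill", [0.9, 0.2]),  # points toward "crow"
        ("black feathers", [0.7, 0.7]),         # equally close to both classes
    ]
    print(filter_descriptors("crow", crow_descs, class_embs, ["raven"]))  # ['smaller, slimmer bill']
```

In this toy case "black feathers" is dropped because it is equally close to both classes and so cannot help separate them, which is the intuition behind keeping only descriptors that point toward the target class.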

Abstract

The performance of vision-language models (VLMs) such as CLIP in visual classification tasks has been enhanced by leveraging semantic knowledge from large language models (LLMs) such as GPT. Recent studies have shown that in zero-shot classification tasks, descriptors incorporating additional cues, high-level concepts, or even random characters often outperform those using only the category name. In many classification tasks, while the top-1 accuracy may be relatively low, the top-5 accuracy is often significantly higher. This gap implies that most misclassifications occur among a few similar classes, highlighting the model's difficulty in distinguishing between classes with subtle differences. To address this challenge, we introduce a novel concept of comparative descriptors. These descriptors emphasize the unique features of a target class against its most similar classes,…
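The abstract's argument, that a large top-1/top-5 gap means most errors fall among a few similar classes, can be checked directly from a model's score matrix. The sketch below computes top-k accuracy and, for a given class, its n most similar classes under a class-to-class similarity matrix. Using such a matrix (for example, cosine similarities between class-name text embeddings) to choose the classes a comparative descriptor is written against is an assumption here; the function names and toy numbers are hypothetical.

```python
from typing import List, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def most_similar_classes(sim: Sequence[Sequence[float]], target: int, n: int) -> List[int]:
    """Indices of the n classes most similar to `target`, excluding itself,
    under a class-to-class similarity matrix (assumed symmetric)."""
    others = [c for c in range(len(sim)) if c != target]
    return sorted(others, key=lambda c: sim[target][c], reverse=True)[:n]


if __name__ == "__main__":
    # Toy scores for 3 images over 3 classes (hypothetical data).
    scores = [[0.9, 0.8, 0.1], [0.4, 0.5, 0.1], [0.2, 0.3, 0.7]]
    labels = [0, 0, 2]
    print(f"top-1: {top_k_accuracy(scores, labels, 1):.3f}")  # top-1: 0.667
    print(f"top-2: {top_k_accuracy(scores, labels, 2):.3f}")  # top-2: 1.000

    # Toy class-to-class similarities; class 0's closest neighbour is class 1.
    sim = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]]
    print(most_similar_classes(sim, 0, 1))  # [1]
```

The one top-1 miss in the toy data is an image of class 0 scored slightly higher for class 1, its most similar class, which is the kind of confusion comparative descriptors are meant to resolve.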

Code & Models

Repositories

hk1ee/comparative-clip
PyTorch · Official

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Linear Layer · Cosine Annealing · Adam · Attention Dropout · Multi-Head Attention · Residual Connection · Softmax · Byte Pair Encoding · Weight Decay