Enhancing Visual Classification using Comparative Descriptors
Hankyeol Lee, Gawon Seo, Wonseok Choi, Geunyoung Jung, Kyungwoo Song, Jiyoung Jung

TL;DR
This paper introduces comparative descriptors that highlight differences between similar classes, improving the accuracy of vision-language models like CLIP in visual classification tasks, especially where classes are subtly different.
Contribution
We propose a novel method of using comparative descriptors to better differentiate similar classes, enhancing zero-shot classification performance of vision-language models.
Findings
Improved top-1 and top-5 accuracy in classification tasks.
Descriptors focusing on class differences outperform generic descriptors.
Filtering descriptors based on proximity in CLIP space boosts robustness.
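The filtering finding can be illustrated with a minimal sketch. It assumes that "proximity in CLIP space" means cosine similarity between a descriptor embedding and the class-name embedding, and that filtering keeps the k closest descriptors. Both are assumptions made for illustration, since this summary does not state the paper's actual rule. The 2-D vectors are made-up stand-ins for real CLIP embeddings, and the names `filter_descriptors` and `class_score` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def filter_descriptors(desc_emb, class_emb, k):
    """Keep the k descriptors closest (by cosine similarity) to the class embedding.

    ASSUMPTION: top-k by cosine similarity is a stand-in for the paper's
    filtering rule, which this summary does not specify.
    """
    sims = l2_normalize(desc_emb) @ l2_normalize(class_emb)
    keep = np.argsort(-sims, kind="stable")[:k]
    return keep, sims[keep]


def class_score(image_emb, desc_emb):
    """Average cosine similarity between one image and a set of descriptors."""
    return float(np.mean(l2_normalize(desc_emb) @ l2_normalize(image_emb)))


# Toy 2-D vectors standing in for CLIP text/image embeddings (made up).
class_emb = np.array([1.0, 0.0])
desc_emb = np.array([
    [1.0, 0.0],   # aligned with the class
    [0.0, 1.0],   # orthogonal to the class
    [1.0, 1.0],   # partially aligned
    [-1.0, 0.0],  # opposite to the class
])
image_emb = np.array([0.9, 0.1])

keep, keep_sims = filter_descriptors(desc_emb, class_emb, k=2)
score_all = class_score(image_emb, desc_emb)
score_kept = class_score(image_emb, desc_emb[keep])
print(keep, keep_sims, score_all, score_kept)
```

In this toy case the off-target descriptors pull the averaged score down, so dropping them raises the score for an image of the target class. Whether the same holds on real data depends on the embeddings and on the filtering rule actually used.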
Abstract
The performance of vision-language models (VLMs), such as CLIP, in visual classification tasks has been enhanced by leveraging semantic knowledge from large language models (LLMs), including GPT. Recent studies have shown that in zero-shot classification tasks, descriptors incorporating additional cues, high-level concepts, or even random characters often outperform those using only the category name. In many classification tasks, while the top-1 accuracy may be relatively low, the top-5 accuracy is often significantly higher. This gap implies that most misclassifications occur among a few similar classes, highlighting the model's difficulty in distinguishing between classes with subtle differences. To address this challenge, we introduce a novel concept of comparative descriptors. These descriptors emphasize the unique features of a target class against its most similar classes,…
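The top-1 versus top-5 gap described in the abstract can be made concrete with a small sketch. The score matrix and labels below are invented for illustration and are not results from the paper; `top_k_accuracy` is a hypothetical helper name.

```python
import numpy as np


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest scores per row, best first.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Made-up scores for 4 images over 6 classes (rows: images, columns: classes).
scores = np.array([
    [0.90, 0.05, 0.02, 0.01, 0.01, 0.01],  # true class 0 ranked first
    [0.40, 0.50, 0.04, 0.03, 0.02, 0.01],  # true class 0 ranked second
    [0.10, 0.20, 0.60, 0.05, 0.03, 0.02],  # true class 2 ranked first
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 4 ranked fifth
])
labels = np.array([0, 0, 2, 4])

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(top1, top5)
```

Here two of the four images are wrong at top-1 but correct at top-5: the true class is present among the leading candidates and loses to a close competitor. This is the kind of confusion among similar classes that the abstract points to.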
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Linear Layer · Cosine Annealing · Adam · Attention Dropout · Multi-Head Attention · Residual Connection · Softmax · Byte Pair Encoding · Weight Decay
