Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
Reza Esfandiarpoor, Stephen H. Bach

TL;DR
This paper introduces FuDD, a zero-shot method that uses large language models to generate tailored class descriptions, effectively resolving ambiguities and improving image classification accuracy with vision-language models like CLIP.
Contribution
FuDD is a novel approach that dynamically generates class descriptions to resolve ambiguities, outperforming existing methods across multiple datasets.
Findings
FuDD outperforms generic description ensembles and naive LLM descriptions.
Differential descriptions significantly improve classification accuracy.
High-quality descriptions achieve performance comparable to few-shot methods.
Abstract
A promising approach for improving the performance of vision-language models like CLIP for image classification is to extend the class descriptions (i.e., prompts) with related attributes, e.g., using "brown sparrow" instead of "sparrow". However, current zero-shot methods select a subset of attributes regardless of commonalities between the target classes, potentially providing no useful information that would have helped to distinguish between them. For instance, they may use color instead of bill shape to distinguish between sparrows and wrens, which are both brown. We propose Follow-up Differential Descriptions (FuDD), a zero-shot approach that tailors the class descriptions to each dataset and leads to additional attributes that better differentiate the target classes. FuDD first identifies the ambiguous classes for each image, and then uses a Large Language Model (LLM) to generate new…
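The two-stage idea in the abstract (find the ambiguous classes for an image, then re-classify with descriptions that tell those classes apart) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the ambiguous classes are the top-k classes from an initial pass and that new descriptions are requested for each pair of ambiguous classes. Real CLIP embeddings are replaced by hand-written toy vectors, and `toy_differential` is a hypothetical stand-in for an LLM call followed by a text encoder.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_classes(image_emb, class_to_desc_embs):
    """Rank classes by mean image-description similarity, best first."""
    scores = {
        name: sum(cosine(image_emb, d) for d in descs) / len(descs)
        for name, descs in class_to_desc_embs.items()
    }
    # sorted() is stable, so tied classes keep their insertion order.
    return sorted(scores, key=scores.get, reverse=True)


def follow_up_classify(image_emb, initial_descs, differential_fn, k=2):
    """Two-pass classification: initial ranking, then differential re-ranking.

    Assumption: the top-k classes of the first pass are the ambiguous set.
    `differential_fn(a, b)` is hypothetical; it returns description
    embeddings for class a and class b that separate the two.
    """
    ambiguous = rank_classes(image_emb, initial_descs)[:k]
    follow_up = {name: [] for name in ambiguous}
    for a, b in combinations(ambiguous, 2):
        descs_a, descs_b = differential_fn(a, b)
        follow_up[a].extend(descs_a)
        follow_up[b].extend(descs_b)
    return rank_classes(image_emb, follow_up)[0], ambiguous


# Toy embedding axes (assumed): [brown, thin curved bill, thick conical bill].
initial_descs = {
    "sparrow": [[1.0, 0.0, 0.0]],   # "brown sparrow"
    "wren": [[1.0, 0.0, 0.0]],      # "brown wren": color does not separate them
    "cardinal": [[0.0, 0.2, 1.0]],
}


def toy_differential(a, b):
    """Stand-in for LLM + text encoder: descriptions that mention bill shape."""
    table = {
        "sparrow": [[1.0, 0.0, 1.0]],
        "wren": [[1.0, 1.0, 0.0]],
        "cardinal": [[0.0, 0.0, 1.0]],
    }
    return table[a], table[b]


image_emb = [1.0, 0.8, 0.1]  # a brown bird with a thin curved bill
first_pass = rank_classes(image_emb, initial_descs)
prediction, ambiguous = follow_up_classify(image_emb, initial_descs, toy_differential)
print(first_pass[0], ambiguous, prediction)
```

In this toy setup the color-only descriptions leave the two brown birds tied, while the bill-shape descriptions break the tie. Restricting the second pass to the ambiguous classes keeps the number of pairwise LLM queries small, which is the design choice the sketch is meant to highlight.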
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training
