Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification

Reza Esfandiarpoor; Stephen H. Bach

arXiv:2311.07593 · cs.CL · March 18, 2024 · 1 citation

TL;DR

This paper introduces FuDD, a zero-shot method that uses large language models to generate tailored class descriptions, effectively resolving ambiguities and improving image classification accuracy with vision-language models like CLIP.

Contribution

FuDD is a novel approach that dynamically generates class descriptions to resolve ambiguities, outperforming existing methods across multiple datasets.

Findings

1. FuDD outperforms generic description ensembles and naive LLM descriptions.

2. Differential descriptions significantly improve classification accuracy.

3. High-quality descriptions achieve performance comparable to few-shot methods.

Abstract

A promising approach for improving the performance of vision-language models like CLIP for image classification is to extend the class descriptions (i.e., prompts) with related attributes, e.g., using brown sparrow instead of sparrow. However, current zero-shot methods select a subset of attributes regardless of commonalities between the target classes, potentially providing no useful information that would have helped to distinguish between them. For instance, they may use color instead of bill shape to distinguish between sparrows and wrens, which are both brown. We propose Follow-up Differential Descriptions (FuDD), a zero-shot approach that tailors the class descriptions to each dataset and leads to additional attributes that better differentiate the target classes. FuDD first identifies the ambiguous classes for each image, and then uses a Large Language Model (LLM) to generate new…
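The two stages named in the abstract (find the ambiguous classes for an image, then ask an LLM for descriptions that separate them) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract is cut off before it says how the new descriptions are used, so the re-scoring step here is an assumption. The names `ambiguous_classes`, `follow_up_classify`, `describe_pair` (a stand-in for the LLM call), and `embed_text` (a stand-in for a CLIP-style text encoder) are hypothetical, and the three-dimensional embeddings are hand-picked toy numbers.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ambiguous_classes(image_emb: Vector, class_embs: Dict[str, Vector], k: int = 2) -> List[str]:
    """Return the k classes whose initial descriptions score highest for the image."""
    ranked = sorted(class_embs, key=lambda c: cosine(image_emb, class_embs[c]), reverse=True)
    return ranked[:k]


def follow_up_classify(
    image_emb: Vector,
    class_embs: Dict[str, Vector],
    describe_pair: Callable[[str, str], List[str]],  # hypothetical LLM wrapper
    embed_text: Callable[[str], Vector],             # hypothetical text encoder
    k: int = 2,
) -> str:
    """Re-score only the ambiguous classes using pairwise differential descriptions.

    Assumption: each candidate is scored by the mean similarity between the image
    and the descriptions that contrast it with every other candidate.
    """
    candidates = ambiguous_classes(image_emb, class_embs, k)
    scores = {}
    for target in candidates:
        sims = [
            cosine(image_emb, embed_text(text))
            for other in candidates
            if other != target
            for text in describe_pair(target, other)
        ]
        # Fall back to the initial score if no follow-up description was produced.
        scores[target] = sum(sims) / len(sims) if sims else cosine(image_emb, class_embs[target])
    return max(candidates, key=lambda c: scores[c])


# Toy data. Axes: (brown plumage, thin bill, thick bill). The image shows a wren.
image = [1.0, 0.9, 0.0]
initial = {
    "sparrow": [1.0, 0.05, 0.0],  # "brown sparrow": colour only, nearly tied with wren
    "wren": [1.0, 0.0, 0.05],     # "brown wren"
    "crow": [-1.0, 0.0, 0.0],     # clearly not brown
}
follow_up_text = {
    ("wren", "sparrow"): "a wren, which has a thin bill",
    ("sparrow", "wren"): "a sparrow, which has a thick bill",
}
follow_up_emb = {
    "a wren, which has a thin bill": [1.0, 1.0, 0.0],
    "a sparrow, which has a thick bill": [1.0, 0.0, 1.0],
}


def describe_pair(target: str, other: str) -> List[str]:
    """Canned stand-in for an LLM prompt that contrasts `target` with `other`."""
    return [follow_up_text[(target, other)]]


def embed_text(text: str) -> Vector:
    """Canned stand-in for a text encoder."""
    return follow_up_emb[text]


print(ambiguous_classes(image, initial, k=2))
print(follow_up_classify(image, initial, describe_pair, embed_text, k=2))
```

In this toy setup the colour-only descriptions rank sparrow just above wren, while the bill-shape descriptions flip the decision to wren, mirroring the sparrow versus wren example in the abstract.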

Code & Models

Repositories

batsresearch/fudd (PyTorch, official)

Videos

Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training