Improved Few-Shot Image Classification Through Multiple-Choice Questions

Dipika Khullar; Emmett Goodman; Negin Sokhandan

arXiv:2407.16145·cs.LG·July 24, 2024

TL;DR

This paper introduces a training-free few-shot image classification method that poses multiple-choice questions to visual question answering (VQA) models to extract and combine visual features, outperforming pure visual encoders and zero-shot VQA baselines.

Contribution

The proposed approach uses multiple-choice prompts to create prompt-specific latent representations, enhancing few-shot classification without additional training.
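
One way to picture a training-free few-shot classifier over such representations is a nearest-prototype rule. The sketch below is an illustration only: the summary does not say how the paper combines or compares its latent features, so the cosine-similarity prototype rule and the helper names (`l2_normalize`, `class_prototypes`, `classify`) are assumptions, and the feature vectors are assumed to have already been extracted from a VQA model for one fixed prompt.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(support_features, support_labels):
    """Average the normalized few-shot features of each class into one prototype.

    Hypothetical helper: nothing is trained, the labeled examples are only averaged.
    """
    grouped = defaultdict(list)
    for feat, label in zip(support_features, support_labels):
        grouped[label].append(l2_normalize(feat))
    prototypes = {}
    for label, feats in grouped.items():
        mean = [sum(col) / len(feats) for col in zip(*feats)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def classify(query_feature, prototypes):
    """Return the label whose prototype has the highest cosine similarity to the query."""
    q = l2_normalize(query_feature)
    return max(
        prototypes,
        key=lambda label: sum(a * b for a, b in zip(q, prototypes[label])),
    )


# Toy 2-D vectors standing in for prompt-specific latent features (made up for illustration).
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["denim", "denim", "silk", "silk"]
protos = class_prototypes(support, labels)
prediction = classify([0.8, 0.2], protos)
```

Because the prototypes are plain averages, changing the question only requires re-extracting features under the new prompt; no weights are updated.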

Findings

01 The method outperforms pure visual encoders and zero-shot VQA baselines on standard datasets.

02 It is effective across diverse attribute settings such as fabric, style, and texture.

03 It maintains the dynamic and flexible advantages of VQA models.

Abstract

Through a simple multiple-choice language prompt, a VQA model can operate as a zero-shot image classifier, producing a classification label. Compared to typical image encoders, VQA models offer an advantage: VQA-produced image embeddings can be infused with the most relevant visual information through tailored language prompts. Nevertheless, for most tasks, zero-shot VQA performance is lacking, either because of unfamiliar category names or because the pre-training and test data distributions differ. We propose a simple method to boost VQA performance for image classification using only a handful of labeled examples and a multiple-choice question. This few-shot method is training-free and maintains the dynamic and flexible advantages of the VQA model. Rather than relying on the final language output, our approach uses multiple-choice questions to extract prompt-specific latent…
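
To make the zero-shot setup concrete, a multiple-choice prompt might be assembled and its answer parsed roughly as follows. This is a minimal sketch: the prompt wording, the lettered layout, and the helpers `build_multiple_choice_prompt` and `parse_choice` are assumptions for illustration, not the template used in the paper, and no particular VQA model API is implied.

```python
import string


def build_multiple_choice_prompt(question, options):
    """Format a question and its candidate labels as a lettered multiple-choice prompt."""
    if not 0 < len(options) <= len(string.ascii_uppercase):
        raise ValueError("need between 1 and 26 options")
    letters = string.ascii_uppercase[: len(options)]
    lines = [f"Question: {question}", "Options:"]
    lines += [f"({letter}) {option}" for letter, option in zip(letters, options)]
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def parse_choice(answer, options):
    """Map a model reply such as '(B)' or 'b' back to an option; return None if unparseable."""
    cleaned = answer.strip().lstrip("(").strip()
    if not cleaned:
        return None
    idx = string.ascii_uppercase.find(cleaned[0].upper())
    return options[idx] if 0 <= idx < len(options) else None


# Hypothetical attribute question; the category names are illustrative.
options = ["denim", "silk", "wool"]
prompt = build_multiple_choice_prompt("What fabric is the garment made of?", options)
label = parse_choice("(B)", options)
```

In the zero-shot setting the parsed letter is the prediction itself, whereas the few-shot method described in the abstract works from latent representations rather than this final text output.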
