Improved Few-Shot Image Classification Through Multiple-Choice Questions
Dipika Khullar, Emmett Goodman, Negin Sokhandan

TL;DR
This paper introduces a training-free few-shot image classification method that uses multiple-choice questions posed to VQA models to extract and combine visual features, outperforming pure visual encoders and zero-shot VQA baselines.
Contribution
The proposed approach uses multiple-choice prompts to create prompt-specific latent representations, enhancing few-shot classification without additional training.
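To make the idea of a multiple-choice prompt and a combined feature concrete, here is a minimal sketch. It makes two assumptions. First, it uses a simple lettered-option template. Second, it combines per-prompt features by concatenating L2-normalized vectors. The template and the helpers `build_mc_prompt` and `fuse_prompt_embeddings` are hypothetical, and the paper's exact prompt format and combination rule may differ. Extracting latent states from an actual VQA model is not shown. The embeddings below are plain float lists standing in for those states.

```python
import math
from typing import List, Sequence

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_mc_prompt(question: str, options: Sequence[str]) -> str:
    """Format a multiple-choice question with lettered options (hypothetical template)."""
    if len(options) > len(LETTERS):
        raise ValueError("too many options for single-letter labels")
    lines = [f"Question: {question}"]
    for letter, option in zip(LETTERS, options):
        lines.append(f"({letter}) {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def fuse_prompt_embeddings(per_prompt: Sequence[Sequence[float]]) -> List[float]:
    """Assumed fusion rule: concatenate one L2-normalized embedding per prompt."""
    fused: List[float] = []
    for vec in per_prompt:
        fused.extend(l2_normalize(vec))
    return fused


prompt = build_mc_prompt("What is the fabric of this garment?", ["cotton", "denim", "silk"])
print(prompt)
# Two toy "prompt-specific" embeddings, e.g. from a fabric question and a texture question.
feature = fuse_prompt_embeddings([[3.0, 4.0], [0.0, 2.0]])
print(feature)
```

Normalizing each vector before concatenation keeps any single prompt from dominating the fused feature just because its hidden states have a larger scale.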
Findings
Outperforms pure visual encoders and zero-shot VQA baselines on standard datasets.
Effective across diverse attribute settings such as fabric, style, and texture.
Maintains the dynamic and flexible advantages of VQA models.
Abstract
Through a simple multiple-choice language prompt, a VQA model can operate as a zero-shot image classifier, producing a classification label. Compared to typical image encoders, VQA models offer an advantage: VQA-produced image embeddings can be infused with the most relevant visual information through tailored language prompts. Nevertheless, for most tasks zero-shot VQA performance is lacking, either because of unfamiliar category names or because of dissimilar pre-training and test data distributions. We propose a simple method to boost VQA performance for image classification using only a handful of labeled examples and a multiple-choice question. This few-shot method is training-free and maintains the dynamic and flexible advantages of the VQA model. Rather than relying on the final language output, our approach uses multiple-choice questions to extract prompt-specific latent…
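The abstract is cut off before it says how the prompt-specific latent features become a label from only a handful of labeled examples. As an illustration only, one standard training-free option is nearest-class-mean (prototype) classification with cosine similarity. This is an assumption and not necessarily the paper's classifier. The names `class_prototypes` and `classify` are hypothetical, and the toy embeddings stand in for VQA latent features.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def class_prototypes(support: Sequence[Tuple[Sequence[float], str]]) -> Dict[str, List[float]]:
    """Average the normalized support embeddings of each class (no training involved)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in support:
        v = l2_normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(v)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], v)]
        counts[label] += 1
    # Re-normalize each mean so a dot product below equals cosine similarity.
    return {label: l2_normalize([x / counts[label] for x in s]) for label, s in sums.items()}


def classify(query: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Return the label whose prototype has the highest cosine similarity to the query."""
    q = l2_normalize(query)
    return max(prototypes, key=lambda label: sum(a * b for a, b in zip(q, prototypes[label])))


# A toy 2-shot, 2-way support set of labeled embeddings.
support = [
    ([1.0, 0.1], "denim"),
    ([0.9, 0.0], "denim"),
    ([0.0, 1.0], "silk"),
    ([0.2, 0.8], "silk"),
]
protos = class_prototypes(support)
print(classify([0.8, 0.3], protos))
print(classify([0.1, 0.9], protos))
```

Adding a labeled example only changes a running average, so new classes or shots can be added without any gradient updates. That property matches the training-free setting described above.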
