Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs
Yunqi Hong, Sohyun An, Andrew Bai, Neil Y.C. Lin, Cho-Jui Hsieh

TL;DR
AutoSEP is an unsupervised, iterative prompt learning framework that leverages unlabeled data to significantly enhance the fine-grained classification capabilities of multimodal large language models without additional training.
Contribution
We introduce AutoSEP, a novel self-supervised prompt learning method that improves fine-grained image classification with MLLMs using only unlabeled data and black-box access.
Findings
AutoSEP improves average classification accuracy by 13% over standard zero-shot methods.
AutoSEP outperforms other unsupervised baselines on multiple datasets.
The method requires no fine-tuning or training, only unlabeled data and black-box access.
Abstract
Despite Multimodal Large Language Models (MLLMs) showing promising results on general zero-shot image classification tasks, fine-grained image classification remains challenging. It demands precise attention to subtle visual details to distinguish between visually similar subcategories--details that MLLMs may easily overlook without explicit guidance. To address this, we introduce AutoSEP, an iterative self-supervised prompt learning framework designed to enhance MLLM fine-grained classification capabilities in a fully unsupervised manner. Our core idea is to leverage unlabeled data to learn a description prompt that guides MLLMs in identifying crucial discriminative features within an image, and boosts classification accuracy. We developed an automatic self-enhancing prompt learning framework called AutoSEP to iteratively improve the description prompt using unlabeled data, based on…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
