Manipulating the Label Space for In-Context Classification
Haokun Chen, Xu Yang, Yuhang Huang, Zihan Wu, Jing Wang, Xin Geng

TL;DR
This paper introduces label space manipulation strategies to enhance in-context classification in vision-language models, achieving higher accuracy with fewer in-context examples than existing methods.
Contribution
It proposes two novel techniques, Label Distribution Enhancement and Visual Descriptions Enhancement, to increase knowledge density in in-context examples, improving classification performance.
Findings
Achieved 76.21% accuracy on ImageNet with 2 shots, surpassing CLIP.
Raised 1-shot accuracy on CUB-200 from 48.86% to 69.05%.
Demonstrated effectiveness across diverse datasets.
Abstract
After pre-training by generating the next word conditional on previous words, the Language Model (LM) acquires the ability of In-Context Learning (ICL) that can learn a new task conditional on the context of the given in-context examples (ICEs). Similarly, visually-conditioned Language Modelling is also used to train Vision-Language Models (VLMs) with ICL ability. However, such VLMs typically exhibit weaker classification abilities compared to contrastive learning-based models like CLIP, since the Language Modelling objective does not directly contrast whether an object is paired with a text. To improve the ICL of classification, using more ICEs to provide more knowledge is a straightforward way. However, this may largely increase the selection time, and more importantly, the inclusion of additional in-context images tends to extend the length of the in-context sequence beyond the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
