Prefix Conditioning Unifies Language and Label Supervision
Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

TL;DR
This paper introduces prefix tokens to distinguish dataset types during training, enabling models to effectively unify image-classification and caption datasets, thereby enhancing zero-shot recognition and robustness.
Contribution
The proposed prefix conditioning method disentangles dataset bias, allowing shared knowledge transfer and mode switching between datasets, improving zero-shot performance.
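As a rough illustration of the idea described above, the following minimal sketch prepends a dataset-type prefix token to each tokenized text input. It is not the authors' implementation: the reserved ids in `PREFIX_IDS`, the whitespace tokenizer, the helpers `build_vocab` and `encode_with_prefix`, and the "a photo of a ..." label template are all assumptions made for the example. In a real model each prefix id would index an embedding learned jointly with the text encoder.

```python
from typing import Dict, List

# Hypothetical reserved ids for the two dataset-type prefix tokens.
PREFIX_IDS: Dict[str, int] = {"caption": 0, "classification": 1}
VOCAB_OFFSET = len(PREFIX_IDS)  # ordinary word ids start after the prefix ids


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign each distinct lowercase word an id above the reserved prefix ids."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + VOCAB_OFFSET)
    return vocab


def encode_with_prefix(text: str, source: str, vocab: Dict[str, int]) -> List[int]:
    """Tokenize `text` and prepend the prefix token for its dataset type."""
    if source not in PREFIX_IDS:
        raise ValueError(f"unknown dataset type: {source!r}")
    return [PREFIX_IDS[source]] + [vocab[w] for w in text.lower().split()]


caption = "a dog running on the beach"
label_prompt = "a photo of a dog"  # class label turned into text by a template
vocab = build_vocab([caption, label_prompt])

caption_ids = encode_with_prefix(caption, "caption", vocab)
label_ids = encode_with_prefix(label_prompt, "classification", vocab)
# Same text in the other mode: only the leading prefix token changes.
label_as_caption_ids = encode_with_prefix(label_prompt, "caption", vocab)
```

Because the dataset type is carried by a single leading token, switching modes at inference amounts to swapping that token while the rest of the text input stays unchanged. Which prefix to use for a given downstream task is left as a parameter here.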
Findings
Improved zero-shot image recognition accuracy.
Enhanced robustness to distribution shifts.
Effective integration with existing VL pre-training methods.
Abstract
Image-classification datasets have been used to pretrain image recognition models. Recently, web-scale image-caption datasets have emerged as a powerful alternative source of pretraining data. Image-caption datasets are more "open-domain", containing a wider variety of scene types and vocabulary words than traditional classification datasets, and models trained on these datasets have demonstrated strong performance on few- and zero-shot recognition tasks. We show that when image-classification and image-caption datasets are naively unified, dataset biases negatively affect pre-training: the unification can tailor the model to the classification dataset, which reduces the generalizability of the learned representations, makes the model vulnerable to distribution shift away from that dataset, and thus jeopardizes zero-shot performance. In this work, we address the problem by disentangling the dataset…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training · Contrastive Learning
