Prefix Conditioning Unifies Language and Label Supervision

Kuniaki Saito; Kihyuk Sohn; Xiang Zhang; Chun-Liang Li; Chen-Yu Lee; Kate Saenko; Tomas Pfister

arXiv:2206.01125 · cs.CV · May 17, 2023

TL;DR

This paper introduces prefix tokens to distinguish dataset types during training, enabling models to effectively unify image-classification and caption datasets, thereby enhancing zero-shot recognition and robustness.
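To make the idea concrete, here is a minimal sketch of prefix conditioning on the text side. It assumes the prefix is an embedding vector (one per dataset type) prepended to the text's token embeddings before they enter the text encoder. The dictionary `PREFIX_EMBEDDINGS`, the dataset-type keys, the function `condition_on_prefix`, and the tiny embedding size are hypothetical illustration choices, not the authors' code; in a real model the prefix vectors would be trained jointly with the encoder.

```python
import random
from typing import Dict, List

Vector = List[float]

EMBED_DIM = 4  # hypothetical toy embedding size

# Hypothetical: one prefix embedding per dataset type. In training these
# would be learnable parameters; here they are only randomly initialized.
_rng = random.Random(0)
PREFIX_EMBEDDINGS: Dict[str, Vector] = {
    "caption": [_rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)],
    "classification": [_rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)],
}


def condition_on_prefix(token_embeddings: List[Vector], dataset_type: str) -> List[Vector]:
    """Prepend the dataset-type prefix embedding to a sequence of token embeddings.

    The resulting sequence would then be fed to the text encoder, so the
    encoder can tell which kind of dataset the text came from.
    """
    if dataset_type not in PREFIX_EMBEDDINGS:
        raise ValueError(f"unknown dataset type: {dataset_type!r}")
    return [list(PREFIX_EMBEDDINGS[dataset_type])] + token_embeddings


if __name__ == "__main__":
    tokens = [[0.1] * EMBED_DIM, [0.2] * EMBED_DIM]  # two toy token embeddings
    seq = condition_on_prefix(tokens, "caption")
    print(len(seq))  # one prefix vector plus two tokens
```

Because the prefix is the only part of the input that differs between dataset types, whatever dataset-specific behavior the model learns can be routed through it, while the rest of the encoder stays shared.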

Contribution

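The proposed prefix conditioning method disentangles dataset-specific bias from the shared representation: knowledge is still transferred across the classification and caption datasets, while the prefix lets the model switch modes depending on the dataset type, which improves zero-shot performance.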
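Mode switching can be illustrated at inference time: the same text encoder produces different class embeddings depending on which prefix is prepended to each class prompt. The sketch below is a toy zero-shot classifier. It assumes a text "encoder" that simply mean-pools the prefix and token embeddings and an image embedding that is already computed; `mean_pool`, `zero_shot_predict`, and the toy vectors are hypothetical stand-ins for real encoders. Which prefix to use at test time is left as a parameter rather than asserting the paper's choice.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Toy text 'encoder': average the vectors element-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def zero_shot_predict(image_emb: Vector,
                      class_prompts: Dict[str, List[Vector]],
                      prefix: Vector) -> str:
    """Return the class whose prefix-conditioned prompt embedding is most
    similar to the image embedding. Changing `prefix` switches the text
    encoder's 'mode' without retraining anything."""
    scores = {
        name: cosine(image_emb, mean_pool([prefix] + tokens))
        for name, tokens in class_prompts.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    prompts = {"cat": [[1.0, 0.0]], "dog": [[0.0, 1.0]]}
    some_prefix = [0.0, 0.0]  # hypothetical prefix vector
    print(zero_shot_predict([0.9, 0.1], prompts, some_prefix))
```

In this toy, the image `[0.6, 0.8]` is classified as "dog" with a zero prefix but as "cat" with the prefix `[0.0, -3.0]`, showing that the prefix alone can change the decision.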

Findings

01. Improved zero-shot image recognition accuracy.

02. Enhanced robustness to distribution shifts.

03. Effective integration with existing VL pre-training methods.

Abstract

Image-classification datasets have been used to pretrain image recognition models. Recently, web-scale image-caption datasets have emerged as a powerful alternative source of pretraining data. Image-caption datasets are more "open-domain", containing a wider variety of scene types and vocabulary words than traditional classification datasets, and models trained on these datasets have demonstrated strong performance on few- and zero-shot recognition tasks. We show that when image-classification and image-caption datasets are naively unified, their differing dataset biases negatively affect pre-training: the unification can tailor the model to the classification dataset, making it vulnerable to distribution shift away from that dataset, which reduces the generalizability of the learned representations and jeopardizes zero-shot performance. In this work, we address the problem by disentangling the dataset…
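One way to picture the unification described above is to convert each classification label into text with a prompt template and tag every image-text pair with its source dataset. The template `"a photo of a {}."` is a common CLIP-style choice assumed here for illustration, not necessarily the one used in the paper; `Example`, `unify`, and the sample data are hypothetical. A naive unification would simply ignore the `dataset_type` field, whereas prefix conditioning keeps it so the text encoder can be told where each sample came from.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Example:
    image_id: str
    text: str
    dataset_type: str  # "classification" or "caption"; would select the prefix


# Assumed CLIP-style prompt template for turning a class label into text.
TEMPLATE = "a photo of a {}."


def unify(classification: List[Tuple[str, str]],
          captions: List[Tuple[str, str]]) -> List[Example]:
    """Merge (image_id, label) and (image_id, caption) pairs into one list,
    recording which dataset each example came from."""
    merged = [Example(img, TEMPLATE.format(label), "classification")
              for img, label in classification]
    merged += [Example(img, caption, "caption") for img, caption in captions]
    return merged


if __name__ == "__main__":
    data = unify([("img_001", "golden retriever")],
                 [("img_002", "a dog chasing a ball on the beach")])
    for ex in data:
        print(ex.dataset_type, "|", ex.text)
```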

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research

Methods: Contrastive Language-Image Pre-training · Contrastive Learning
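The Methods tag refers to contrastive language-image pre-training (CLIP). For reference, below is a minimal pure-Python sketch of the standard symmetric contrastive (InfoNCE) objective over a batch of matched image-text pairs. It assumes L2-normalized embeddings where index i of each list forms a positive pair; the temperature 0.07 is CLIP's commonly cited initial value, not a setting taken from this paper, and how the paper combines this loss with classification data and prefixes is not shown.

```python
import math
from typing import List

Vector = List[float]


def log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs: List[Vector],
                          text_embs: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: image i should match text i and vice versa.

    Embeddings are assumed to be L2-normalized, so dot products are cosines.
    """
    n = len(image_embs)
    # logits[i][j] = similarity(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature
               for txt in text_embs] for img in image_embs]
    # Image-to-text direction: softmax over texts (each row).
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: softmax over images (each column).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(clip_contrastive_loss(images, aligned) < clip_contrastive_loss(images, swapped))
```

When every similarity in the batch is equal, the loss reduces to log(n), the cost of guessing uniformly among n candidates; correctly matched pairs push it toward zero.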