Unified Vision and Language Prompt Learning

Yuhang Zang; Wei Li; Kaiyang Zhou; Chen Huang; Chen Change Loy

arXiv:2210.07225 · cs.CV · October 14, 2022 · 55 citations

TL;DR

This paper introduces Unified Prompt Tuning (UPT), a method that jointly optimizes prompts across vision and language modalities, improving few-shot learning and domain generalization over unimodal prompt tuning methods.

Contribution

The paper proposes UPT, a novel approach that combines text and visual prompt tuning into a unified framework, addressing their individual limitations.

Findings

01. UPT outperforms unimodal prompt tuning on multiple datasets.

02. UPT achieves a better trade-off than unimodal prompt tuning in few-shot learning scenarios.

03. UPT improves domain generalization across diverse vision datasets.

Abstract

Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become a trend in the vision community since the emergence of large vision-language models like CLIP. We present a systematic study on two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning. A major finding is that none of the unimodal prompt tuning methods performs consistently well: text prompt tuning fails on data with high intra-class visual variances while visual prompt tuning cannot handle low inter-class variances. To combine the best from both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which essentially learns a tiny neural network to jointly optimize prompts across different modalities. Extensive experiments on over 11 vision datasets show that UPT achieves a…
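The abstract describes UPT only as a "tiny neural network" that jointly optimizes prompts for both modalities. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: a single set of shared prompt tokens passes through one small Transformer layer and is then split into text-side and visual-side prompts. The class name `UnifiedPromptSketch`, the choice of a Transformer layer, the split into two halves, and all sizes (`n_text`, `n_visual`, `dim`, `text_dim`, `visual_dim`) are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class UnifiedPromptSketch(nn.Module):
    """Hypothetical sketch: one shared prompt set feeds both modalities."""

    def __init__(self, n_text=4, n_visual=4, dim=64, text_dim=512, visual_dim=768):
        super().__init__()
        self.n_text = n_text
        # Shared learnable prompt tokens; these live in the model's input space.
        self.shared = nn.Parameter(torch.randn(n_text + n_visual, dim) * 0.02)
        # The "tiny neural network" (assumed here to be one self-attention layer),
        # which lets text-side and visual-side tokens interact.
        self.mixer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=2 * dim,
            dropout=0.0, batch_first=True,
        )
        # Project each half to the width expected by its (frozen) encoder.
        self.to_text = nn.Linear(dim, text_dim)
        self.to_visual = nn.Linear(dim, visual_dim)

    def forward(self):
        mixed = self.mixer(self.shared.unsqueeze(0)).squeeze(0)
        text_prompts = self.to_text(mixed[: self.n_text])
        visual_prompts = self.to_visual(mixed[self.n_text:])
        return text_prompts, visual_prompts


model = UnifiedPromptSketch()
text_prompts, visual_prompts = model()
print(text_prompts.shape, visual_prompts.shape)

# A loss on either modality sends gradients back to the same shared tokens,
# which is what "jointly optimize" means in this sketch.
(text_prompts.sum() + visual_prompts.sum()).backward()
```

In a full pipeline, `text_prompts` would be prepended to the class-name token embeddings and `visual_prompts` to the image patch embeddings of a frozen vision-language model such as CLIP, with only the sketch module's parameters being trained; that wiring is omitted here.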

Code & Models

Repositories

yuhangzang/upt (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Text and Document Classification Technologies

Methods: Contrastive Language-Image Pre-training