Context-Aware Prompt Tuning for Vision-Language Model with Dual-Alignment

Hongyu Hu; Tiancheng Lin; Jie Wang; Zhenbang Sun; Yi Xu

arXiv:2309.04158·cs.CV·September 11, 2023

TL;DR

This paper proposes Dual-Aligned Prompt Tuning (DuAl-PT), a novel method that combines explicit context descriptions from large language models with implicit prompt learning to enhance vision-language model adaptation in few-shot tasks.

Contribution

The paper introduces DuAl-PT, which aligns prompts with both LLM-generated context and image features, improving few-shot learning and generalization in vision-language models.

Findings

01. Achieves superior performance on 11 datasets in few-shot recognition.

02. Enhances base-to-new generalization capabilities.

03. Serves as a strong baseline for prompt tuning methods.

Abstract

Large-scale vision-language models (VLMs), e.g., CLIP, learn broad visual concepts from tedious training data, showing superb generalization ability. A number of prompt learning methods have been proposed to efficiently adapt VLMs to downstream tasks with only a few training samples. We introduce a novel method, called Dual-Aligned Prompt Tuning (DuAl-PT), that improves the prompt learning of vision-language models by incorporating pre-trained large language models (LLMs). Learnable prompts, as in CoOp, implicitly model the context through end-to-end training, which makes them difficult to control and interpret. Explicit context descriptions generated by LLMs, such as GPT-3, can be used directly for zero-shot classification, but such prompts rely heavily on LLMs and remain underexplored in few-shot domains. With DuAl-PT, we propose to learn more context-aware prompts, benefiting from both…
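The abstract is truncated here, so the exact DuAl-PT objective is not shown. As a rough illustration of the general idea only, the sketch below scores an image feature against learnable prompt features with CLIP-style cosine similarity and softmax, then adds a second term that pulls each prompt feature toward the embedding of an LLM-written class description. The function names, the temperature, the weight, and the use of a plain cosine distance for the text-side term are all assumptions made for this example; they are not taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image_feat, prompt_feats, temperature=0.01):
    """Class probabilities from image-to-prompt cosine similarities.
    The temperature value is an assumption for this toy example."""
    logits = [cosine(image_feat, p) / temperature for p in prompt_feats]
    return softmax(logits)

def dual_objective(image_feat, prompt_feats, description_feats, label, weight=1.0):
    """Hypothetical two-term loss (not the paper's formulation):
    - image term: cross-entropy of the prompt-based classifier
    - text term: mean cosine distance between each prompt feature
      and the embedding of its LLM-generated class description
    """
    probs = classify(image_feat, prompt_feats)
    image_term = -math.log(probs[label])
    text_term = sum(
        1.0 - cosine(p, d) for p, d in zip(prompt_feats, description_feats)
    ) / len(prompt_feats)
    return image_term + weight * text_term

# Toy 2-D features for two classes (made-up numbers, not real embeddings).
image_feat = [1.0, 0.0]
prompt_feats = [[1.0, 0.1], [0.0, 1.0]]
description_feats = [[1.0, 0.0], [0.0, 1.0]]

probs = classify(image_feat, prompt_feats)
loss = dual_objective(image_feat, prompt_feats, description_feats, label=0)
```

In an actual prompt-tuning setup the prompt features would come from a frozen text encoder applied to learnable context tokens, and only those tokens would receive gradients. The toy vectors here stand in for those encoder outputs.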

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Softmax · Context Optimization · Layer Normalization · Contrastive Language-Image Pre-training · Linear Layer