APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning

Guiming Cao; Kaize Shi; Hong Fu; Huaiwen Zhang; Guandong Xu

arXiv:2401.06827 · cs.CV · January 24, 2024 · 1 citation


TL;DR

APLe introduces a token-wise adaptive method for multi-modal prompt learning in vision-language models, sequentially tuning vision and language prompts to enhance generalization and robustness across tasks.

Contribution

APLe proposes a novel sequential training approach for multi-modal prompts that improves generalization and robustness in vision-language models such as CLIP.
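The sequential idea can be illustrated with a minimal sketch, assuming two prompt branches named `vision` and `language` that are trained one after another (vision first), with the epoch budget split evenly between them. The branch names, the ordering, the even split, and the helpers `sequential_schedule` and `sgd_step` are hypothetical and are not taken from the paper; the point is only that in each stage the inactive branch's prompts stay frozen.

```python
from typing import Dict, List

# Hypothetical sketch: branch names, stage order, and the even epoch split
# are illustrative assumptions, not APLe's published training schedule.

def sequential_schedule(total_epochs: int, order: List[str]) -> List[str]:
    """Assign each epoch to one branch so the branches are trained in sequence."""
    per_stage = total_epochs // len(order)
    schedule: List[str] = []
    for i, branch in enumerate(order):
        # The last stage absorbs any leftover epochs.
        if i < len(order) - 1:
            n = per_stage
        else:
            n = total_epochs - per_stage * (len(order) - 1)
        schedule.extend([branch] * n)
    return schedule


def sgd_step(prompts: Dict[str, List[float]],
             grads: Dict[str, List[float]],
             active: str,
             lr: float) -> None:
    """Plain gradient step on the active branch only; other branches stay frozen."""
    prompts[active] = [p - lr * g for p, g in zip(prompts[active], grads[active])]


if __name__ == "__main__":
    schedule = sequential_schedule(5, ["vision", "language"])
    print(schedule)  # ['vision', 'vision', 'language', 'language', 'language']

    # Toy prompt vectors and fixed toy gradients, just to show which branch moves.
    prompts = {"vision": [0.0, 0.0], "language": [0.0, 0.0]}
    grads = {"vision": [1.0, 1.0], "language": [1.0, -1.0]}
    for branch in schedule:
        sgd_step(prompts, grads, branch, lr=0.5)
    print(prompts)  # {'vision': [-1.0, -1.0], 'language': [-1.5, 1.5]}
```

In a real framework the same effect is usually obtained by toggling which parameters receive gradients at each stage rather than by writing the update by hand.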

Findings

1. Achieves competitive performance with state-of-the-art methods.
2. Demonstrates robustness in prompt-length experiments.
3. Outperforms existing prompt-tuning approaches in generalization.

Abstract

Among noteworthy contenders, pre-trained Vision-Language (V-L) models set the benchmark for generalization to downstream tasks. Existing research has explored many characteristics of V-L models, including their sensitivity to text input and the challenge of tuning prompts across modalities. Building on V-L models such as CLIP, recent approaches deploy learnable prompts instead of hand-crafted ones to boost generalization performance and address these challenges. Inspired by layer-wise training, which is widely used in image fusion, we note that adapting the different modality branches of CLIP through a sequential training process efficiently improves generalization. To address the multi-modal prompting challenge, we propose Token-wise Adaptive for Multi-modal Prompt Learning (APLe) for…
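For readers unfamiliar with learnable prompts, the sketch below shows how a CLIP-style classifier scores classes when each class prompt is a sequence of context vectors followed by a class-name token, in the style of context optimization (CoOp). It is not APLe's token-wise method. The text encoder is replaced by a hypothetical `mean_pool` stand-in, and the fixed logit scale of 100 is an assumption. With hand-crafted prompts the context vectors would be frozen embeddings of words such as "a photo of a"; with learnable prompts they are the parameters updated during training while both encoders stay frozen.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Hypothetical stand-in for CLIP's text encoder: average the token embeddings."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def class_probs(image_feat: Vector,
                context: List[Vector],
                class_tokens: List[Vector],
                scale: float = 100.0) -> List[float]:
    """Softmax over scaled image-text cosine similarities.

    Each class prompt is [context tokens] + [class-name token]; in prompt
    learning only `context` would be trained.
    """
    logits = [scale * cosine(image_feat, mean_pool(context + [c])) for c in class_tokens]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy 2-D embeddings: one context token and two class-name tokens.
    probs = class_probs([1.0, 0.0], [[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
    print([round(p, 3) for p in probs])  # [1.0, 0.0]
```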


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training