SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models

Yang Zhou; Yongjian Wu; Jiya Saiyin; Bingzheng Wei; Maode Lai; Eric Chang; Yan Xu

arXiv:2407.11414·cs.CV·July 17, 2024

TL;DR

SDPT is a prompt tuning method for fusion-based visual-language pre-trained models. It learns a single set of unified prototype tokens in the aligned modal space and maps them into the text and image input spaces, improving transfer generalization with only 0.04% additional parameters.

Contribution

The paper proposes a unified prompt tuning approach that uses training-free inverse linear projections to carry aligned prompt information into each modality, improving generalization in fusion-based VLPMs.

Findings

01. Achieves superior results with only 0.04% additional parameters
02. Effectively aligns modalities for better transferability
03. Outperforms existing single- and dual-modal methods

Abstract

Prompt tuning methods have achieved remarkable success in parameter-efficient fine-tuning on large pre-trained models. However, their application to dual-modal fusion-based visual-language pre-trained models (VLPMs), such as GLIP, has encountered issues. Existing prompt tuning methods have not effectively addressed the modal mapping and aligning problem for tokens in different modalities, leading to poor transfer generalization. To address this issue, we propose Synchronous Dual Prompt Tuning (SDPT). SDPT initializes a single set of learnable unified prototype tokens in the established modal aligning space to represent the aligned semantics of text and image modalities for downstream tasks. Furthermore, SDPT establishes inverse linear projections that require no training to embed the information of unified prototype tokens into the input space of different modalities. The inverse linear…
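
The idea described in the abstract can be illustrated with a small NumPy sketch. This is only one plausible reading, not the authors' implementation: it assumes each modality has a frozen linear projection into the aligned space, and it uses the Moore-Penrose pseudo-inverse of that projection as the training-free inverse. The names `W_text`, `W_image`, `prototypes`, and `inverse_projection`, as well as all dimensions, are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: text/image input widths, aligned-space width, prompt count.
d_text, d_image, d_align = 32, 48, 16
n_proto = 4

# Stand-ins for frozen, pre-trained projections from each input space
# into the shared aligned space (random here; assumed, not from the paper).
W_text = rng.standard_normal((d_text, d_align))
W_image = rng.standard_normal((d_image, d_align))

# A single set of unified prototype tokens living in the aligned space.
# In prompt tuning these would be the only trainable parameters.
prototypes = rng.standard_normal((n_proto, d_align))


def inverse_projection(W):
    """Training-free inverse of a linear projection (assumed: pseudo-inverse)."""
    return np.linalg.pinv(W)  # shape: (d_align, d_in)


# Embed the same prototypes into each modality's input space.
text_prompts = prototypes @ inverse_projection(W_text)     # (n_proto, d_text)
image_prompts = prototypes @ inverse_projection(W_image)   # (n_proto, d_image)

# Projecting the prompts forward again lands on the same aligned tokens,
# so both modalities receive prompts that agree in the aligned space.
print(np.allclose(text_prompts @ W_text, prototypes))
print(np.allclose(image_prompts @ W_image, prototypes))
```

Under these assumptions, both prompt sets are derived from one shared parameter tensor, so updating `prototypes` changes the text and image prompts synchronously.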

Code & Models

Repositories

wuyongjiancode/sdpt
jax · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training