iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection
Nan Zhou, Jiaxin Chen, Di Huang

TL;DR
iVPT introduces a cross-layer dynamic connection and an attentive reinforcement mechanism to improve task-relevant information sharing in visual prompt tuning, leading to better performance across various vision tasks.
Contribution
The paper proposes iVPT, a novel VPT method with cross-layer dynamic connections and an attentive reinforcement mechanism for enhanced task-relevant information sharing.
Findings
Outperforms state-of-the-art methods on 24 benchmarks.
Effectively shares task-relevant information across layers.
Enhances the flexibility of the attention process in VPT.
Abstract
Recent progress has shown the great potential of visual prompt tuning (VPT) for adapting pre-trained vision transformers to various downstream tasks. However, most existing solutions optimize the prompts at each layer independently, thereby neglecting the task-relevant information encoded in prompt tokens across layers. Additionally, existing prompt structures are prone to interference from task-irrelevant noise in input images, which can harm the sharing of task-relevant information. In this paper, we propose a novel VPT approach, iVPT. It incorporates a cross-layer dynamic connection (CDC) for input prompt tokens from adjacent layers, enabling effective sharing of task-relevant information. Furthermore, we design a dynamic aggregation (DA) module that facilitates selective sharing of information between layers. The combination of CDC and DA enhances the…
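The abstract does not give the equations for CDC or DA. As a rough illustration of the idea only, the sketch below assumes that DA is a per-token sigmoid gate which mixes the prompt tokens output by layer l-1 with the learnable prompt tokens of layer l, and that CDC simply feeds that mixture in as layer l's input prompts. The function names (dynamic_aggregate, propagate_prompts) and the gate parameters (gate_weights, gate_bias) are hypothetical; this is not the authors' implementation, and real layers would be transformer blocks rather than plain Python functions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Prompts = List[Vector]          # one vector per prompt token
Layer = Callable[[Prompts], Prompts]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_aggregate(prev_prompts: Prompts, new_prompts: Prompts,
                      gate_weights: Vector, gate_bias: float) -> Prompts:
    """Hypothetical DA step: a per-token gate g in (0, 1), computed from the
    previous layer's prompt token, decides how much of that token is kept
    versus the current layer's own learnable prompt token."""
    if len(prev_prompts) != len(new_prompts):
        raise ValueError("prompt counts of adjacent layers must match")
    mixed: Prompts = []
    for prev, new in zip(prev_prompts, new_prompts):
        score = sum(w * p for w, p in zip(gate_weights, prev)) + gate_bias
        g = sigmoid(score)
        mixed.append([g * p + (1.0 - g) * n for p, n in zip(prev, new)])
    return mixed


def propagate_prompts(layer_prompts: List[Prompts], layers: List[Layer],
                      gates: List[Tuple[Vector, float]]) -> Prompts:
    """Hypothetical CDC: layer l (l >= 1) receives DA(output prompts of layer
    l-1, learnable prompts of layer l) instead of its prompts alone.
    gates[l - 1] holds the gate parameters used before layer l."""
    if not (len(layer_prompts) == len(layers) == len(gates) + 1):
        raise ValueError("need one prompt set per layer and one gate per connection")
    carried = layers[0](layer_prompts[0])
    for l in range(1, len(layers)):
        weights, bias = gates[l - 1]
        inputs = dynamic_aggregate(carried, layer_prompts[l], weights, bias)
        carried = layers[l](inputs)
    return carried
```

With a zero gate (weights and bias all 0), g = 0.5 and each token becomes the average of the two layers' prompts; a large positive bias drives g toward 1 and passes the previous layer's prompts through almost unchanged. Making g input-dependent is what lets such a connection share information selectively rather than uniformly, which is the behavior the abstract attributes to DA.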
