Selective Visual Prompting in Vision Mamba
Yifeng Yao, Zichen Liu, Zhenyu Cui, Yuxin Peng, Jiahuan Zhou

TL;DR
This paper introduces Selective Visual Prompting (SVP), a novel fine-tuning method tailored for Vision Mamba models that enhances the propagation of discriminative information through token-wise prompts and a dual-path structure.
Contribution
The paper proposes SVP, a new fine-tuning approach for Vim models that employs lightweight, token-wise prompts and a dual-path structure to improve information propagation and task performance.
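To make "token-wise prompts with a dual-path structure" concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. We assume (hypothetically) that each token receives its own input-dependent prompt, that this prompt is the sum of a low-rank projection shared by all layers and a low-rank projection specific to the current layer, and that the prompt is added to the token instead of prepended as an extra token. The class `DualPathPrompter`, its `rank` parameter, and the additive combination are all illustrative choices, and pure Python lists stand in for tensors.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def low_rank_proj(down, up, x):
    """Low-rank linear map: project x down to rank r, then back up to dimension D."""
    return matvec(up, matvec(down, x))


def rand_matrix(rows, cols, rng, scale=0.02):
    """Small random initialization so the prompts start as mild perturbations."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class DualPathPrompter:
    """Hypothetical token-wise prompter with a shared path and per-layer paths."""

    def __init__(self, dim, rank, num_layers, seed=0):
        rng = random.Random(seed)
        # Shared path: a single low-rank map reused by every layer.
        self.shared = (rand_matrix(rank, dim, rng), rand_matrix(dim, rank, rng))
        # Specific path: a separate low-rank map for each layer.
        self.specific = [
            (rand_matrix(rank, dim, rng), rand_matrix(dim, rank, rng))
            for _ in range(num_layers)
        ]

    def __call__(self, tokens, layer):
        """Add an input-dependent prompt to every token; sequence length is unchanged."""
        out = []
        for x in tokens:
            p_shared = low_rank_proj(*self.shared, x)
            p_layer = low_rank_proj(*self.specific[layer], x)
            out.append([xi + a + b for xi, a, b in zip(x, p_shared, p_layer)])
        return out


if __name__ == "__main__":
    tokens = [[1.0, 0.0, 0.5, -0.5] for _ in range(6)]  # 6 tokens of dimension 4
    prompter = DualPathPrompter(dim=4, rank=2, num_layers=3)
    prompted = prompter(tokens, layer=1)
    print(len(prompted), len(prompted[0]))  # 6 4
```

The design choice this sketch highlights is that, unlike prefix prompting, every position in the sequence is touched by a prompt, while the shared path keeps the parameter cost of adding more layers low.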
Findings
SVP outperforms existing prompting methods on large-scale benchmarks.
SVP effectively activates the input and forget gates in Vim, enhancing the propagation of discriminative information.
The dual-path structure captures both information shared across layers and information specific to each layer, improving adaptability.
Abstract
Pre-trained Vision Mamba (Vim) models have demonstrated exceptional performance across various computer vision tasks in a computationally efficient manner, attributed to their unique design of selective state space models. To further extend their applicability to diverse downstream vision tasks, Vim models can be adapted using the efficient fine-tuning technique known as visual prompting. However, existing visual prompting methods are predominantly tailored for Vision Transformer (ViT)-based models that leverage global attention, neglecting the distinctive sequential token-wise compression and propagation characteristics of Vim. Specifically, existing prompt tokens prefixed to the sequence are insufficient to effectively activate the input and forget gates across the entire sequence, hindering the extraction and propagation of discriminative information. To address this limitation, we…
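The abstract's argument, that prompt tokens prefixed to the sequence cannot keep the gates active across the whole sequence, can be illustrated with a toy recurrence. This is a simplified analogy, not the paper's analysis: we assume a scalar linear scan h_t = forget * h_{t-1} + input_gate * u_t with fixed gate values, whereas a real Vim block uses input-dependent, vector-valued selective SSM parameters and scans in both directions. The function names and the value `forget=0.8` are hypothetical.

```python
def gated_scan(inputs, forget=0.8, input_gate=1.0):
    """Toy scalar recurrence h_t = forget * h_{t-1} + input_gate * u_t.

    Returns the hidden state after every step. The gates are fixed constants
    here, unlike the input-dependent gates of a selective SSM.
    """
    h = 0.0
    states = []
    for u in inputs:
        h = forget * h + input_gate * u
        states.append(h)
    return states


def prefix_prompt_influence(num_tokens, forget=0.8, prompt=1.0):
    """Contribution of one prompt prepended before num_tokens image tokens.

    The recurrence is linear, so image tokens are set to zero to isolate the
    prompt's effect. The state at the prompt position itself is dropped.
    """
    return gated_scan([prompt] + [0.0] * num_tokens, forget)[1:]


def tokenwise_prompt_influence(num_tokens, forget=0.8, prompt=1.0):
    """Contribution when the same prompt is added to every image token."""
    return gated_scan([prompt] * num_tokens, forget)


if __name__ == "__main__":
    prefix = prefix_prompt_influence(10)
    tokenwise = tokenwise_prompt_influence(10)
    print(round(prefix[-1], 4))     # 0.1074 (0.8**10: decays geometrically)
    print(round(tokenwise[-1], 4))  # 4.4631 (a fresh contribution at every step)
```

In this toy setting the prefix prompt's influence on the state shrinks by a factor of `forget` at every subsequent token, so distant tokens barely see it, while a prompt injected at each position contributes directly wherever it is added.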
Taxonomy
Topics: Advanced Vision and Imaging · Visual perception and processing mechanisms · Satellite Image Processing and Photogrammetry
Methods: Attention Is All You Need · Adam · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing
