Selective Visual Prompting in Vision Mamba

Yifeng Yao; Zichen Liu; Zhenyu Cui; Yuxin Peng; Jiahuan Zhou

arXiv:2412.08947 · cs.CV · December 13, 2024

TL;DR

This paper introduces Selective Visual Prompting (SVP), a fine-tuning method tailored to Vision Mamba (Vim) models that improves the propagation of discriminative information through token-wise prompts and a dual-path structure.

Contribution

The paper proposes SVP, a fine-tuning approach for Vision Mamba (Vim) models that uses lightweight, token-wise prompts and a dual-path structure to improve information propagation and downstream task performance.

Findings

01. SVP outperforms existing prompting methods on large-scale benchmarks.

02. SVP effectively activates the input and forget gates in Vim, enhancing the flow of discriminative information.

03. The dual-path structure captures both information shared across layers and information specific to each layer, boosting adaptability.
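
The token-wise prompts and dual-path structure named in the findings above can be pictured with a small sketch. This is not the authors' implementation (the official repository listed under Code & Models is the authoritative source). The class name `TokenWisePrompter`, the low-rank generator design, the `tanh` nonlinearity, and the zero initialization are all assumptions made for illustration. The only ideas taken from the summary are that a prompt is produced for every token rather than prefixed to the sequence, and that one path is shared across layers while another is specific to each layer.

```python
import numpy as np

class TokenWisePrompter:
    """Hypothetical dual-path prompt generator (illustrative, not the SVP code)."""

    def __init__(self, dim, hidden, num_layers, seed=0):
        rng = np.random.default_rng(seed)
        # Shared path: one low-rank generator reused by every layer.
        self.shared_down = rng.normal(0.0, 0.02, (dim, hidden))
        self.shared_up = np.zeros((hidden, dim))
        # Specific path: a separate low-rank generator for each layer.
        self.layer_down = rng.normal(0.0, 0.02, (num_layers, dim, hidden))
        self.layer_up = np.zeros((num_layers, hidden, dim))

    def __call__(self, tokens, layer):
        # tokens: (sequence_length, dim). Each token gets its own prompt,
        # computed from that token, and the prompt is added in place.
        shared = np.tanh(tokens @ self.shared_down) @ self.shared_up
        specific = np.tanh(tokens @ self.layer_down[layer]) @ self.layer_up[layer]
        return tokens + shared + specific


prompter = TokenWisePrompter(dim=8, hidden=2, num_layers=3)
tokens = np.ones((5, 8))
prompted = prompter(tokens, layer=1)
print(prompted.shape)
```

Zero-initializing the up-projections is an assumed design choice borrowed from common adapter practice: the prompted tokens start out identical to the inputs, so fine-tuning begins from the pre-trained model's behavior.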

Abstract

Pre-trained Vision Mamba (Vim) models have demonstrated exceptional performance across various computer vision tasks in a computationally efficient manner, attributed to their unique design of selective state space models. To further extend their applicability to diverse downstream vision tasks, Vim models can be adapted using the efficient fine-tuning technique known as visual prompting. However, existing visual prompting methods are predominantly tailored for Vision Transformer (ViT)-based models that leverage global attention, neglecting the distinctive sequential token-wise compression and propagation characteristics of Vim. Specifically, existing prompt tokens prefixed to the sequence are insufficient to effectively activate the input and forget gates across the entire sequence, hindering the extraction and propagation of discriminative information. To address this limitation, we…
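
The abstract's point about prefixed prompts and the input and forget gates can be illustrated with a toy recurrence. The sketch below is a scalar simplification, not Vim or the paper's method: it assumes a single state channel, a step size `delta` computed from each token with a softplus, a decay factor `exp(-delta * a)` standing in for the forget gate, and `delta` itself standing in for the input gate. The names `selective_scan`, `surviving_fraction`, `w_delta`, and `a` are hypothetical.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_scan(x, w_delta=0.5, a=1.0):
    """Toy scalar scan: h_t = forget_t * h_{t-1} + delta_t * x_t."""
    x = np.asarray(x, dtype=float)
    delta = softplus(w_delta * x)   # input-dependent step size, always > 0
    forget = np.exp(-delta * a)     # fraction of the previous state that is kept
    h = 0.0
    states = np.empty_like(x)
    for t in range(len(x)):
        h = forget[t] * h + delta[t] * x[t]
        states[t] = h
    return states, forget, delta

def surviving_fraction(forget, position):
    """Fraction of the token written at `position` that reaches the final state."""
    return float(np.prod(forget[position + 1:]))

rng = np.random.default_rng(0)
x = rng.normal(size=196)            # e.g. a 14 x 14 grid of patch tokens
states, forget, delta = selective_scan(x)

# A token at the start of the sequence is attenuated by every later forget gate,
# while a token near the end passes through only a few of them.
print(surviving_fraction(forget, 0), surviving_fraction(forget, 190))
```

In this toy setting, whatever a prompt placed at position 0 writes into the state is multiplied by every subsequent decay factor, and it has no direct effect on the `delta` computed for later tokens. A prompt added to each token, by contrast, changes that token's own `delta` and therefore both of its gates.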

Code & Models

Repositories

zhoujiahuan1991/aaai2025-svp
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Visual perception and processing mechanisms · Satellite Image Processing and Photogrammetry

Methods: Attention Is All You Need · Adam · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing