VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi, Minjing Dong, Mingjia Li, Chang Xu

TL;DR
The paper introduces VSSD, a non-causal variant of State Space Duality for vision models, improving performance and efficiency in vision tasks like classification, detection, and segmentation.
Contribution
It proposes VSSD, a non-causal formulation of SSD for vision models. Removing the causal dependency between tokens leads to better performance and efficiency.
Findings
VSSD outperforms existing SSM-based models on multiple benchmarks spanning classification, detection, and segmentation.
The non-causal approach improves model efficiency in vision tasks.
Extensive experiments validate the effectiveness of VSSD.
Abstract
Vision transformers have significantly advanced the field of computer vision, offering robust modeling capabilities and a global receptive field. However, their high computational demands limit their applicability in processing long sequences. To tackle this issue, State Space Models (SSMs) have gained prominence in vision tasks, as they offer linear computational complexity. Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherently causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which uses a non-causal form of SSD. Specifically, we propose to discard the magnitude of interactions between the hidden state and tokens while preserving their relative weights, which relieves the…
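The abstract is cut off above, so the following is only a rough sketch of the idea it describes, not the authors' implementation. It assumes a scalar input, a one-dimensional state, and a scalar per-token decay, and it assumes that "discarding the magnitude while preserving relative weights" can be pictured as every token contributing to one shared hidden state with its own weight, instead of through accumulated decay. The function names `causal_ssd` and `non_causal_ssd` and all parameter names are hypothetical.

```python
# Illustrative sketch only; names and the scalar setup are assumptions,
# not taken from the paper or its code release.

def causal_ssd(x, a, b, c):
    """Causal scan: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t.

    Each output only sees the current token and the ones before it.
    """
    h = 0.0
    ys = []
    for x_t, a_t, b_t, c_t in zip(x, a, b, c):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


def non_causal_ssd(x, w, b, c):
    """Assumed non-causal form: one global state shared by all tokens.

    h = sum_j w_j * b_j * x_j, y_t = c_t * h.
    The weight w_j sets the relative contribution of token j,
    independent of its distance to the token being read out.
    """
    h = sum(w_j * b_j * x_j for x_j, w_j, b_j in zip(x, w, b))
    return [c_t * h for c_t in c]


x = [1.0, 2.0, 3.0]
a = [0.5, 0.5, 0.5]   # per-token decay (causal) or weight (non-causal)
b = [1.0, 1.0, 1.0]
c = [1.0, 1.0, 1.0]

causal_out = causal_ssd(x, a, b, c)          # [1.0, 2.5, 4.25]
non_causal_out = non_causal_ssd(x, a, b, c)  # [3.0, 3.0, 3.0]
```

In the causal scan the first output cannot depend on later tokens, while in the assumed non-causal form every output reads from the same state, so changing the last token changes the first output as well. Both versions stay linear in sequence length.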
Taxonomy
Topics: Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection · Visual perception and processing mechanisms
Methods: Non Maximum Suppression · 1x1 Convolution · Convolution · SSD
