TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma, Zhenliang Ni, Xinghao Chen

TL;DR
TinyViM introduces a frequency decoupling approach using a Laplace mixer and frequency ramp inception to enhance tiny hybrid vision Mamba models, achieving superior performance and efficiency across multiple vision tasks.
Contribution
The paper proposes a novel frequency decoupling method with a Laplace mixer and frequency ramp inception to improve tiny hybrid vision Mamba models.
Findings
TinyViM outperforms Convolution-, Transformer-, and Mamba-based models of similar scale.
It achieves 2-3 times higher throughput than other Mamba-based models.
It performs well on image classification, segmentation, and detection tasks.
Abstract
Mamba has shown great potential for computer vision due to its linear complexity in modeling the global context with respect to the input length. However, existing lightweight Mamba-based backbones cannot demonstrate performance that matches Convolution or Transformer-based methods. We observe that simply modifying the scanning path in the image domain is not conducive to fully exploiting the potential of vision Mamba. In this paper, we first perform comprehensive spectral and quantitative analyses, and verify that the Mamba block mainly models low-frequency information in a Convolution-Mamba hybrid architecture. Based on the analyses, we introduce a novel Laplace mixer to decouple the features in terms of frequency and feed only the low-frequency components into the Mamba block. In addition, considering the redundancy of the features and the different requirements for…
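The frequency decoupling described in the abstract can be illustrated with a small sketch. It assumes a Laplacian-pyramid-style split: the low-frequency part is the feature map average-pooled and upsampled back to full size, and the high-frequency part is the residual. The function names, the pooling window `k`, and the toy feature map are all hypothetical, chosen for illustration; this is not the authors' Laplace mixer implementation.

```python
def avg_pool2d(x, k):
    """Average-pool a 2D list of lists with a k x k window and stride k.

    Assumes the height and width of x are divisible by k.
    """
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def upsample_nearest(x, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    h, w = len(x) * k, len(x[0]) * k
    return [[x[i // k][j // k] for j in range(w)] for i in range(h)]


def laplace_decouple(x, k=2):
    """Split x into (low, high) so that low + high reconstructs x exactly.

    low  : smoothed, locally constant component (assumed low-frequency part)
    high : residual detail, x - low (assumed high-frequency part)
    """
    low = upsample_nearest(avg_pool2d(x, k), k)
    high = [[a - b for a, b in zip(row_x, row_l)] for row_x, row_l in zip(x, low)]
    return low, high


# Toy single-channel 4x4 feature map (made-up values).
feature = [
    [1.0, 3.0, 2.0, 2.0],
    [5.0, 7.0, 2.0, 2.0],
    [0.0, 0.0, 4.0, 8.0],
    [0.0, 0.0, 4.0, 8.0],
]

low, high = laplace_decouple(feature)
# In a hybrid block of the kind the abstract describes, only `low` would be
# passed to the global (Mamba) branch; `high` would be handled separately.
```

Because the high-frequency part is defined as a residual, the two components always sum back to the input, so no information is lost by the split. Flat regions such as the top-right patch of the toy map produce an all-zero high-frequency residual.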
Taxonomy
Topics: Image Processing Techniques and Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
Methods: Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax · Attention Is All You Need
