MambaScope: Coarse-to-Fine Scoping for Efficient Vision Mamba
Shanhui Liu, Rui Xu, Yunke Wang

TL;DR
MambaScope introduces an adaptive, coarse-to-fine inference framework for Vision Mamba that dynamically allocates computational resources based on image complexity, improving efficiency and accuracy over existing token reduction methods.
Contribution
The paper proposes an adaptive coarse-to-fine inference framework for Vision Mamba that avoids unnecessary computation on simple images and applies fine-grained refinement only to complex ones.
Findings
Outperforms baseline Vision Mamba in accuracy and efficiency
Reduces computation by processing simple images at coarse resolution
Improves visual task performance with dynamic resolution adjustment
Abstract
Vision Mamba has emerged as a promising and efficient alternative to Vision Transformers, yet its efficiency remains fundamentally constrained by the number of input tokens. Existing token reduction approaches typically adopt token pruning or merging to reduce computation. However, they inherently lead to information loss as they discard or compress token representations. This problem is further exacerbated when the same fine-grained token processing is uniformly applied across all images regardless of visual complexity. We observe that not all inputs require fine-grained processing: simple images can be effectively handled at a coarse resolution, while only complex ones require refinement. Based on this insight, we propose MambaScope, an adaptive framework for efficient inference for Vision Mamba. MambaScope first performs coarse-grained inference by dividing the input image into large…
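The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it explains how complex images are identified, so the routing rule below (a softmax confidence threshold on the coarse prediction) is an assumption made for illustration, not the paper's method. The names `num_tokens`, `coarse_to_fine_predict`, `coarse_model`, `fine_model`, and `threshold` are all hypothetical, and re-running the whole image at fine resolution is a simplification.

```python
import math
from typing import Callable, List, Sequence, Tuple


def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of tokens when a square image is split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_to_fine_predict(
    image: object,
    coarse_model: Callable[[object], Sequence[float]],
    fine_model: Callable[[object], Sequence[float]],
    threshold: float = 0.8,  # hypothetical value, not taken from the paper
) -> Tuple[int, str]:
    """Return (predicted class, stage used).

    ASSUMPTION: an image counts as "simple" when the coarse model's top
    softmax probability reaches `threshold`. The source text does not
    confirm this criterion.
    """
    probs = softmax(coarse_model(image))
    confidence = max(probs)
    if confidence >= threshold:
        # Simple image: accept the cheap coarse-resolution prediction.
        return probs.index(confidence), "coarse"
    # Complex image: fall back to fine-grained processing.
    probs = softmax(fine_model(image))
    return probs.index(max(probs)), "fine"
```

For a 224×224 input, doubling the patch size from 16 to 32 cuts the token count from 196 to 49, which is why accepting the coarse prediction for simple images saves computation in a model whose cost grows with the number of tokens.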
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
