QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models
Tien-Yu Chi, Hung-Yueh Chiang, Diana Marculescu, Kai-Chiang Wu

TL;DR
QuarterMap is a post-training pruning technique that reduces spatial redundancy in visual state space models, improving inference throughput on vision tasks without retraining and with only a small accuracy drop (under 0.9% on ImageNet-1K).
Contribution
It introduces a post-training activation pruning method tailored to SSM-based vision models that raises throughput without any retraining.
Findings
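Up to 11% speedup on VMamba with less than 0.9% accuracy drop on ImageNet-1K, with similar gains on ADE20K segmentation
Consistent throughput gains on MedMamba across multiple medical imaging tasks while preserving accuracy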
Outperforms token merging methods like ToMe in efficiency
Abstract
State space models (SSMs) reduce the quadratic complexity of transformers by leveraging linear recurrence. Recently, VMamba has emerged as a strong SSM-based vision backbone, yet remains bottlenecked by spatial redundancy in its four-directional scan. We propose QuarterMap, a post-training activation pruning method that removes redundant spatial activations before scanning and restores dimensions via nearest-neighbor upsampling. Our method improves throughput without retraining. On ImageNet-1K, QuarterMap achieves up to 11% speedup on VMamba with less than 0.9% accuracy drop, and yields similar gains on ADE20K segmentation. Beyond VMamba, we validate QuarterMap on MedMamba, a domain-specific model that shares the same four-directional scanning structure, where it consistently improves throughput while preserving accuracy across multiple medical imaging tasks. Compared to token merging…
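The prune, scan, restore pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stride-2 subsampling pattern (keeping one position out of every 2x2 neighborhood), the `(H, W, C)` layout, and the names `prune_quarter`, `upsample_nearest`, `quartermap_block`, and `scan_fn` are all assumptions made for the example. `scan_fn` stands in for the model's four-directional scan, which is not reproduced here.

```python
import numpy as np


def prune_quarter(x, stride=2):
    """Keep every `stride`-th row and column of an (H, W, C) activation map.

    With stride=2 this keeps roughly a quarter of the spatial positions
    (an assumed pruning pattern, chosen for illustration).
    """
    return x[::stride, ::stride, :]


def upsample_nearest(x, stride=2):
    """Nearest-neighbor upsampling: repeat each position `stride` times per spatial axis."""
    return np.repeat(np.repeat(x, stride, axis=0), stride, axis=1)


def quartermap_block(x, scan_fn, stride=2):
    """Prune spatial activations, run the scan on the smaller map, restore the shape.

    `scan_fn` is a hypothetical placeholder for the expensive scan; it must
    map an (h, w, C) array to an array of the same shape.
    """
    height, width, _ = x.shape
    pruned = prune_quarter(x, stride)        # fewer tokens enter the scan
    scanned = scan_fn(pruned)                # scan cost now depends on the pruned size
    restored = upsample_nearest(scanned, stride)
    return restored[:height, :width, :]      # crop in case H or W is not divisible by stride


if __name__ == "__main__":
    features = np.random.rand(14, 14, 8)
    out = quartermap_block(features, scan_fn=lambda t: t)  # identity scan for the demo
    print(features.shape, prune_quarter(features).shape, out.shape)
```

Under these assumptions the scan processes about a quarter as many spatial positions, and the output keeps the original spatial dimensions, so the surrounding layers need no changes. That is consistent with the post-training, no-retraining setting the abstract describes.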
