PTQ4VM: Post-Training Quantization for Visual Mamba
Younghyun Cho, Changhun Lee, Seonggon Kim, Eunhyeok Park

TL;DR
This paper introduces PTQ4VM, a post-training quantization method tailored to Visual Mamba. It addresses the model's specific quantization challenges so that quantized models run faster on GPUs with little accuracy loss.
Contribution
The paper presents the first quantization approach designed specifically for Visual Mamba, built on two strategies: Per-Token Static (PTS) quantization and Joint Learning of Smoothing Scale and Step Size (JLSS).
Findings
Achieves up to 1.83x GPU speedup with negligible accuracy loss.
Converts pretrained Visual Mamba models in under 15 minutes.
Addresses three quantization challenges specific to Visual Mamba: token-wise variance, channel-wise outliers, and a long tail of activations.
Abstract
Visual Mamba is an approach that extends the selective state space model, Mamba, to vision tasks. It processes image tokens sequentially in a fixed order, accumulating information to generate outputs. Despite its growing popularity for delivering high-quality outputs at a low computational cost across various tasks, Visual Mamba is highly susceptible to quantization, which makes further performance improvements challenging. Our analysis reveals that the fixed token access order in Visual Mamba introduces unique quantization challenges, which we categorize into three main issues: 1) token-wise variance, 2) channel-wise outliers, and 3) a long tail of activations. To address these challenges, we propose Post-Training Quantization for Visual Mamba (PTQ4VM), which introduces two key strategies: Per-Token Static (PTS) quantization and Joint Learning of Smoothing Scale and Step Size (JLSS).…
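The abstract excerpt does not spell out how PTS works, so the following is only a minimal sketch of one plausible reading: because tokens arrive in a fixed order, a separate scale can be calibrated offline for each token position and reused at inference. The function names (`per_token_scales`, `per_tensor_scale`, `fake_quantize`), the symmetric 8-bit range, the max-based calibration, and the synthetic activations are all assumptions made for illustration, not the authors' implementation. The sketch compares the reconstruction error of per-token static scales against a single per-tensor static scale when activation magnitude varies strongly across token positions.

```python
import random

QMAX = 127  # symmetric signed 8-bit range (an assumption of this sketch)


def per_token_scales(calib):
    """One static scale per token position, from the calibration max |x|."""
    n_tokens = len(calib[0])
    scales = []
    for t in range(n_tokens):
        max_abs = max(abs(v) for sample in calib for v in sample[t])
        scales.append(max(max_abs, 1e-8) / QMAX)
    return scales


def per_tensor_scale(calib):
    """A single static scale shared by every token (baseline)."""
    max_abs = max(abs(v) for sample in calib for tok in sample for v in tok)
    return max(max_abs, 1e-8) / QMAX


def fake_quantize(x, scales):
    """Quantize then dequantize token t of x using scales[t]."""
    out = []
    for tok, s in zip(x, scales):
        out.append([max(-QMAX, min(QMAX, round(v / s))) * s for v in tok])
    return out


def mse(a, b):
    """Mean squared error over all tokens and channels."""
    diffs = [(u - v) ** 2 for ta, tb in zip(a, b) for u, v in zip(ta, tb)]
    return sum(diffs) / len(diffs)


def make_sample(rng, magnitudes, n_channels=8):
    """Synthetic activations: token t is uniform in [-magnitudes[t], magnitudes[t]]."""
    return [[rng.uniform(-m, m) for _ in range(n_channels)] for m in magnitudes]


rng = random.Random(0)
magnitudes = [0.1, 1.0, 10.0, 100.0]  # hypothetical token-wise variance
calib = [make_sample(rng, magnitudes) for _ in range(16)]
x = make_sample(rng, magnitudes)  # held-out input

pts = per_token_scales(calib)
pt = per_tensor_scale(calib)
mse_pts = mse(x, fake_quantize(x, pts))
mse_tensor = mse(x, fake_quantize(x, [pt] * len(x)))
print(f"per-token static MSE:  {mse_pts:.6f}")
print(f"per-tensor static MSE: {mse_tensor:.6f}")
```

In this toy setting the per-tensor scale is set by the largest-magnitude token, so low-magnitude tokens are rounded coarsely, while per-token scales keep each position within its own range. Since all scales are fixed after calibration, no statistics need to be computed at inference time.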
Taxonomy
Topics: Image and Video Stabilization · Advanced Vision and Imaging
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
