Q-SAM2: Accurate Quantization for Segment Anything Model 2

Nicola Farronato; Florian Scheidegger; Mattia Rigotti; Cristiano Malossi; Michele Magno; Haotong Qin

arXiv:2506.09782 · cs.CV · November 25, 2025

Open Access

TL;DR

Q-SAM2 is a low-bit quantization method for the Segment Anything Model 2 (SAM2). It reduces model size and computational cost while preserving segmentation accuracy, using a variance-reducing calibration step for initialization and learned clipping factors during quantization-aware training.

Contribution

The paper proposes Variance-Reduced Calibration (VRC), an initialization method, and Learnable Statistical Clipping (LSC), a Quantization-Aware Training (QAT) method, which together improve low-bit quantization accuracy for SAM2.

Findings

01. Achieves up to a 9.7 percentage-point improvement in video segmentation accuracy.

02. Reduces model size by 8× compared to the BF16 baseline.

03. Outperforms state-of-the-art quantization schemes in the ultra-low 2-bit regime.
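
The 8× figure is consistent with storing weights in 2 bits instead of the 16 bits used by BF16, assuming that is the setting the finding refers to. The sketch below works through that arithmetic. The function names and the parameter count are hypothetical and chosen only for illustration, and the calculation ignores quantization scales and any layers left unquantized.

```python
def compression_ratio(baseline_bits: int, quant_bits: int) -> float:
    """Ideal weight-storage ratio between two bit-widths.

    Ignores per-tensor scales, zero-points, and unquantized layers.
    """
    return baseline_bits / quant_bits


def weight_megabytes(num_params: int, bits: int) -> float:
    """Storage for num_params weights at the given bit-width, in MB (10**6 bytes)."""
    return num_params * bits / 8 / 1e6


# BF16 uses 16 bits per weight; 2-bit weights use 2.
ratio_2bit = compression_ratio(16, 2)

# Hypothetical parameter count, for illustration only (not taken from the paper).
example_params = 100_000_000
bf16_mb = weight_megabytes(example_params, 16)
w2_mb = weight_megabytes(example_params, 2)

print(f"ideal ratio: {ratio_2bit}x, BF16: {bf16_mb} MB, 2-bit: {w2_mb} MB")
```

In practice the realized ratio is usually somewhat below the ideal one, since scale factors and any higher-precision layers add storage overhead.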

Abstract

The Segment Anything Model 2 (SAM2) is a powerful foundation model for promptable segmentation. However, its high computational and memory costs are a major barrier to deployment on resource-constrained devices. In this paper, we present Q-SAM2, an accurate low-bit quantization method that achieves high compression and high fidelity. To address performance degradation arising from challenging weight and activation distributions during quantization, Q-SAM2 introduces two novel contributions: Variance-Reduced Calibration (VRC), an initialization method that reduces weight statistical variance by minimizing the Frobenius norm over a small calibration batch; and Learnable Statistical Clipping (LSC), a Quantization-Aware Training (QAT) method that learns momentum-stabilized clipping factors to manage outliers in weights and activations. Comprehensive experiments demonstrate that Q-SAM2…
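
To make the role of a clipping factor concrete, here is a minimal sketch of a generic symmetric uniform quantizer whose clipping threshold is a multiple of the weights' standard deviation, with the multiple chosen by grid search to minimize the Frobenius norm of the quantization error. This is not the authors' VRC or LSC: the abstract describes LSC as learning momentum-stabilized factors during QAT, whereas this sketch swaps in a plain grid search. All names (`fake_quantize`, `best_clip_factor`, the toy `weights`) are hypothetical.

```python
import math


def fake_quantize(values, bits, clip):
    """Clamp values to [-clip, clip], round to a signed integer grid, and rescale."""
    qmax = 2 ** (bits - 1) - 1      # largest integer level, e.g. 7 for 4 bits
    scale = clip / qmax             # real-valued width of one integer step
    out = []
    for v in values:
        clamped = max(-clip, min(clip, v))
        out.append(round(clamped / scale) * scale)
    return out


def frobenius_error(original, approx):
    """Frobenius (L2) norm of the element-wise difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, approx)))


def std_dev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def best_clip_factor(values, bits, factors):
    """Return (factor, error) for the candidate factor with the lowest error.

    The clipping threshold tried for each candidate is factor * std_dev(values).
    """
    sigma = std_dev(values)
    best = None
    for f in factors:
        err = frobenius_error(values, fake_quantize(values, bits, f * sigma))
        if best is None or err < best[1]:
            best = (f, err)
    return best


# Toy weights: a dense bulk in [-1, 1] plus a single outlier (illustrative data).
weights = [0.01 * i for i in range(-100, 101)] + [3.0]
sigma = std_dev(weights)
max_abs = max(abs(w) for w in weights)

# Candidates include max_abs / sigma, which reproduces plain min-max scaling.
factors = [1.0, 1.5, 2.0, 2.5, 3.0, max_abs / sigma]
best_factor, best_err = best_clip_factor(weights, 4, factors)
minmax_err = frobenius_error(weights, fake_quantize(weights, 4, max_abs))

print(f"best factor: {best_factor:.2f}, error: {best_err:.4f}, min-max error: {minmax_err:.4f}")
```

The search simply keeps whichever candidate gives the lowest error. Whether a tight clip beats min-max scaling depends on how much of the distribution sits in the bulk versus the tails, which is the trade-off a learned clipping factor is meant to settle automatically.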

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Image and Video Quality Assessment

Methods: Softmax · Attention Is All You Need · Linear Layer