CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model

Houji Wen, Jiangyong Yu, Jun Li, Dawei Yang

arXiv:2605.16901 · cs.CV · May 19, 2026

TL;DR

CAR-SAM introduces a post-training quantization framework for Segment Anything Models that targets two failure modes, attention dissipation and reconstruction oscillation, to enable 4-bit model deployment.

Contribution

The paper proposes two strategies designed specifically for SAMs, MAC and JCAR, which improve quantization stability and performance over existing PTQ methods.

Findings

1. Quantizes SAM models to 4 bits with significant accuracy gains.
2. Outperforms existing PTQ methods by 14.6% and 6.6% mAP on SAM-B and SAM-L, respectively.
3. Eases deployment on resource-constrained devices.

Abstract

Segment Anything Models (SAMs) are widely used in computer vision for universal image segmentation, but their high computational and memory demands make them difficult to deploy on resource-constrained devices. Post-Training Quantization (PTQ) is a common technique for model compression and acceleration. However, existing PTQ methods overlook the cross-attention architecture of the SAM decoder, and their accuracy degrades substantially at low bit-widths. This degradation primarily stems from two challenges unique to SAMs: (1) attention dissipation, where the attention information in the decoder, which is crucial for representing segmentation masks, collapses into a diffuse, non-semantic form under low-bit quantization; and (2) reconstruction oscillation, where bidirectional coupling within the two-way transformer introduces cross-branch error interference and destabilizes convergence. To tackle these issues, we…
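To make "attention dissipation" concrete, the sketch below fake-quantizes the queries and keys of a single cross-attention map to 4 bits and compares the mean row entropy of the full-precision and quantized maps. It makes several assumptions. It uses NumPy. It uses plain per-tensor symmetric min-max quantization, a generic PTQ baseline and not the paper's MAC or JCAR. It treats mean Shannon entropy as a hypothetical proxy for how diffuse the attention is. The function names (`fake_quantize`, `attention_map`, `row_entropy`) and the token shapes are illustrative only.

```python
import numpy as np

def fake_quantize(x: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Per-tensor symmetric uniform quantize-dequantize with a min-max (max-abs) scale."""
    q_max = 2 ** (n_bits - 1) - 1              # 7 for signed 4-bit
    scale = np.abs(x).max() / q_max
    if scale == 0:
        return x.copy()
    q = np.clip(np.round(x / scale), -q_max - 1, q_max)  # integer grid
    return q * scale                            # dequantize back to float

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention weights: one row per query token."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d))

def row_entropy(p: np.ndarray, eps: float = 1e-12) -> float:
    """Mean Shannon entropy (nats) over attention rows; higher means more diffuse."""
    return float(-(p * np.log(p + eps)).sum(axis=-1).mean())

# Usage with hypothetical shapes: 5 query tokens attending to 64 image tokens.
rng = np.random.default_rng(0)
queries = rng.standard_normal((5, 32))
keys = rng.standard_normal((64, 32))
fp_attn = attention_map(queries, keys)
q4_attn = attention_map(fake_quantize(queries, 4), fake_quantize(keys, 4))
print("FP32 entropy:", row_entropy(fp_attn))
print("4-bit entropy:", row_entropy(q4_attn))
```

A uniform attention row over n tokens has entropy log(n), the maximally diffuse case, so a quantized map whose entropy drifts toward that bound has lost focus on specific tokens. Random inputs like these will not necessarily reproduce the effect the abstract describes for real SAM decoders; the sketch only shows how one might measure it.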
