SparseSAM: Structured Sparsification of Activations in Segment Anything Models
Hoai-Chau Tran, Chi H. Nguyen, Duy M. H. Nguyen, Mathias Niepert, Fan Lai, Khoa D. Doan

TL;DR
SparseSAM is a training-free structured sparsification method for Segment Anything Models that accelerates inference and reduces memory usage with minimal accuracy loss by jointly sparsifying attention and MLP layers.
Contribution
It introduces a novel, training-free sparsification framework with deterministic sparse attention and a residual MLP routing mechanism, improving speed and memory efficiency.
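The summary names a residual MLP routing mechanism but gives no details. The sketch below shows one generic way token-preserving MLP sparsification can work, and it is only an illustration under stated assumptions. A fraction `density` of tokens is sent through the MLP. Every other token keeps only its residual, so no token is merged or dropped. Scoring tokens by input L2 norm is a placeholder assumption. Reading "density" as the fraction of routed tokens is also an assumption. The function names `gelu` and `routed_mlp_block` are hypothetical and do not come from the paper.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def routed_mlp_block(x, w1, b1, w2, b2, density):
    """Hypothetical residual MLP routing: only the top-scoring fraction of tokens
    gets the MLP update; all other tokens pass through on the residual path.

    Assumption: tokens are scored by the L2 norm of their input. This is a
    placeholder criterion, not necessarily the one SparseSAM uses.
    """
    n = x.shape[0]
    k = max(1, int(round(density * n)))            # number of routed tokens
    scores = np.linalg.norm(x, axis=1)
    keep = np.argsort(-scores, kind="stable")[:k]  # indices of routed tokens
    out = x.copy()                                 # residual path for every token
    h = gelu(x[keep] @ w1 + b1)                    # MLP hidden layer, routed tokens only
    out[keep] += h @ w2 + b2                       # add MLP output to the residual
    return out, keep
```

With `density=1.0` this reduces to a dense pre-residual MLP block, `x + MLP(x)`. With `density=0.4`, only 40% of the tokens pay the MLP cost, and every token keeps its position and identity in the sequence.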
Findings
Achieves 2x faster inference and 2.8x memory reduction.
Loses only 0.004 mIoU at 0.4 density, outperforming token merging.
Maintains high segmentation quality across benchmarks.
Abstract
The Segment Anything Model (SAM) achieves strong open-vocabulary segmentation, but its ViT-based image encoders dominate inference latency and memory. Existing activation compression methods, such as token merging, reduce the number of tokens to process, yet they introduce non-trivial runtime overhead and suffer catastrophic quality drops under high compression. Methods based on sparse attention accelerate attention alone, leaving the MLP fully dense and capping the achievable speedup. We propose SparseSAM, (i) a training-free structured sparsification framework that jointly accelerates attention and MLP layers while preserving token identity. SparseSAM introduces (ii) Stripe-Sort Attention, which uses a deterministic Z-order permutation to transform dense attention into static, hardware-friendly sparse patterns, eliminating dynamic masking overhead. SparseSAM further introduces a (iii)…
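The abstract says Stripe-Sort Attention uses a deterministic Z-order permutation to produce static sparse patterns. The sketch below illustrates the general idea and is not the authors' kernel. Tokens on a row-major h×w grid are reordered by their Z-order (Morton) code, so spatially nearby tokens become contiguous. Dense attention then runs inside fixed-size contiguous blocks, and the permutation is undone afterward. The abstract does not specify the exact "stripe" pattern. Plain block-diagonal attention on the permuted sequence is therefore an assumption, as is the single-head, projection-free setup. All function names are hypothetical.

```python
import numpy as np

def morton_code(y: int, x: int, bits: int) -> int:
    """Interleave the bits of (y, x): x fills even bit positions, y fills odd ones."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code

def z_order_permutation(h: int, w: int) -> np.ndarray:
    """Indices that reorder a row-major h*w token grid into Z-order.
    Deterministic: depends only on the grid shape, never on the activations."""
    bits = max(h - 1, w - 1, 1).bit_length()
    codes = [morton_code(i // w, i % w, bits) for i in range(h * w)]
    return np.argsort(codes, kind="stable")

def block_local_attention(q, k, v, block: int):
    """Softmax attention restricted to contiguous blocks of `block` tokens."""
    n, d = q.shape
    out = np.empty_like(v)
    for s in range(0, n, block):
        e = min(s + block, n)
        scores = q[s:e] @ k[s:e].T / np.sqrt(d)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)
        out[s:e] = p @ v[s:e]
    return out

def z_order_block_attention(x, h: int, w: int, block: int):
    """Permute tokens into Z-order, attend within static blocks, then restore order.
    Assumption: q = k = v = x (no projections, single head) to keep the sketch small."""
    perm = z_order_permutation(h, w)
    inv = np.empty_like(perm)
    inv[perm] = np.arange(perm.size)   # inverse permutation
    xp = x[perm]
    yp = block_local_attention(xp, xp, xp, block)
    return yp[inv]
```

On a 4×4 grid, `z_order_permutation(4, 4)` yields `[0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]`. With `block=4`, each block is therefore a 2×2 spatial window. Because the pattern depends only on the grid shape, it can be precomputed once, and no per-input mask has to be built at run time.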
