Stable Segment Anything Model
Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

TL;DR
This paper introduces Stable-SAM, a method that enhances the robustness of the Segment Anything Model (SAM) to low-quality prompts by calibrating its attention mechanism, resulting in more stable segmentation without altering the original model.
Contribution
The paper proposes a deformable sampling plugin and robust training strategy to improve SAM's segmentation stability across diverse prompt qualities, with minimal additional parameters.
Findings
Significantly improves segmentation stability with low-quality prompts
Retains SAM's promptability and efficiency
Requires only 0.08 million additional parameters
Abstract
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts which, however, often require good skills to specify. To make SAM robust to casual prompts, this paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities, notably imprecise bounding boxes and insufficient points. Our key finding reveals that given such low-quality prompts, SAM's mask decoder tends to activate image features that are biased towards the background or confined to specific object parts. To mitigate this issue, our key idea consists of calibrating solely SAM's mask attention by adjusting the sampling locations and amplitudes of image features, while the original SAM model architecture and weights remain unchanged. Consequently, our deformable sampling plugin (DSP) enables SAM to adaptively shift attention…
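The abstract's central idea, adjusting the sampling locations and amplitudes of image features, can be illustrated with a small sketch. This is not the authors' implementation: in Stable-SAM the adjustments come from the deformable sampling plugin (DSP), whereas here the offsets and amplitudes are hand-chosen inputs, the feature map is a single-channel toy grid, and the names `bilinear_sample` and `deformable_sample` are hypothetical.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D grid (list of rows) at real-valued (y, x).

    Coordinates outside the grid are clamped to the border.
    """
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def deformable_sample(feat, offsets, amplitudes):
    """Resample each cell at (row + dy, col + dx), then scale it by an amplitude.

    offsets[i][j] is a (dy, dx) pair; amplitudes[i][j] is a scalar.
    Both are assumed inputs here (a real plugin would predict them).
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dy, dx = offsets[i][j]
            out[i][j] = amplitudes[i][j] * bilinear_sample(feat, i + dy, j + dx)
    return out


# Toy feature map: the "object" response sits in the right-hand column.
feat = [[0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0]]

# Shift every sampling location half a cell toward the object and double it.
offsets = [[(0.0, 0.5)] * 3 for _ in range(3)]
amplitudes = [[2.0] * 3 for _ in range(3)]
calibrated = deformable_sample(feat, offsets, amplitudes)
```

With zero offsets and unit amplitudes the function returns the input unchanged, which mirrors the abstract's point that the original model is left untouched and only the sampling is calibrated.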
Peer Reviews
Decision: ICLR 2025 Poster
1. The topic this paper focuses on is interesting and important, and may have been overlooked in previous research. Though SAMs produce surprising segmentation results, they may fail when the input prompts are not that accurate, especially in real-life applications. 2. The proposed method is simple but effective. The improvement over previous SAM and high-quality SAM methods is notable on several benchmarks and tasks. 3. The writing is easy to follow.
1. One potential negative effect to discuss. Though this paper targets strengthening the ability of SAM to focus on the foreground object, will it tend to produce wrong segmentation results when the desired segmenting part is the background? It will be interesting to discuss the bias this method may introduce. 2. It is easy to demonstrate the improvement when the number of prompts reduces to 1 or 3 points. However, noisy boxes are usually determined by the user, which are more subjective. Given
1. The writing is clear, making the paper easy to follow. 2. The motivation to develop stable prompting for SAM is well-articulated and addresses a less-explored but valuable area. The empirical studies in Section 3 are solid and clearly presented. 3. The proposed DSP and DRP modules are novel, effectively enhancing SAM's stability without compromising its powerful pre-trained representations. The inclusion of a dynamic routing mechanism is practical and well-justified. 4. Stable-SAM demonstrate
1. The range of baseline models is limited. While many SAM-based models have recently been proposed to enhance its performance across various domains, this paper does not include comparisons with these relevant models. 2. The implementation of LoRA and adapters in SAM is not detailed. Since implementation choices can significantly affect performance, providing this information is crucial for evaluating the performance comparison as well as transparency and reproducibility. 3. The paper does not
1. This work provides new insights into SAM's segmentation stability and introduces a segmentation stability metric that effectively quantifies it. 2. The proposed DSP and DRP modules provide novel insights into the improvement of SAM. 3. Stable-SAM shows marked improvement with insufficient prompts. 4. Generally, the paper is easy to follow.
1. In Table 6, the interactive segmentation results for SAM on the SBD dataset appear inconsistent with those reported in other studies, showing higher performance. For example, in the works [1-4], the NoC90 metric for SAM-B/L/H exceeds 7.6, while Table 6 reports an SBD NoC90 of only 5.76. This discrepancy raises concerns given that this work builds upon SAM. 2. The core focus of this work is to address the issues arising from ambiguous prompts, yet SAM's practical application relies more on su
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Adversarial Robustness in Machine Learning
Methods: Segment Anything Model
