Stable Segment Anything Model

Qi Fan; Xin Tao; Lei Ke; Mingqiao Ye; Yuan Zhang; Pengfei Wan; Zhongyuan Wang; Yu-Wing Tai; Chi-Keung Tang

arXiv:2311.15776·cs.CV·December 6, 2023·1 cites

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper introduces Stable-SAM, a method that enhances the robustness of the Segment Anything Model (SAM) to low-quality prompts by calibrating its attention mechanism, resulting in more stable segmentation without altering the original model.
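To make "low-quality prompt" concrete for the box case, the following is a minimal sketch of how an imprecise bounding box could be simulated by jittering a reference box until its IoU with the original falls in a chosen range. The function names, the jitter scheme, and the IoU range are illustrative assumptions, not the paper's evaluation protocol.

```python
import random

def box_iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def jitter_box(box, scale, rng):
    """Shift each edge by up to `scale` times the box width or height."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    nx0 = x0 + rng.uniform(-scale, scale) * w
    ny0 = y0 + rng.uniform(-scale, scale) * h
    nx1 = x1 + rng.uniform(-scale, scale) * w
    ny1 = y1 + rng.uniform(-scale, scale) * h
    # Reorder corners so the result is always a valid box.
    return (min(nx0, nx1), min(ny0, ny1), max(nx0, nx1), max(ny0, ny1))

def sample_noisy_box(box, iou_range, scale=0.3, max_tries=1000, seed=0):
    """Rejection-sample a jittered box whose IoU with `box` lies in iou_range.

    Returns None if no candidate is accepted within max_tries.
    (Hypothetical helper; the thresholds are assumptions for illustration.)
    """
    rng = random.Random(seed)
    lo, hi = iou_range
    for _ in range(max_tries):
        cand = jitter_box(box, scale, rng)
        if lo <= box_iou(box, cand) <= hi:
            return cand
    return None
```

Calling `sample_noisy_box((10.0, 10.0, 110.0, 110.0), (0.5, 0.7))` would yield a box that overlaps the reference only loosely, which is the kind of casual prompt the paper targets.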

Contribution

The paper proposes a deformable sampling plugin and robust training strategy to improve SAM's segmentation stability across diverse prompt qualities, with minimal additional parameters.

Findings

01. Significantly improves segmentation stability with low-quality prompts

02. Retains SAM's promptability and efficiency

03. Requires only 0.08 million additional parameters

Abstract

The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts which, however, often require good skills to specify. To make SAM robust to casual prompts, this paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities, notably imprecise bounding boxes and insufficient points. Our key finding reveals that given such low-quality prompts, SAM's mask decoder tends to activate image features that are biased towards the background or confined to specific object parts. To mitigate this issue, our key idea consists of calibrating solely SAM's mask attention by adjusting the sampling locations and amplitudes of image features, while the original SAM model architecture and weights remain unchanged. Consequently, our deformable sampling plugin (DSP) enables SAM to adaptively shift attention…
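The idea of "adjusting the sampling locations and amplitudes of image features" can be illustrated with a small NumPy sketch: each feature location is re-read at an offset position via bilinear interpolation and then scaled by a per-location amplitude. This is only an illustration of the resampling operation under simplifying assumptions. In the paper the adjustment is applied within SAM's mask attention, whereas here `offsets` and `amplitudes` are plain input arrays and the function names are hypothetical, not the authors' implementation.

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (H, W, C) at float coordinates ys, xs of shape (N,)."""
    H, W, _ = feat.shape
    ys = np.clip(ys, 0, H - 1)  # out-of-range locations snap to the border
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None]  # fractional parts act as interpolation weights
    wx = (xs - x0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def deformable_resample(feat, offsets, amplitudes):
    """Resample a feature map at shifted locations and rescale the result.

    feat:       (H, W, C) image features
    offsets:    (H, W, 2) per-location (dy, dx) shifts, in pixels
    amplitudes: (H, W) per-location scaling factors
    """
    H, W, C = feat.shape
    gy, gx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ys = (gy + offsets[..., 0]).ravel().astype(float)
    xs = (gx + offsets[..., 1]).ravel().astype(float)
    sampled = bilinear_sample(feat, ys, xs).reshape(H, W, C)
    return sampled * amplitudes[..., None]
```

With zero offsets and unit amplitudes the function returns the input unchanged, so the original features are the default and any learned adjustment acts as a deviation from them.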

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The topic this paper focuses on is interesting and important, and may have been overlooked in previous research. Although SAM models produce impressive segmentation results, they may fail when the input prompts are inaccurate, especially in real-life applications.
2. The proposed method is simple but effective. The improvement over previous SAM and high-quality SAM methods is notable on several benchmarks and tasks.
3. The writing is easy to follow.

Weaknesses

1. One potential negative effect to discuss. Although this paper aims to strengthen SAM's ability to focus on the foreground object, will it tend to produce wrong segmentation results when the region to be segmented is the background? It would be interesting to discuss the bias this method may introduce.
2. It is easy to demonstrate the improvement when the number of prompts is reduced to 1 or 3 points. However, noisy boxes are usually determined by the user, which are more subjective. Given

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. The writing is clear, making the paper easy to follow.
2. The motivation to develop stable prompting for SAM is well-articulated and addresses a less-explored but valuable area. The empirical studies in Section 3 are solid and clearly presented.
3. The proposed DSP and DRP modules are novel, effectively enhancing SAM's stability without compromising its powerful pre-trained representations. The inclusion of a dynamic routing mechanism is practical and well-justified.
4. Stable-SAM demonstrate

Weaknesses

1. The range of baseline models is limited. While many SAM-based models have recently been proposed to enhance its performance across various domains, this paper does not include comparisons with these relevant models.
2. The implementation of LoRA and adapters in SAM is not detailed. Since implementation choices can significantly affect performance, providing this information is crucial for evaluating the performance comparison as well as transparency and reproducibility.
3. The paper does not

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. This work provides new insights into SAM's stability in segmentation and introduces a segmentation stability metric that effectively quantifies the stability.
2. The proposed DSP and DRP modules provide novel insights into the improvement of SAM.
3. Stable-SAM shows marked improvement with insufficient prompts.
4. Generally, the paper is easy to follow.

Weaknesses

1. In Table 6, the interactive segmentation results for SAM on the SBD dataset appear inconsistent with those reported in other studies, showing higher performance. For example, in the works [1-4], the NoC90 metric for SAM-B/L/H exceeds 7.6, while Table 6 reports an SBD NoC90 of only 5.76. This discrepancy raises concerns given that this work builds upon SAM.
2. The core focus of this work is to address the issues arising from ambiguous prompts, yet SAM's practical application relies more on su

Code & Models

Repositories

fanq15/stable-sam
none · Official

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Adversarial Robustness in Machine Learning

Methods: Segment Anything Model