FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji

TL;DR
FocSAM enhances the Segment Anything Model by introducing dynamic attention and pixel-wise activation to improve stability and efficiency in interactive segmentation, especially on challenging samples.
Contribution
FocSAM redesigns SAM's pipeline with Dynamic Window Multi-head Self-Attention (Dwin-MSA) and Pixel-wise Dynamic ReLU (P-DyReLU). These changes address SAM's stability issues and improve interactive segmentation performance with minimal computational overhead.
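As a rough illustration of the second component, P-DyReLU (read here as a pixel-wise dynamic ReLU, matching the TL;DR's "pixel-wise activation"), the sketch below implements a generic pixel-wise dynamic ReLU in plain Python in the spirit of DyReLU. A small hypothetical hyper-network maps each pixel's conditioning vector to k slopes and k intercepts per channel, and the activation takes the maximum of the resulting linear pieces. The function names, the conditioning inputs and the coefficient layout are assumptions for illustration, not FocSAM's actual design. The coefficient normalization used in DyReLU is also omitted.

```python
from typing import List

Vector = List[float]


def linear(w: List[Vector], b: Vector, x: Vector) -> Vector:
    """Dense layer: returns W @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def pixelwise_dyrelu(features: List[Vector], conds: List[Vector],
                     w: List[Vector], b: Vector, k: int = 2) -> List[Vector]:
    """Hypothetical pixel-wise dynamic ReLU.

    features: one C-dim feature vector per pixel (flattened H*W).
    conds:    one conditioning vector per pixel (assumed here to be image
              features fused with click/prompt information).
    w, b:     hypothetical hyper-network producing 2*k*C coefficients per
              pixel: k slope vectors followed by k intercept vectors.
    Output channel c at a pixel is max_j(slope_j[c] * x[c] + icpt_j[c]).
    """
    out = []
    for x, z in zip(features, conds):
        coeffs = linear(w, b, z)  # coefficients differ from pixel to pixel
        c_dim = len(x)
        if len(coeffs) != 2 * k * c_dim:
            raise ValueError("hyper-network must output 2*k*C coefficients")
        slopes = [coeffs[j * c_dim:(j + 1) * c_dim] for j in range(k)]
        icpts = [coeffs[(k + j) * c_dim:(k + j + 1) * c_dim] for j in range(k)]
        out.append([max(slopes[j][c] * x[c] + icpts[j][c] for j in range(k))
                    for c in range(c_dim)])
    return out
```

If the hyper-network ignores its input and always outputs slopes (1, 0) with zero intercepts, this reduces to a plain ReLU. Letting the coefficients depend on each pixel's conditioning vector is what makes the activation pixel-wise dynamic.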
Findings
Achieves segmentation quality comparable to state-of-the-art methods.
Requires only about 5.6% of the inference time of the existing state-of-the-art method on CPUs.
Effectively refocuses on target objects during interaction.
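To make "refocusing" concrete, the sketch below shows one plausible way to derive and use a dynamic attention window in the spirit of the Dwin-MSA component named above. It takes the bounding box of the previously predicted mask and enlarges it by a hypothetical factor `expand`. It then maps the box onto the image-embedding token grid (SAM's encoder turns a 1024×1024 input into a 64×64 embedding, i.e. 16 pixels per token) and runs self-attention only among the tokens inside that window. All names, the expansion factor, and the single-head attention without learned projections are assumptions for illustration. This is not FocSAM's published Dwin-MSA.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]
Tokens = List[List[List[float]]]  # [row][col] -> embedding vector
Window = Tuple[int, int, int, int]


def mask_bbox(mask: Mask) -> Tuple[int, int, int, int]:
    """Inclusive (top, left, bottom, right) box of foreground pixels;
    falls back to the whole image when the mask is empty."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return 0, 0, len(mask) - 1, len(mask[0]) - 1
    return rows[0], cols[0], rows[-1], cols[-1]


def focus_window(mask: Mask, stride: int, grid_size: int,
                 expand: float = 1.4) -> Window:
    """Half-open token window (r0, c0, r1, c1) covering the expanded box."""
    h, w = len(mask), len(mask[0])
    top, left, bottom, right = mask_bbox(mask)
    cy, cx = (top + bottom + 1) / 2, (left + right + 1) / 2
    half_h = (bottom - top + 1) * expand / 2
    half_w = (right - left + 1) * expand / 2
    # Expanded box in pixel coordinates, clamped to the image.
    y0, y1 = max(0, int(cy - half_h)), min(h, math.ceil(cy + half_h))
    x0, x1 = max(0, int(cx - half_w)), min(w, math.ceil(cx + half_w))
    # Each token covers a stride x stride pixel patch.
    return (y0 // stride, x0 // stride,
            min(grid_size, math.ceil(y1 / stride)),
            min(grid_size, math.ceil(x1 / stride)))


def window_self_attention(tokens: Tokens, window: Window) -> Tokens:
    """Single-head self-attention without learned projections, restricted to
    tokens inside the window; tokens outside it pass through unchanged."""
    r0, c0, r1, c1 = window
    coords = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
    xs = [tokens[r][c] for r, c in coords]
    d = len(xs[0])
    out = [list(row) for row in tokens]
    for (r, c), q in zip(coords, xs):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out[r][c] = [sum(wt * v[i] for wt, v in zip(weights, xs)) / total
                     for i in range(d)]
    return out
```

For example, on a toy 64×64 image with `stride=16` and a 4×4 token grid, a mask covering pixels 10 to 37 in both directions yields the token window `(0, 0, 3, 3)`. Attention then mixes only those nine tokens, and the rest of the embedding is left untouched. Because the window is recomputed from each new prediction, the focus can follow the object as the user clicks, without re-running the image encoder.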
Abstract
The Segment Anything Model (SAM) marks a notable milestone in segmentation models, highlighted by its robust zero-shot capabilities and its ability to handle diverse prompts. SAM follows a pipeline that separates interactive segmentation into image preprocessing through a large encoder and interactive inference via a lightweight decoder, ensuring efficient real-time performance. However, under this pipeline SAM suffers from stability issues on challenging samples. These issues arise from two main factors. First, because the image is preprocessed only once, SAM cannot dynamically use image-level zoom-in strategies to refocus on the target object during interaction. Second, the lightweight decoder struggles to sufficiently integrate interactive information with image embeddings. To address these two limitations, we propose FocSAM, whose pipeline is redesigned in two pivotal aspects. First, we propose Dynamic…
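The pipeline trade-off described in the abstract can be made concrete with a small sketch. The class below is a hypothetical illustration, not SAM's or FocSAM's API. `encoder` and `decoder` are caller-supplied stand-ins for the heavy image encoder and the lightweight prompt decoder. Once the embedding is cached, each click only triggers the decoder. An image-level zoom-in, by contrast, produces new pixel content and therefore forces a fresh encoder pass.

```python
from typing import Any, Callable, List, Tuple

Click = Tuple[int, int, bool]  # hypothetical (row, col, is_positive) click


class InteractiveSession:
    """SAM-style decoupling: heavy encoding once, light decoding per click."""

    def __init__(self, encoder: Callable[[Any], Any],
                 decoder: Callable[[Any, List[Click]], Any], image: Any):
        self.encoder = encoder
        self.decoder = decoder
        self.image = image
        self.embedding = encoder(image)  # expensive: runs once per image
        self.clicks: List[Click] = []

    def click(self, row: int, col: int, positive: bool = True) -> Any:
        """Record a click and decode with the cached embedding (cheap)."""
        self.clicks.append((row, col, positive))
        return self.decoder(self.embedding, self.clicks)

    def zoom_in(self, crop: Callable[[Any], Any]) -> Any:
        """Image-level zoom-in: the crop is new pixel content, so the cached
        embedding cannot be reused and the encoder must run again.
        (Remapping click coordinates into the crop is omitted here.)"""
        crop_embedding = self.encoder(crop(self.image))
        return self.decoder(crop_embedding, self.clicks)
```

With stub functions that count their calls, three clicks cost one encoder call and three decoder calls. A single zoom-in adds a second encoder call. On real hardware the encoder dominates latency, so this is exactly the cost that rules out per-interaction zoom-in under SAM's pipeline.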
Taxonomy
Topics: Advanced Neural Network Applications
Methods: Segment Anything Model
