SPDA-SAM: A Self-prompted Depth-Aware Segment Anything Model for Instance Segmentation
Yihan Shang, Wei Wang, Chao Huang, Xinghui Dong

TL;DR
This paper introduces SPDA-SAM, an instance segmentation model that extends SAM with self-generated prompts and RGB-D feature fusion, and reports improved performance over prior methods on twelve datasets.
Contribution
The paper proposes a self-prompted, depth-aware extension of SAM with two new modules, one for semantic-spatial prompting and one for RGB-D feature fusion, addressing SAM's dependence on manual prompts and the absence of depth information in RGB images.
Findings
Outperforms state-of-the-art methods on twelve datasets.
Effective use of depth maps improves spatial understanding.
Self-prompting reduces dependency on manual prompts.
Abstract
Recently, the Segment Anything Model (SAM) has demonstrated strong generalizability in various instance segmentation tasks. However, its performance is heavily dependent on the quality of manual prompts. In addition, the RGB images that instance segmentation methods normally use inherently lack depth information. As a result, the ability of these methods to perceive spatial structures and delineate object boundaries is hindered. To address these challenges, we propose a Self-prompted Depth-Aware SAM (SPDA-SAM) for instance segmentation. Specifically, we design a Semantic-Spatial Self-prompt Module (SSSPM) which extracts the semantic and spatial prompts from the image encoder and the mask decoder of SAM, respectively. Furthermore, we introduce a Coarse-to-Fine RGB-D Fusion Module (C2FFM), in which the features extracted from a monocular RGB image and the depth map estimated from it are…
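To make the idea of a model-generated spatial prompt concrete, here is a minimal sketch of how a box prompt and a point prompt could be derived from a coarse mask instead of being supplied by a user. It is an illustration of self-prompting in general, not the paper's SSSPM: the function name `prompts_from_coarse_mask`, the zero logit threshold, and the choice of the highest-scoring pixel as the point prompt are all assumptions made for this example.

```python
import numpy as np

def prompts_from_coarse_mask(mask_logits, threshold=0.0):
    """Hypothetical helper: turn a coarse mask into (box, point) prompts.

    mask_logits: 2D array of per-pixel scores, e.g. a low-resolution mask
                 prediction. Pixels above `threshold` count as foreground.
    Returns (box, point) with box = (x_min, y_min, x_max, y_max) and
    point = (x, y), or (None, None) if no pixel is foreground.
    """
    foreground = mask_logits > threshold
    if not foreground.any():
        return None, None

    # Tight bounding box around all foreground pixels.
    ys, xs = np.nonzero(foreground)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

    # Point prompt: the foreground pixel with the highest score
    # (an assumption; a centroid would be another reasonable choice).
    flat_index = np.argmax(np.where(foreground, mask_logits, -np.inf))
    py, px = np.unravel_index(flat_index, mask_logits.shape)
    return box, (int(px), int(py))

# Toy coarse mask: a 3x4 foreground blob with one high-confidence pixel.
coarse = np.full((6, 8), -1.0)
coarse[2:5, 3:7] = 1.0
coarse[3, 4] = 2.0
box, point = prompts_from_coarse_mask(coarse)
```

In a SAM-style pipeline, prompts like these would be passed to the prompt encoder in place of user clicks or boxes; how SPDA-SAM actually forms and uses its semantic and spatial prompts is specified in the paper, not here.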
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Domain Adaptation and Few-Shot Learning
