Semantic Localization Guiding Segment Anything Model For Reference Remote Sensing Image Segmentation
Shuyang Li, Shuang Wang, Zhuangzhuang Sun, Jing Xiao

TL;DR
This paper introduces PSLG-SAM, a two-stage framework for reference remote sensing image segmentation (RRSIS). It decomposes the task into coarse localization and fine segmentation, which reduces annotation requirements and improves accuracy.
Contribution
The paper proposes a two-stage RRSIS framework with a train-free segmentation stage, together with a new dataset, improving performance and reducing annotation requirements.
Findings
Significant performance improvements over existing models.
The second stage is train-free, reducing annotation burden.
Effective handling of complex scenes in remote sensing images.
Abstract
The Reference Remote Sensing Image Segmentation (RRSIS) task, which has attracted widespread research interest, generates segmentation masks for specified objects in images based on textual descriptions. Current RRSIS methods rely on multi-modal fusion backbones and semantic segmentation heads but face challenges such as dense annotation requirements and complex scene interpretation. To address these issues, we propose a framework named prompt-generated semantic localization guiding Segment Anything Model (PSLG-SAM), which decomposes the RRSIS task into two stages: coarse localization and fine segmentation. In the coarse localization stage, a visual grounding network roughly locates the text-described object. In the fine segmentation stage, the coordinates from the first stage guide the Segment Anything Model (SAM), enhanced by a clustering-based foreground point generator and…
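The two-stage idea can be illustrated with a minimal sketch of how a coarse box might be turned into foreground point prompts. The abstract is truncated and does not specify the clustering method, so everything here is an assumption for illustration: k-means on raw pixel colors, the heuristic that the described object covers the box center, and the hypothetical function names `kmeans` and `foreground_points`. It is not the authors' implementation.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means on an (N, D) array; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=k, replace=False)
    centers = features[idx].astype(float)
    for _ in range(iters):
        # Assign each sample to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its samples (keep it if empty).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers


def foreground_points(image, box, k=2, n_points=3, seed=0):
    """Pick foreground point prompts inside a coarse box (hypothetical helper).

    image: (H, W, C) array; box: (x0, y0, x1, y1) in pixels, x1/y1 exclusive.
    Returns an (n, 2) integer array of (x, y) points in image coordinates.
    """
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1].astype(float)
    h, w = crop.shape[:2]
    labels, _ = kmeans(crop.reshape(h * w, -1), k, seed=seed)
    labels = labels.reshape(h, w)
    # Assumption: the object covers the box center, so the cluster found
    # there is treated as foreground.
    fg = labels[h // 2, w // 2]
    ys, xs = np.nonzero(labels == fg)
    # Keep the foreground pixels closest to the foreground centroid.
    cy, cx = ys.mean(), xs.mean()
    order = np.argsort((ys - cy) ** 2 + (xs - cx) ** 2)[:n_points]
    return np.stack([xs[order] + x0, ys[order] + y0], axis=1)
```

In a pipeline of this shape, the box from the grounding stage and the returned points would be passed to SAM as box and positive point prompts; the `segment_anything` package's `SamPredictor.predict`, for instance, accepts `point_coords`, `point_labels`, and `box` arguments. Because the second stage only consumes prompts, it needs no mask annotations of its own.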
Taxonomy
Topics: Advanced Neural Network Applications · Remote-Sensing Image Classification · Automated Road and Building Extraction
Methods: Softmax · Attention Is All You Need
