RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models

Keyan Chen; Chenyang Liu; Bowen Chen; Jiafan Zhang; Zhengxia Zou; Zhenwei Shi

arXiv:2507.06231 · cs.CV · March 19, 2026

TL;DR

RSRefSeg 2 introduces a decoupled, two-stage framework combining CLIP and SAM models for improved referring remote sensing image segmentation, enhancing accuracy and semantic understanding.

Contribution

It proposes a novel decoupling paradigm that separates localization and segmentation, integrating foundation models for better cross-modal alignment and interpretability.
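
The localization half of such a decoupled design can be illustrated with a toy sketch. It is not the authors' implementation: it assumes, purely for illustration, that coarse localization scores each image patch by cosine similarity to a text embedding (in the spirit of CLIP) and hands the resulting bounding box to a separate promptable segmenter (such as SAM), which is not shown. The names `cosine`, `coarse_localize`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_localize(patch_feats, text_feat, threshold=0.5):
    """Hypothetical stage 1: score every patch against the text embedding.

    patch_feats: H x W grid (nested lists) of feature vectors.
    Returns (sim_map, box), where box is (row_min, col_min, row_max, col_max)
    over patches whose similarity is >= threshold, or None if no patch qualifies.
    """
    sim_map = [[cosine(feat, text_feat) for feat in row] for row in patch_feats]
    hits = [
        (r, c)
        for r, row in enumerate(sim_map)
        for c, score in enumerate(row)
        if score >= threshold
    ]
    if not hits:
        return sim_map, None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return sim_map, (min(rows), min(cols), max(rows), max(cols))


# Toy 2 x 3 patch grid with 2-D features; the text embedding points along the first axis.
patch_feats = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.1], [1.0, 0.0], [0.0, 1.0]],
]
text_feat = [1.0, 0.0]
sim_map, box = coarse_localize(patch_feats, text_feat)
# In a decoupled pipeline, `box` would be passed as a prompt to a separate
# segmentation stage; that stage is deliberately left out of this sketch.
```

Keeping the box as the only interface between the two stages is what makes the intermediate result inspectable: a wrong final mask can be traced either to a misplaced box or to poor boundary delineation inside a correct one.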

Findings

01. Outperforms existing methods by approximately 3% gIoU in segmentation accuracy.
02. Effectively handles complex semantic relationships in remote sensing images.
03. Demonstrates superior generalizability and interpretability in experiments.
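
For readers unfamiliar with the metric in the first finding, the sketch below shows one way gIoU can be computed. It assumes gIoU here means the mean of per-sample IoU values, as is common in referring segmentation benchmarks; this page does not define it. The function names and the convention that two empty masks score 1.0 are illustrative assumptions.

```python
def intersection_and_union(pred, target):
    """Count intersecting and united pixels of two binary masks (nested lists of 0/1)."""
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    inter = sum(1 for p, t in zip(pred_flat, target_flat) if p and t)
    union = sum(1 for p, t in zip(pred_flat, target_flat) if p or t)
    return inter, union


def giou(preds, targets):
    """Mean of per-sample IoU over a list of (prediction, ground truth) mask pairs.

    Assumption: when both masks are empty (union == 0) the sample counts as 1.0.
    """
    ious = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)


# Two toy 2 x 2 samples: the first over-segments by one pixel, the second is exact.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
score = giou(preds, targets)  # per-sample IoUs are 0.5 and 1.0
```

Because every sample contributes equally, this average is sensitive to small targets, which are frequent in remote sensing imagery.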

Abstract

Referring Remote Sensing Image Segmentation provides a flexible and fine-grained framework for remote sensing scene analysis via vision-language collaborative interpretation. Current approaches predominantly utilize a three-stage pipeline encompassing dual-modal encoding, cross-modal interaction, and pixel decoding. These methods demonstrate significant limitations in managing complex semantic relationships and achieving precise cross-modal alignment, largely due to their coupled processing mechanism that conflates target localization with boundary delineation. This architectural coupling amplifies error propagation under semantic ambiguity while restricting model generalizability and interpretability. To address these issues, we propose RSRefSeg 2, a decoupling paradigm that reformulates the conventional workflow into a collaborative dual-stage framework: coarse localization followed…

Code & Models

Repositories

kyanchen/rsrefseg2 (PyTorch, official)

Taxonomy

Methods: Segment Anything Model · Contrastive Language-Image Pre-training