RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models

Keyan Chen; Jiafan Zhang; Chenyang Liu; Zhengxia Zou; Zhenwei Shi

arXiv:2501.06809 · cs.CV · January 14, 2025

TL;DR

RSRefSeg is a foundation model for referring remote sensing image segmentation. It combines CLIP and SAM to improve fine-grained visual understanding and segmentation accuracy in remote sensing applications.

Contribution

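It presents a framework that uses CLIP and SAM to improve multimodal alignment and segmentation in remote sensing images. It targets a specific weakness of earlier methods: approaches built on pre-trained language models often fail to align fine-grained semantic concepts between text and image.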

Findings

1. Outperforms existing methods on the RRSIS-D dataset
2. Effectively aligns fine-grained semantic concepts across modalities
3. Enhances segmentation accuracy in remote sensing images

Abstract

Referring remote sensing image segmentation is crucial for achieving fine-grained visual understanding through free-format textual input, enabling enhanced scene and object extraction in remote sensing applications. Current research primarily utilizes pre-trained language models to encode textual descriptions and align them with visual modalities, thereby facilitating the expression of relevant visual features. However, these approaches often struggle to establish robust alignments between fine-grained semantic concepts, leading to inconsistent representations across textual and visual information. To address these limitations, we introduce a referring remote sensing image segmentation foundation model, RSRefSeg. RSRefSeg leverages CLIP for visual and textual encoding, employing both global and local textual semantics as filters to generate referring-related visual activation features…
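
The abstract says only that global and local textual semantics act as filters over CLIP visual features. It gives no formulas. The sketch below is one hypothetical way to realize that idea in NumPy. It assumes three inputs already projected into a shared D-dimensional CLIP space: CLIP patch features, a sentence embedding (global semantics), and per-token embeddings (local semantics). The function name `text_filtered_activation`, the cosine-similarity plus word-attention scoring, the sigmoid gating, and the `tau` parameter are illustrative assumptions, not RSRefSeg's actual module.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def text_filtered_activation(visual, sent_emb, word_embs, tau=0.07):
    """Hypothetical sketch (not the RSRefSeg code): gate patch features with text.

    visual:    (H, W, D) patch features, assumed to live in CLIP's joint space
    sent_emb:  (D,)      global sentence embedding
    word_embs: (L, D)    local per-token embeddings
    Returns the gated features (H, W, D) and the activation map (H, W).
    """
    H, W, D = visual.shape
    v = l2_normalize(visual.reshape(-1, D))  # (N, D), N = H * W
    g = l2_normalize(sent_emb)               # (D,)
    t = l2_normalize(word_embs)              # (L, D)

    # Global filter: cosine similarity of every patch with the whole sentence.
    global_sim = v @ g                       # (N,)

    # Local filter: each patch attends over the words and keeps an
    # attention-weighted similarity, dominated by its best-matching words.
    word_sim = v @ t.T                       # (N, L)
    attn = softmax(word_sim / tau, axis=1)   # (N, L)
    local_sim = (attn * word_sim).sum(axis=1)  # (N,)

    # Sigmoid of the combined score gives a per-patch activation in (0, 1),
    # which re-weights the original (unnormalized) features.
    activation = 1.0 / (1.0 + np.exp(-(global_sim + local_sim) / tau))
    gated = visual.reshape(-1, D) * activation[:, None]
    return gated.reshape(H, W, D), activation.reshape(H, W)


# Toy example: 2x2 patch grid, D = 4, a two-token "expression".
sentence = np.array([1.0, 0.0, 0.0, 0.0])
words = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
patches = np.array([[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
                    [[0.0, 0.0, 0.0, 1.0], [-1.0, 0.0, 0.0, 0.0]]])
feats, act = text_filtered_activation(patches, sentence, words)
print(np.round(act, 3))
```

The two signals play different roles. The sentence-level score reflects the expression as a whole, while the word-level attention lets a patch respond to individual tokens. In the toy run, the patch aligned with the expression gets an activation close to 1. The two orthogonal patches sit at exactly 0.5, and the patch pointing the opposite way is pushed toward 0. The abstract excerpt does not describe how RSRefSeg actually combines these signals or passes them on to SAM.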

Code & Models

Repositories

kyanchen/rsrefseg (PyTorch, official)

Taxonomy

Methods: ALIGN · Contrastive Language-Image Pre-training · Segment Anything Model