Referring Remote Sensing Image Segmentation with Cross-view Semantics Interaction Network

Jiaxing Yang; Lihe Zhang; Huchuan Lu

arXiv:2508.01331·cs.CV·August 5, 2025

Open Access

TL;DR

This paper introduces CSINet (Cross-view Semantics Interaction Network), a parallel yet unified framework for referring remote sensing image segmentation. It combines visual cues from remote and close distances through cross-view semantics interaction to improve accuracy on targets of varying scales, especially tiny or ambiguous ones.

Contribution

The paper proposes a parallel yet unified segmentation network built on cross-view semantics interaction. It uses a Cross-View Window-attention module (CVWin) at every encoding stage, together with a CDAD decoder, to better handle the drastic scale variation of targets in remote sensing images.

Findings

01. Significant performance improvements over existing methods.

02. Effective handling of tiny and ambiguous targets.

03. Maintains high speed while enhancing global and local semantics.

Abstract

Recently, Referring Remote Sensing Image Segmentation (RRSIS) has attracted wide attention. To handle the drastic scale variation of remote targets, existing methods use only the full image as input and nest saliency-preferring techniques for cross-scale information interaction into a traditional single-view structure. Although effective for visually salient targets, they still struggle with tiny, ambiguous ones in many real scenarios. In this work, we instead propose a parallel yet unified segmentation framework, the Cross-view Semantics Interaction Network (CSINet), to address these limitations. Motivated by human behavior in observing targets of interest, the network orchestrates visual cues from remote and close distances to conduct synergistic prediction. At every encoding stage, a Cross-View Window-attention module (CVWin) is utilized to supplement global and local semantics into…
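The abstract is cut off before it describes how CVWin works, so the following is only a generic illustration of the idea it names: window-level cross-attention between two views, where tokens from one view are enriched with a weighted average of tokens from the other. It is not the authors' CVWin module. The function names, the toy token values, the residual fusion, and the absence of learned projections are all assumptions made for this sketch.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_view_attention(queries, context):
    """Each query token attends over all context tokens (scaled dot-product).

    queries: list of d-dim token vectors from one view's window.
    context: list of d-dim token vectors from the other view's window.
    Returns one d-dim vector per query: a weighted average of context tokens.
    No learned projections are used; this is for illustration only.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, context))
                    for j in range(d)])
    return out

def fuse(tokens, attended):
    """Residual fusion (an assumption): add the cross-view signal to the tokens."""
    return [[t + a for t, a in zip(tok, att)]
            for tok, att in zip(tokens, attended)]

# Hypothetical toy windows: 2 tokens from a close view, 3 from a remote view.
local_window = [[1.0, 0.0], [0.0, 1.0]]
global_window = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Local tokens gain global context; global tokens gain local detail.
local_enriched = fuse(local_window,
                      cross_view_attention(local_window, global_window))
global_enriched = fuse(global_window,
                       cross_view_attention(global_window, local_window))
```

Running the attention in both directions mirrors the abstract's description of supplementing both global and local semantics. A real module would add learned query, key, and value projections and operate on batched feature maps.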

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Multimodal Machine Learning Applications