Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation
Zhe Dong, Yuzhe Sun, Tianzhu Liu, Wangmeng Zuo, Yanfeng Gu

TL;DR
This paper introduces CroBIM, a framework for referring remote sensing image segmentation (RRSIS) that integrates spatial and linguistic information through bidirectional cross-modal interaction. It outperforms existing methods on the newly introduced RISBench benchmark and on existing datasets.
Contribution
The paper proposes CroBIM, a cross-modal interaction model built from a context-aware prompt modulation (CAPM) module, a language-guided feature aggregation module, and a mutual-interaction decoder. It also introduces RISBench, a large-scale benchmark dataset for the task.
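To make the idea of bidirectional cross-modal interaction concrete, here is a minimal NumPy sketch of two attention passes, one in which visual tokens attend to word tokens and one in the reverse direction. This is an illustration only, not the authors' architecture: the function names, token counts, feature dimension, and the absence of learned projections are all assumptions made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    # Scaled dot-product attention: each query row attends over all context rows.
    # No learned projection matrices are used here (an assumption for brevity).
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights

def bidirectional_interaction(visual, text):
    # Hypothetical helper: one round of interaction in both directions.
    # Visual tokens gather linguistic cues from the word tokens.
    vis_update, v2t = cross_attention(visual, text)
    # Word tokens gather visual context from the image tokens.
    txt_update, t2v = cross_attention(text, visual)
    # Residual connections keep the original features.
    return visual + vis_update, text + txt_update, v2t, t2v

rng = np.random.default_rng(0)
visual = rng.standard_normal((16, 8))  # 16 visual tokens (e.g. a 4x4 patch grid), dim 8
text = rng.standard_normal((5, 8))     # 5 word tokens, dim 8
new_visual, new_text, v2t, t2v = bidirectional_interaction(visual, text)
print(new_visual.shape, new_text.shape)
```

Each row of `v2t` and `t2v` is a probability distribution over the other modality's tokens, so both feature sets are updated with information from the other side before any mask is decoded.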
Findings
CroBIM outperforms existing methods on RISBench and other datasets.
The RISBench dataset contains 52,472 image-language-label triplets.
The proposed modules enhance cross-modal feature alignment and segmentation accuracy.
Abstract
Given a natural language expression and a remote sensing image, the goal of referring remote sensing image segmentation (RRSIS) is to generate a pixel-level mask of the target object identified by the referring expression. In contrast to natural scenarios, expressions in RRSIS often involve complex geospatial relationships, with target objects of interest that vary significantly in scale and lack visual saliency, thereby increasing the difficulty of achieving precise segmentation. To address the aforementioned challenges, a novel RRSIS framework is proposed, termed the cross-modal bidirectional interaction model (CroBIM). Specifically, a context-aware prompt modulation (CAPM) module is designed to integrate spatial positional relationships and task-specific knowledge into the linguistic features, thereby enhancing the ability to capture the target object. Additionally, a language-guided…
Taxonomy
Topics: Geographic Information Systems Studies · Remote-Sensing Image Classification
Methods: Softmax · Attention Is All You Need
