EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy

Yichun Yu; Yuqing Lan; Zhihuan Xing; Xiaoyi Yang; Tingyue Tang; Dan Yu

arXiv:2505.17665·cs.CV·May 26, 2025

TL;DR

This paper introduces RAPNet, a novel region-aware network that combines Transformer-based context modeling and global class refinement to improve multi-class segmentation of high-resolution remote sensing images, addressing local detail and global context challenges.

Contribution

The paper proposes RAPNet, a region-level segmentation framework with CRA and GCR modules, offering a new approach to enhance accuracy in remote sensing image segmentation.

Findings

1. RAPNet outperforms existing methods on three datasets.
2. Region-level modeling improves segmentation accuracy.
3. Transformer-based CRA captures long-range dependencies effectively.

Abstract

High-resolution remote sensing (HRRS) image segmentation is challenging due to complex spatial layouts and diverse object appearances. While CNNs excel at capturing local features, they struggle with long-range dependencies, whereas Transformers can model global context but often neglect local details and are computationally expensive. We propose a novel approach, Region-Aware Proxy Network (RAPNet), which consists of two components: Contextual Region Attention (CRA) and Global Class Refinement (GCR). Unlike traditional methods that rely on grid-based layouts, RAPNet operates at the region level for more flexible segmentation. The CRA module uses a Transformer to capture region-level contextual dependencies, generating a Semantic Region Mask (SRM). The GCR module learns a global class attention map to refine multi-class information, combining the SRM and attention map for accurate…
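To make the region-level idea concrete, here is a minimal sketch of attention over region proxies rather than over grid cells. It is not the authors' CRA module: the abstract does not specify how regions are formed or how attention is parameterized, so the region assignment (`region_ids`), the mean pooling, and the identity query/key/value projections below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

def pool_regions(features, region_ids, num_regions):
    """Average per-pixel feature vectors inside each region: one proxy per region.
    (Mean pooling is an assumption, not taken from the paper.)"""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_regions)]
    counts = [0] * num_regions
    for feat, rid in zip(features, region_ids):
        counts[rid] += 1
        for d in range(dim):
            sums[rid][d] += feat[d]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def region_self_attention(proxies):
    """Single-head scaled dot-product self-attention among region proxies.
    Identity projections are used for brevity (an assumption)."""
    dim = len(proxies[0])
    scale = math.sqrt(dim)
    out = []
    for q in proxies:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in proxies]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, proxies))
                    for d in range(dim)])
    return out

def broadcast_to_pixels(region_feats, region_ids):
    """Give every pixel the context vector of the region it belongs to."""
    return [region_feats[rid] for rid in region_ids]

# Toy input: 6 flattened pixels, 2 feature channels, 3 hypothetical regions.
features = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0],
            [0.2, 1.0], [0.5, 0.5], [0.5, 0.5]]
region_ids = [0, 0, 1, 1, 2, 2]

proxies = pool_regions(features, region_ids, num_regions=3)
context = region_self_attention(proxies)
pixel_context = broadcast_to_pixels(context, region_ids)
```

Under these assumptions the attention cost scales with the number of regions squared rather than the number of pixels squared, which is one plausible reading of why operating at the region level can be cheaper than grid-based Transformer attention.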
