TL;DR
This paper introduces WiCoNet, a wide-context, transformer-based network that improves semantic segmentation of high-resolution remote sensing images by capturing long-range context that cropped training patches alone cannot provide.
Contribution
The paper proposes WiCoNet with a dual-branch architecture and a Context Transformer to model large-scale context, overcoming CNN locality constraints for better land-cover classification.
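To make the dual-branch idea concrete, here is a minimal sketch of how a wider context window might be cut around a local training crop. The function name `context_window`, the crop sizes, and the clamping rule are illustrative assumptions; this summary does not state how the paper samples or resizes its context windows.

```python
def context_window(x, y, local_size, context_size, img_w, img_h):
    """Return (left, top, right, bottom) of a square context window.

    Hypothetical helper: the window is centred on the local crop whose
    top-left corner is (x, y), then shifted so it stays inside the image.
    """
    if context_size < local_size:
        raise ValueError("context window must be at least as large as the local crop")
    if context_size > min(img_w, img_h):
        raise ValueError("context window does not fit inside the image")
    margin = (context_size - local_size) // 2
    # Clamp so the window never extends past the image border.
    left = min(max(x - margin, 0), img_w - context_size)
    top = min(max(y - margin, 0), img_h - context_size)
    return left, top, left + context_size, top + context_size


# A 256-pixel local crop inside a 2048 x 2048 image, with a 768-pixel context window.
centre_box = context_window(512, 512, 256, 768, 2048, 2048)   # (256, 256, 1024, 1024)
corner_box = context_window(0, 0, 256, 768, 2048, 2048)       # (0, 0, 768, 768)
```

Under these assumptions, the local branch would see the 256-pixel crop at full resolution, while the context branch would see the larger window, which always contains the local crop.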
Findings
WiCoNet outperforms existing methods on benchmark datasets.
The Context Transformer effectively integrates global context into local features.
The new Beijing Land-Use dataset provides a valuable resource for future research.
Abstract
Long-range contextual information is crucial for the semantic segmentation of High-Resolution (HR) Remote Sensing Images (RSIs). However, image cropping operations, commonly used for training neural networks, limit the perception of long-range contexts in large RSIs. To overcome this limitation, we propose a Wide-Context Network (WiCoNet) for the semantic segmentation of HR RSIs. Apart from extracting local features with a conventional CNN, the WiCoNet has an extra context branch to aggregate information from a larger image area. Moreover, we introduce a Context Transformer to embed contextual information from the context branch and selectively project it onto the local features. The Context Transformer extends the Vision Transformer, an emerging kind of neural network, to model the dual-branch semantic correlations. It overcomes the locality limitation of CNNs and enables the WiCoNet…
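The phrase "selectively project it onto the local features" can be illustrated with cross-attention, in which local tokens act as queries and context tokens supply keys and values. This is only a sketch under assumptions: it uses NumPy, a single attention head, and random weights, and the names `cross_attention`, `w_q`, `w_k`, and `w_v` are hypothetical. The abstract does not specify the Context Transformer's actual layers, so the block below should not be read as the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(local_tokens, context_tokens, w_q, w_k, w_v):
    """Fuse context into local features (illustrative single-head version).

    local_tokens:   (n_local, c) features from the local branch
    context_tokens: (n_ctx, c)   features from the context branch
    w_q, w_k:       (c, d) projections; w_v: (c, c) so the residual add works
    """
    q = local_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    # Each local token gets a distribution over the context tokens.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Residual connection keeps the original local features.
    fused = local_tokens + attn @ v
    return fused, attn


rng = np.random.default_rng(0)
local = rng.standard_normal((16, 8))     # 16 local tokens, 8 channels
context = rng.standard_normal((4, 8))    # 4 context tokens, 8 channels
w_q = rng.standard_normal((8, 4))
w_k = rng.standard_normal((8, 4))
w_v = rng.standard_normal((8, 8))
fused, attn = cross_attention(local, context, w_q, w_k, w_v)
```

Each row of `attn` sums to one, so every local token receives a weighted mixture of context tokens; the weighting is what makes the projection "selective" in this sketch.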
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Layer Normalization · Dropout · Label Smoothing · Residual Connection
