TL;DR
This paper introduces ScasNet, a deep CNN for semantic labeling of very high resolution (VHR) urban images that handles confusing man-made objects and intricate fine-structured objects through global-to-local context aggregation and coarse-to-fine refinement.
Contribution
The paper proposes an end-to-end self-cascaded CNN architecture with residual correction that improves semantic labeling accuracy on VHR images and outperforms existing methods.
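Residual correction can be read as adding a learned correction to a fused feature map, so the output is f + R(f) and the branch R only has to model an adjustment to the fused result. The sketch below is illustrative only: it assumes 1D feature vectors in place of 2D maps, and the correction branch R (one convolution followed by ReLU, with a hand-picked kernel) is a hypothetical stand-in, not the paper's configuration.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded, same-length 1D cross-correlation (odd-length kernel assumed)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def residual_correction(fused: List[float], kernel: List[float]) -> List[float]:
    """Return fused + R(fused).

    R (conv followed by ReLU with a fixed kernel) is a hypothetical stand-in
    for the learned correction branch.
    """
    correction = relu(conv1d_same(fused, kernel))
    # Identity path plus correction path, added element-wise.
    return [f + c for f, c in zip(fused, correction)]


print(residual_correction([1.0, 0.0, 2.0, 0.0], [0.25, 0.5, 0.25]))
# → [1.5, 0.75, 3.0, 0.5]
```

Because the identity path passes the fused features through unchanged, a correction branch that outputs zeros leaves them intact, which is the usual motivation for a residual formulation.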
Findings
Achieves state-of-the-art results on three public datasets.
Effectively handles confusing man-made objects through global-to-local context aggregation.
Improves labeling of fine-structured objects with a coarse-to-fine refinement strategy.
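One common way to picture coarse-to-fine refinement is to repeatedly upsample a coarse prediction and fuse it with higher-resolution features from shallower encoder layers. A minimal 1D sketch, assuming nearest-neighbour 2x upsampling and element-wise sum as the fusion step; `refine` and `upsample2x` are hypothetical names, and the paper's actual upsampling and fusion layers may differ.

```python
from typing import List


def upsample2x(x: List[float]) -> List[float]:
    """Nearest-neighbour 2x upsampling of a 1D map (assumed upsampling method)."""
    return [v for v in x for _ in range(2)]


def refine(coarse: List[float], skips: List[List[float]]) -> List[float]:
    """Progressively refine a coarse map with finer skip features.

    `skips` is ordered from coarse to fine; each entry must be twice the
    length of the current map. Fusion is a plain element-wise sum here
    (a hypothetical choice for illustration).
    """
    out = coarse
    for skip in skips:
        up = upsample2x(out)
        if len(up) != len(skip):
            raise ValueError("each skip must be twice the current resolution")
        out = [u + s for u, s in zip(up, skip)]
    return out


print(refine([1.0, 2.0], [[0.0, 1.0, 0.0, 1.0], [0.5] * 8]))
# → [1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 3.5, 3.5]
```

Each stage doubles the resolution, so detail that the coarse map cannot represent has to come from the finer skip features.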
Abstract
Semantic labeling for very high resolution (VHR) images in urban areas is of significant importance in a wide range of remote sensing applications. However, many confusing man-made objects and intricate fine-structured objects make it very difficult to obtain labeling results that are both coherent and accurate. For this challenging task, we propose a novel deep model with convolutional neural networks (CNNs), i.e., an end-to-end self-cascaded network (ScasNet). Specifically, for confusing man-made objects, ScasNet improves labeling coherence through sequential global-to-local context aggregation. Technically, multi-scale contexts are captured on the output of a CNN encoder and then successively aggregated in a self-cascaded manner. Meanwhile, for fine-structured objects, ScasNet boosts labeling accuracy with a coarse-to-fine refinement strategy. It progressively refines the…
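The self-cascaded aggregation described in the abstract can be sketched as follows: several contexts are extracted from the same encoder output at different scales, then fused one at a time from the most global to the most local, with each step building on the previous result. This toy version assumes dilated convolutions as the context extractor (a larger dilation gives a more global view), a 1D feature vector, a shared hand-picked kernel, toy dilation rates, and element-wise sum plus ReLU as the fusion operator; none of these are claimed to be the paper's settings.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel length)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def self_cascaded_aggregate(encoder_out: List[float], kernel: List[float],
                            dilations: List[int]) -> List[float]:
    """Capture one context per dilation, then fuse them from global to local.

    Fusion (element-wise sum followed by ReLU) is an assumed operator.
    """
    ordered = sorted(dilations, reverse=True)  # largest dilation = most global
    contexts = [dilated_conv1d(encoder_out, kernel, d) for d in ordered]
    fused = contexts[0]
    for ctx in contexts[1:]:
        # Each step fuses the running result with the next, more local context.
        fused = [max(0.0, a + b) for a, b in zip(fused, ctx)]
    return fused


features = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
print(self_cascaded_aggregate(features, [1.0, 0.0, 1.0], dilations=[1, 2, 4]))
# → [1.0, 0.0, 1.0, 2.0, 0.0, 2.0, 1.0, 0.0]
```

Ordering the fusion from global to local lets broad context settle the overall layout before finer context adjusts it, which matches the coherence argument in the abstract.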
