Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Michele Volpi, Devis Tuia

TL;DR
This paper introduces a CNN-based system for dense semantic labeling of ultra-high resolution aerial images, achieving state-of-the-art accuracy, improved geometric precision, and high efficiency by using a downsample-then-upsample architecture.
Contribution
The paper presents a novel CNN architecture with a downsample-then-upsample design for pixel-level land-cover classification in ultra-high resolution imagery, outperforming existing methods.
Findings
Achieves state-of-the-art accuracy on sub-decimeter datasets
Improves geometric accuracy of semantic labels
Offers high inference efficiency
Abstract
Semantic labeling (or pixel-level land-cover classification) in ultra-high resolution imagery (< 10 cm) requires statistical models able to learn high-level concepts from spatial data with large appearance variations. Convolutional Neural Networks (CNNs) achieve this goal by discriminatively learning a hierarchy of representations of increasing abstraction. In this paper we present a CNN-based system relying on a downsample-then-upsample architecture. Specifically, it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by means of deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This results in many advantages, including i) state-of-the-art numerical accuracy, ii) improved geometric accuracy of predictions and iii) high…
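To make the downsample-then-upsample idea concrete, here is a minimal pure-Python sketch. It is not the authors' network: it assumes a single-channel input, one strided convolution with a hand-picked kernel in place of the learned convolutional stack, and one transposed convolution ("deconvolution") per class in place of the learned upsampling layers. The function names `conv2d`, `conv_transpose2d` and `dense_labels` are hypothetical. The point it illustrates is that the coarse map is restored to the input's spatial size, so an argmax over class scores yields one label per original pixel.

```python
# Illustrative sketch only (not the paper's architecture): one strided conv
# downsamples, one transposed conv per class upsamples, and a per-pixel argmax
# over the class score maps gives a dense label map at input resolution.


def conv2d(x, w, stride):
    """Valid 2D cross-correlation of a single-channel map x with a k x k kernel w.

    Output size per axis: (n - k) // stride + 1.
    """
    k = len(w)
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                x[i * stride + a][j * stride + b] * w[a][b]
                for a in range(k)
                for b in range(k)
            )
    return out


def conv_transpose2d(x, w, stride):
    """Transposed convolution: each input value scatters a scaled copy of w.

    Output size per axis: (n - 1) * stride + k, the inverse of conv2d's shape rule.
    """
    k = len(w)
    out_h = (len(x) - 1) * stride + k
    out_w = (len(x[0]) - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(len(x)):
        for j in range(len(x[0])):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    return out


def dense_labels(image, down_kernel, class_kernels, stride=2):
    """Downsample once, upsample one score map per class, label every pixel."""
    coarse = conv2d(image, down_kernel, stride)
    scores = [conv_transpose2d(coarse, wc, stride) for wc in class_kernels]
    h, w = len(scores[0]), len(scores[0][0])
    # Argmax over classes at each pixel (ties resolve to the lowest class index).
    return [
        [max(range(len(scores)), key=lambda c: scores[c][i][j]) for j in range(w)]
        for i in range(h)
    ]


if __name__ == "__main__":
    # 8x8 toy image: left half -1, right half +1.
    image = [[-1.0] * 4 + [1.0] * 4 for _ in range(8)]
    down = [[0.25, 0.25], [0.25, 0.25]]  # 2x2 averaging, stride 2 -> 4x4
    classes = [
        [[-1.0, -1.0], [-1.0, -1.0]],  # class 0 responds to negative regions
        [[1.0, 1.0], [1.0, 1.0]],  # class 1 responds to positive regions
    ]
    for row in dense_labels(image, down, classes):
        print(row)
```

With kernel size 2 and stride 2, an 8x8 input becomes a 4x4 coarse map and is restored to 8x8, so the label map aligns pixel-for-pixel with the input; in the toy run every row is labeled `[0, 0, 0, 0, 1, 1, 1, 1]`. In a real network the kernels are learned and there are many channels, nonlinearities and layers, but the shape bookkeeping follows the same two rules.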