TL;DR
This paper introduces a multi-view CNN framework for large-area semantic labeling of satellite images that learns from noisy OpenStreetMap (OSM) labels, improving accuracy while adding only minimal training overhead.
Contribution
A novel multi-view training method and CNN architecture that enhances semantic segmentation accuracy using overlapping satellite images and noisy OSM labels.
Findings
4-7% improvement in per-class IoU scores over approaches that use each view independently
Multi-view modifications can be discarded at inference with only a small performance penalty
Achieves state-of-the-art IoU scores of 0.8 for buildings and 0.64 for roads without human supervision
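The findings are reported as per-class intersection-over-union (IoU). As a reference point, here is a minimal sketch of the standard per-class IoU on flattened label maps, IoU_c = |pred = c AND target = c| / |pred = c OR target = c|. The integer class encoding (0 = background, 1 = building, 2 = road) and the function name `per_class_iou` are hypothetical illustrations, not taken from the paper's evaluation code.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> Dict[int, float]:
    """Per-class IoU for two flattened label maps of equal length.

    Classes absent from both maps get NaN, since their IoU is undefined.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    pred_count = Counter(pred)      # |pred == c|
    target_count = Counter(target)  # |target == c|
    inter = Counter()               # |pred == c AND target == c|
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
    ious = {}
    for c in range(num_classes):
        # |A OR B| = |A| + |B| - |A AND B|
        union = pred_count[c] + target_count[c] - inter[c]
        ious[c] = inter[c] / union if union else float("nan")
    return ious


# Toy 6-pixel example with the hypothetical encoding 0=background, 1=building, 2=road.
ious = per_class_iou([0, 1, 1, 2, 2, 0], [0, 1, 0, 2, 2, 2], num_classes=3)
```

Here the building class overlaps on 1 pixel out of a union of 2, so its IoU is 0.5; the road class overlaps on 2 of 3, giving about 0.67.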
Abstract
We present a novel multi-view training framework and CNN architecture for combining information from multiple overlapping satellite images and noisy training labels derived from OpenStreetMap (OSM) to semantically label buildings and roads across large geographic regions (100 km²). Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to the traditional approaches that use the views independently of one another. A unique (and, perhaps, surprising) property of our system is that modifications that are added to the tail-end of the CNN for learning from the multi-view data can be discarded at the time of inference with a relatively small penalty in the overall performance. This implies that the benefits of training using multiple views are absorbed by all the layers of the network. Additionally, our approach only adds a small…
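To illustrate what "tail-end modifications that can be discarded at inference" means, the sketch below shows one possible scheme. During training, per-pixel class scores from several co-registered views are fused, and the loss is computed against the (noisy) OSM label. At inference, the fusion step is dropped and a single view is labeled on its own. The fusion rule (averaging per-view softmax outputs) and all function names are assumptions for illustration. This is not the architecture described in the paper, and the sketch assumes the views are already aligned to a common pixel grid.

```python
import math
from typing import List, Sequence

Probs = List[float]  # class probabilities for a single pixel


def softmax(logits: Sequence[float]) -> Probs:
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fuse_views(view_logits: Sequence[Sequence[float]]) -> Probs:
    """Hypothetical training-only fusion head: average the softmax outputs
    of all views that observe the same (co-registered) pixel."""
    probs = [softmax(v) for v in view_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


def cross_entropy(probs: Probs, label: int, eps: float = 1e-12) -> float:
    """Loss against a possibly noisy OSM-derived label for this pixel."""
    return -math.log(max(probs[label], eps))


def predict_single_view(logits: Sequence[float]) -> int:
    """Inference path with the fusion head discarded: argmax of one view."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Three overlapping views of one pixel; classes: 0=background, 1=building, 2=road.
view_logits = [[0.1, 2.0, -1.0], [0.5, 1.5, 0.0], [1.0, 0.2, 0.3]]
fused = fuse_views(view_logits)
train_loss = cross_entropy(fused, label=1)  # training uses all views
label = predict_single_view(view_logits[0])  # inference uses one view
```

Because the fusion is a parameter-free step after the per-view network, removing it leaves a complete single-view model. That is the property the abstract highlights, although how the paper's own tail-end modules work is not specified here.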
