SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks

John McCormac; Ankur Handa; Andrew Davison; Stefan Leutenegger

arXiv:1609.05130 · cs.CV · September 29, 2016 · 44 cites

TL;DR

SemanticFusion combines CNN-based semantic predictions with dense SLAM to produce real-time, detailed 3D semantic maps, improving accuracy over single-frame methods for indoor RGB-D video.

Contribution

It introduces a system that fuses CNN semantic predictions with dense SLAM, enabling real-time 3D semantic mapping with improved accuracy.

Findings

01

Fusing multiple views improves semantic labeling accuracy.

02

The system runs at approximately 25 Hz, fast enough for real-time use.

03

Fused predictions improve 2D semantic segmentation performance over single-frame predictions.

Abstract

Ever more robust, accurate and detailed mapping using visual sensing has proven to be an enabling factor for mobile robots across a wide variety of applications. For the next level of robot intelligence and intuitive user interaction, maps need to extend beyond geometry and appearance: they need to contain semantics. We address this challenge by combining Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondence between frames of indoor RGB-D video, even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple viewpoints to be probabilistically fused into a map. This not only produces a useful semantic 3D map, but we also show on the NYUv2 dataset that fusing multiple predictions leads to an improvement even in the 2D…
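The abstract says SLAM correspondences let per-frame CNN predictions be probabilistically fused into the map. The following is a minimal sketch of one way such fusion can work, assuming a recursive Bayesian update per map element (multiply the stored class distribution by the new per-pixel CNN prediction, then renormalise) and a simple pinhole camera to find which pixel observes the element. It also assumes the element's position has already been transformed into each frame's camera coordinates using the SLAM pose. All names (`PinholeCamera`, `bayes_update`, `fuse_views`, `make_map`) and all numbers are hypothetical illustrations, not SemanticFusion or ElasticFusion code.

```python
# Minimal sketch (not SemanticFusion's code): fuse per-frame CNN class
# probabilities into one map element via a recursive Bayesian update.
# All names here are hypothetical and chosen for illustration.
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
ProbMap = List[List[List[float]]]  # prob_map[v][u] -> class distribution at pixel (u, v)


@dataclass
class PinholeCamera:
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int

    def project(self, p: Vec3) -> Optional[Tuple[int, int]]:
        """Project a point in camera coordinates to a pixel (u, v).
        Returns None if the point is behind the camera or falls off the image."""
        x, y, z = p
        if z <= 0.0:
            return None
        u = int(round(self.fx * x / z + self.cx))
        v = int(round(self.fy * y / z + self.cy))
        if 0 <= u < self.width and 0 <= v < self.height:
            return u, v
        return None


def bayes_update(prior: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """Multiply the stored class distribution by a new CNN prediction, then renormalise."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    if total == 0.0:
        return list(prior)  # degenerate prediction: keep the previous belief
    return [x / total for x in unnorm]


def fuse_views(cam: PinholeCamera, num_classes: int,
               observations: List[Tuple[Vec3, ProbMap]]) -> List[float]:
    """Fuse one map element across frames. Each observation pairs the element's
    position in that frame's camera coordinates (assumed given by the SLAM pose)
    with that frame's per-pixel CNN output."""
    belief = [1.0 / num_classes] * num_classes  # uniform prior over classes
    for point_cam, prob_map in observations:
        pixel = cam.project(point_cam)
        if pixel is None:
            continue  # element not visible in this frame
        u, v = pixel
        belief = bayes_update(belief, prob_map[v][u])
    return belief


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def make_map(cam: PinholeCamera, num_classes: int, pixel: Tuple[int, int],
             dist: Sequence[float]) -> ProbMap:
    """Toy CNN output: uniform everywhere except `dist` at `pixel`."""
    uniform = [1.0 / num_classes] * num_classes
    grid = [[list(uniform) for _ in range(cam.width)] for _ in range(cam.height)]
    u, v = pixel
    grid[v][u] = list(dist)
    return grid


# Toy example: 3 classes, a 4x4 image, and one element seen from three poses.
cam = PinholeCamera(fx=2.0, fy=2.0, cx=2.0, cy=2.0, width=4, height=4)
frames = [
    ((0.0, 0.0, 1.0), [0.5, 0.4, 0.1]),   # this view alone favours class 0
    ((0.5, 0.0, 1.0), [0.2, 0.6, 0.2]),
    ((0.0, -0.5, 1.0), [0.3, 0.5, 0.2]),
]
observations = [(p, make_map(cam, 3, cam.project(p), d)) for p, d in frames]

single = fuse_views(cam, 3, observations[:1])
fused = fuse_views(cam, 3, observations)
print("single view:", argmax(single), "fused:", argmax(fused))
# prints: single view: 0 fused: 1
```

With these made-up numbers, the first frame alone would label the element as class 0, while the fused belief after three views favours class 1 (probability about 0.78). This illustrates, in a contrived setting, why accumulating evidence over viewpoints can correct individual single-frame mistakes. The uniform prior and the multiply-and-renormalise rule are assumptions of this sketch.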

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques