DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map
Peng Wang, Ruigang Yang, Binbin Cao, Wei Xu, Yuanqing Lin

TL;DR
DeLS-3D presents a unified sensor fusion framework that combines camera videos, GPS/IMU, and a 3D semantic map to improve scene parsing and localization accuracy for autonomous driving.
Contribution
The paper introduces a novel integrated system that jointly performs localization and scene segmentation using multi-sensor data and deep learning, enhancing robustness over existing methods.
Findings
Sensor fusion improves pose estimation accuracy.
Joint localization and segmentation mutually benefit each other.
The system outperforms image-only methods like PoseNet.
Abstract
For applications such as autonomous driving, self-localization/camera pose estimation and scene parsing are crucial technologies. In this paper, we propose a unified framework to tackle these two problems simultaneously. The uniqueness of our design is a sensor fusion scheme which integrates camera videos, motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robustness and efficiency of the system. Specifically, we first have an initial coarse camera pose obtained from consumer-grade GPS/IMU, based on which a label map can be rendered from the 3D semantic map. Then, the rendered label map and the RGB image are jointly fed into a pose CNN, yielding a corrected camera pose. In addition, to incorporate temporal information, a multi-layer recurrent neural network (RNN) is further deployed to improve the pose accuracy. Finally, based on the pose from RNN, we render a new label…
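The rendering step mentioned in the abstract (producing a label map from the 3D semantic map at a given camera pose) can be illustrated with a minimal sketch. The abstract does not describe the renderer, so everything below is an assumption made for illustration: the map is treated as a list of labeled 3D points, the camera is a simple pinhole model with its principal point at the image center, and the function name `render_label_map` and all of its parameters are hypothetical.

```python
import math

# Illustrative sketch only: the point-based map, the pinhole model, and every
# name below are assumptions, not the paper's actual renderer.

def render_label_map(points, labels, rotation, translation, focal, width, height):
    """Project labeled 3D points into a label image with a z-buffer.

    points:      list of (x, y, z) world coordinates
    labels:      list of integer class ids, one per point
    rotation:    3x3 world-to-camera rotation, as a list of rows
    translation: world-to-camera translation (tx, ty, tz)
    Returns a height x width grid of class ids, with -1 where no point lands.
    """
    label_map = [[-1] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0  # assumed principal point

    for (x, y, z), label in zip(points, labels):
        # Transform the point from world to camera coordinates.
        xc, yc, zc = (
            row[0] * x + row[1] * y + row[2] * z + t
            for row, t in zip(rotation, translation)
        )
        if zc <= 0:
            continue  # point is behind the camera

        # Pinhole projection to pixel coordinates.
        u = math.floor(focal * xc / zc + cx)
        v = math.floor(focal * yc / zc + cy)

        # Keep only the nearest point per pixel.
        if 0 <= u < width and 0 <= v < height and zc < depth[v][u]:
            depth[v][u] = zc
            label_map[v][u] = label

    return label_map


# Usage: render from a coarse pose (here the identity, for simplicity).
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
coarse_map = render_label_map(
    points=[(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)],
    labels=[3, 5],
    rotation=identity,
    translation=(0.0, 0.0, 0.0),
    focal=1.0,
    width=4,
    height=4,
)
```

In the pipeline the abstract describes, a map rendered this way from the coarse GPS/IMU pose would be fed to the pose CNN together with the RGB image. The networks themselves are not sketched here.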
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Vision and Imaging
