Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks

Chenyang Lu; Marinus Jacobus Gerardus van de Molengraft; Gijs Dubbelman

arXiv:1804.02176·cs.RO·May 1, 2019

TL;DR

This paper presents a variational encoder-decoder network for monocular semantic occupancy grid mapping that outperforms a deterministic flat-plane mapping baseline, is robust to vehicle dynamic perturbations, and runs in real time.

Contribution

The paper introduces an end-to-end learned approach that uses a variational encoder-decoder to map a monocular front-view image to a top-view semantic occupancy grid, improving accuracy and robustness over deterministic flat-plane mapping.

Findings

01. Outperforms the deterministic flat-plane mapping approach by more than 12% mean IoU on Cityscapes.

02. Is robust to vehicle dynamic perturbations.

03. Achieves real-time inference at approximately 35 Hz.
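
The mean IoU figure above compares predicted and ground-truth label grids class by class. As a rough illustration of how such a score is typically computed, here is a minimal sketch. The function name `mean_iou`, the ignore label `255`, and the choice to skip classes absent from both grids are assumptions for illustration; the paper's exact evaluation protocol is not given on this page. The default of four classes follows the abstract.

```python
import numpy as np

def mean_iou(pred, target, num_classes=4, ignore_label=255):
    """Mean intersection-over-union between two integer label grids.

    Assumes `pred` only contains labels in [0, num_classes) and that cells
    marked `ignore_label` in `target` carry no ground truth (hypothetical).
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Drop cells without ground truth.
    valid = target != ignore_label
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that appear in prediction or ground truth.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


if __name__ == "__main__":
    prediction = [[0, 1], [2, 3]]
    ground_truth = [[0, 1], [2, 2]]
    print(mean_iou(prediction, ground_truth))
```

In this toy grid, classes 0 and 1 match exactly, class 2 overlaps in one of two cells, and class 3 is predicted but never present in the ground truth, so the four per-class scores average to 0.625.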

Abstract

In this work, we research and evaluate end-to-end learning of monocular semantic-metric occupancy grid mapping from weak binocular ground truth. The network learns to predict four classes, as well as a camera to bird's eye view mapping. At the core, it utilizes a variational encoder-decoder network that encodes the front-view visual information of the driving scene and subsequently decodes it into a 2-D top-view Cartesian coordinate system. The evaluations on Cityscapes show that the end-to-end learning of semantic-metric occupancy grids outperforms the deterministic mapping approach with flat-plane assumption by more than 12% mean IoU. Furthermore, we show that the variational sampling with a relatively small embedding vector brings robustness against vehicle dynamic perturbations, and generalizability for unseen KITTI data. Our network achieves real-time inference rates of approx. 35…
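
The "variational sampling" mentioned in the abstract is commonly implemented with the reparameterization trick: the encoder outputs a mean and a log-variance, and the embedding is drawn as the mean plus scaled Gaussian noise. The sketch below shows that standard formulation in NumPy, together with the usual KL term against a unit Gaussian prior. It is not confirmed to be the authors' exact implementation; the function names and the latent size of 8 are hypothetical.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent_dim = 8  # hypothetical size; the abstract only says "relatively small"
    mu = np.zeros(latent_dim)
    log_var = np.zeros(latent_dim)
    z = sample_latent(mu, log_var, rng)
    print(z.shape, kl_to_standard_normal(mu, log_var))
```

Because the noise is injected in the embedding during training, a decoder trained this way sees slightly different codes for the same input, which is one plausible reading of why the abstract associates sampling with robustness to perturbations.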
