LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

Nikhil Gosala; Kürsat Petek; B Ravi Kiran; Senthil Yogamani; Paulo Drews-Jr; Wolfram Burgard; Abhinav Valada

arXiv:2405.18852 · cs.CV · May 30, 2024

Open Access

TL;DR

This paper introduces the first unsupervised representation learning approach for semantic BEV mapping from monocular images. It reduces the need for large amounts of labeled data by exploiting the spatial and temporal consistency of frontal-view images together with a novel autoencoder.
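
The TL;DR credits part of the pretraining to a novel autoencoder but does not describe it. Assuming it follows the masked-autoencoder pattern (an assumption here, not something this summary states), the core data step is to hide a random subset of image patches, encode only the visible ones, and train a decoder to reconstruct the hidden ones. The sketch below shows only that generic masking step, not the paper's formulation; the helper `random_patch_mask` and the 0.75 ratio in the usage line are hypothetical choices.

```python
import random
from typing import List, Optional, Tuple


def random_patch_mask(
    num_patches: int, mask_ratio: float, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked) lists, MAE-style.

    Hypothetical helper for illustration; not taken from the LetsMap code.
    """
    if num_patches <= 0:
        raise ValueError("num_patches must be positive")
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)  # local RNG so the mask is reproducible per seed
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(order[:num_masked])   # patches the decoder must reconstruct
    visible = sorted(order[num_masked:])  # patches fed to the encoder
    return visible, masked


if __name__ == "__main__":
    # e.g. a 14 x 14 patch grid with 75% of the patches hidden
    visible, masked = random_patch_mask(196, 0.75, seed=0)
    print(len(visible), len(masked))  # 49 147
```

A high mask ratio leaves the encoder with little visible context, which is what forces it to learn scene structure rather than copy pixels.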

Contribution

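It proposes an unsupervised pretraining approach in which two disjoint neural pathways reason independently about scene geometry and scene semantics. The pretrained network is then finetuned with only a small fraction of BEV labels, enabling label-efficient semantic BEV map generation.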
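
The Contribution describes separate reasoning about geometry and semantics, but this summary does not say how the two are combined into a BEV map. Purely as an illustration, and assuming an explicit depth-based lifting (a common FV-to-BEV technique that may differ from the paper's own transformation), the sketch below treats the geometry pathway as producing per-pixel depth and the semantic pathway as producing per-pixel class labels. Each pixel is back-projected through a pinhole camera with hypothetical intrinsics `fx`, `cx`, and every BEV cell takes the majority label of the pixels that land in it. The function names and grid layout are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) in the BEV grid; row grows with forward distance


def pixel_to_bev_cell(
    u: float, depth: float, fx: float, cx: float,
    cell_size: float, grid_rows: int, grid_cols: int,
) -> Optional[Cell]:
    """Back-project an image column with a pinhole model and bin it into a BEV cell.

    Camera frame: x to the right, z forward. The vertical axis is dropped
    because a BEV map collapses height.
    """
    if depth <= 0:
        return None
    x = (u - cx) * depth / fx  # lateral offset in metres
    z = depth                  # forward distance in metres
    col = math.floor(x / cell_size + grid_cols / 2)  # camera sits at the centre column
    row = math.floor(z / cell_size)
    if 0 <= row < grid_rows and 0 <= col < grid_cols:
        return row, col
    return None


def splat_semantics_to_bev(
    depth_map: Sequence[Sequence[float]],  # stand-in for the geometry pathway output
    label_map: Sequence[Sequence[str]],    # stand-in for the semantic pathway output
    fx: float, cx: float, cell_size: float, grid_rows: int, grid_cols: int,
) -> List[List[Optional[str]]]:
    """Fuse per-pixel depth and per-pixel class into a BEV map by majority vote."""
    votes: Dict[Cell, Counter] = {}
    for v, depth_row in enumerate(depth_map):
        for u, depth in enumerate(depth_row):
            cell = pixel_to_bev_cell(u, depth, fx, cx, cell_size, grid_rows, grid_cols)
            if cell is not None:
                votes.setdefault(cell, Counter())[label_map[v][u]] += 1
    # None marks cells that no pixel projected into (unobserved space).
    bev: List[List[Optional[str]]] = [[None] * grid_cols for _ in range(grid_rows)]
    for (row, col), counter in votes.items():
        bev[row][col] = counter.most_common(1)[0][0]
    return bev
```

Because depth and labels enter as separate inputs, either one can be replaced or improved without touching the other, which loosely mirrors the decoupling the Contribution describes.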

Findings

01. Performs on par with state-of-the-art approaches while using only 1% of BEV labels.
02. Exploits the spatial and temporal consistency of frontal-view images for label-free pretraining.
03. Requires no additional labeled data for effective BEV mapping.

Abstract

Semantic Bird's Eye View (BEV) maps offer a rich representation with strong occlusion reasoning for various decision-making tasks in autonomous driving. However, most BEV mapping approaches employ a fully supervised learning paradigm that relies on large amounts of human-annotated BEV ground truth data. In this work, we address this limitation by proposing the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner. Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner and then finetunes it for the task of semantic BEV mapping using only a small fraction of labels in the BEV. We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene…
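
The abstract breaks off before explaining how spatial and temporal consistency become a training signal. One widely used option in self-supervised monocular depth estimation, assumed here only for illustration, is a photometric reprojection loss: predict depth for a target frame, warp each pixel into another view using the camera intrinsics and the relative pose, and penalize the intensity difference. The same warp applies to any pair of views with a known relative pose, whether they are captured at the same time from different positions or at consecutive time steps. The paper's exact losses are not given in this summary; `reproject`, `photometric_l1`, and all parameters are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]


def reproject(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    R: Matrix3, t: Vector3,
) -> Optional[Tuple[float, float]]:
    """Map a target-frame pixel with known depth into the source frame.

    Implements p_s ~ K (R * depth * K^-1 p_t + t) for a pinhole camera K.
    """
    # Back-project the pixel to a 3D point in the target camera frame.
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    Z = depth
    # Rigid motion into the source camera frame: P_s = R P_t + t.
    Xs = R[0][0] * X + R[0][1] * Y + R[0][2] * Z + t[0]
    Ys = R[1][0] * X + R[1][1] * Y + R[1][2] * Z + t[1]
    Zs = R[2][0] * X + R[2][1] * Y + R[2][2] * Z + t[2]
    if Zs <= 0:  # the point ends up behind the source camera
        return None
    return fx * Xs / Zs + cx, fy * Ys / Zs + cy


def photometric_l1(
    target: Sequence[Sequence[float]], source: Sequence[Sequence[float]],
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float, R: Matrix3, t: Vector3,
) -> float:
    """Mean |I_target(p) - I_source(p')| over pixels that reproject inside the source image.

    Nearest-neighbour sampling and plain L1 keep the sketch short; practical
    systems typically use differentiable bilinear sampling and add an SSIM term.
    """
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            proj = reproject(u, v, depth[v][u], fx, fy, cx, cy, R, t)
            if proj is None:
                continue
            us, vs = int(round(proj[0])), int(round(proj[1]))
            if 0 <= us < w and 0 <= vs < h:
                total += abs(target[v][u] - source[vs][us])
                count += 1
    return total / count if count else 0.0
```

When the predicted depth and pose are correct, corresponding pixels show the same scene point and the loss is small; a wrong depth sends pixels to the wrong location and raises it, which is what lets geometry be learned without labels.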

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques