BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment

Mehdi Hosseinzadeh; Ian Reid

arXiv:2410.20969·cs.RO·October 29, 2024

Open Access

TL;DR

BEVPose introduces a pose-guided multi-modal fusion framework for BEV scene understanding that reduces reliance on extensive annotations, achieving superior segmentation performance with minimal labeled data.

Contribution

This work presents BEVPose, a novel pose-guided fusion method that improves BEV map learning efficiency and accuracy with limited annotated data, extending applicability beyond urban settings.

Findings

1. Outperforms fully-supervised methods in BEV segmentation tasks
2. Requires significantly less annotated data for effective learning
3. Effectively fuses lidar and camera data using pose information

Abstract

In the field of autonomous driving and mobile robotics, there has been a significant shift in the methods used to create Bird's Eye View (BEV) representations. This shift is characterised by using transformers and learning to fuse measurements from disparate vision sensors, mainly lidar and cameras, into a 2D planar ground-based representation. However, these learning-based methods for creating such maps often rely heavily on extensive annotated data, presenting notable challenges, particularly in diverse or non-urban environments where large-scale datasets are scarce. In this work, we present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal. This method notably reduces the dependence on costly annotated data. By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating…

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Artificial Intelligence in Games

Methods: ALIGN