BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment
Mehdi Hosseinzadeh, Ian Reid

TL;DR
BEVPose introduces a pose-guided multi-modal fusion framework for BEV scene understanding that reduces reliance on extensive annotations, achieving superior segmentation performance with minimal labeled data.
Contribution
This work presents BEVPose, a novel pose-guided fusion method that improves BEV map learning efficiency and accuracy with limited annotated data, extending applicability beyond urban settings.
Findings
Outperforms fully-supervised methods in BEV segmentation tasks
Requires significantly less annotated data for effective learning
Effectively fuses lidar and camera data using pose information
Abstract
In the field of autonomous driving and mobile robotics, there has been a significant shift in the methods used to create Bird's Eye View (BEV) representations. This shift is characterised by using transformers and learning to fuse measurements from disparate vision sensors, mainly lidar and cameras, into a 2D planar ground-based representation. However, these learning-based methods for creating such maps often rely heavily on extensive annotated data, presenting notable challenges, particularly in diverse or non-urban environments where large-scale datasets are scarce. In this work, we present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal. This method notably reduces the dependence on costly annotated data. By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating…
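As a rough illustration of what "using sensor pose to align" BEV data can mean geometrically, the sketch below maps ground-plane points from one sensor's BEV frame into another's through their relative planar pose. It is not the authors' method: the truncated abstract gives no implementation details, so the planar SE(2) pose model and the helper names `se2_matrix`, `relative_pose`, and `transform_points` are all assumptions made for this example.

```python
import numpy as np

def se2_matrix(dx, dy, yaw):
    """Homogeneous 3x3 matrix for a planar pose (hypothetical helper).

    Maps coordinates in the sensor frame to the world frame.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, dx],
                     [s,  c, dy],
                     [0.0, 0.0, 1.0]])

def relative_pose(pose_a, pose_b):
    """Transform taking frame-A coordinates to frame-B coordinates,
    given both sensor-to-world poses."""
    return np.linalg.inv(pose_b) @ pose_a

def transform_points(points_xy, pose):
    """Apply a 3x3 homogeneous planar transform to an (N, 2) array."""
    ones = np.ones((points_xy.shape[0], 1))
    homogeneous = np.hstack([points_xy, ones])
    return (homogeneous @ pose.T)[:, :2]

# Assumed example poses: sensor A sits 1 m ahead of the world origin,
# sensor B sits at the origin but is rotated by 90 degrees.
pose_a = se2_matrix(1.0, 0.0, 0.0)
pose_b = se2_matrix(0.0, 0.0, np.pi / 2)

# BEV cell centres expressed in sensor A's frame (metres).
cells_in_a = np.array([[0.0, 0.0], [2.0, 1.0]])
cells_in_b = transform_points(cells_in_a, relative_pose(pose_a, pose_b))
```

In a learned pipeline the same relative transform would typically be applied to feature grids rather than raw points, but the coordinate bookkeeping is the same.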
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Artificial Intelligence in Games
Methods: ALIGN
