TL;DR
This paper introduces a multi-modal place recognition system that uses a spherical CNN to learn a joint descriptor from images and LiDAR scans, so that places can be matched even when the sensor setup differs between map building and query.
Contribution
It presents a novel end-to-end pipeline that operates directly on sensor data projected onto a sphere, supporting arbitrary sensor configurations without local feature extraction.
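To make the spherical projection step concrete, here is a minimal sketch of how a LiDAR point cloud could be rasterized onto an equiangular grid on the unit sphere. The function name `project_to_sphere`, the grid resolution, and the choice to keep the nearest range per cell are assumptions made for illustration; the paper's actual projection and sampling scheme may differ.

```python
import numpy as np


def project_to_sphere(points, n_lat=64, n_lon=128):
    """Rasterize an (N, 3) point cloud into an (n_lat, n_lon) range image.

    Hypothetical helper: each point is mapped to its polar angle and azimuth,
    and each grid cell stores the range of the closest point that falls in it.
    Empty cells are set to 0.
    """
    points = np.asarray(points, dtype=float)
    r = np.linalg.norm(points, axis=1)
    valid = r > 0  # drop points at the sensor origin, which have no direction
    x, y, z = points[valid].T
    r = r[valid]

    theta = np.arccos(np.clip(z / r, -1.0, 1.0))   # polar angle in [0, pi]
    phi = np.mod(np.arctan2(y, x), 2.0 * np.pi)    # azimuth in [0, 2*pi)

    # Convert angles to integer grid indices, clamping the upper boundary.
    row = np.minimum((theta / np.pi * n_lat).astype(int), n_lat - 1)
    col = np.minimum((phi / (2.0 * np.pi) * n_lon).astype(int), n_lon - 1)

    image = np.full((n_lat, n_lon), np.inf)
    np.minimum.at(image, (row, col), r)  # keep the nearest return per cell
    image[np.isinf(image)] = 0.0
    return image
```

Camera images could be placed on the same grid by mapping each pixel's viewing ray to its polar angle and azimuth, which is what lets one network input cover differing sensor setups.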
Findings
Achieves up to 10% higher recall than state-of-the-art LiDAR-based methods.
Achieves up to 5% higher recall than vision-based methods.
Correctly identifies up to 95% of matching places.
Abstract
In this paper, we propose a robust end-to-end multi-modal pipeline for place recognition where the sensor systems can differ from the map building to the query. Our approach operates directly on images and LiDAR scans without requiring any local feature extraction modules. By projecting the sensor data onto the unit sphere, we learn a multi-modal descriptor of partially overlapping scenes using a spherical convolutional neural network. The employed spherical projection model enables the support of arbitrary LiDAR and camera systems readily without losing information. Loop closure candidates are found using a nearest-neighbor lookup in the embedding space. We tackle the problem of correctly identifying the closest place by correlating the candidates' power spectra, obtaining a confidence value per prospect. Our estimate for the correct place corresponds then to the candidate with the…
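The nearest-neighbor candidate search mentioned in the abstract can be sketched as a brute-force lookup over stored map descriptors. The function name `find_candidates`, the Euclidean metric, and the value of `k` are assumptions for illustration only; the abstract does not state which distance or index structure the authors use, and the subsequent power-spectrum correlation step is not shown here.

```python
import numpy as np


def find_candidates(map_descriptors, query_descriptor, k=3):
    """Return indices and distances of the k map descriptors closest to the query.

    Hypothetical helper using brute-force Euclidean distance in embedding space.
    """
    map_descriptors = np.asarray(map_descriptors, dtype=float)
    query = np.asarray(query_descriptor, dtype=float)

    # Distance from the query to every stored place descriptor.
    dists = np.linalg.norm(map_descriptors - query, axis=1)

    k = min(k, len(dists))
    order = np.argsort(dists)[:k]  # closest candidates first
    return order, dists[order]
```

For large maps, a spatial index such as a k-d tree would typically replace the brute-force scan, with the returned candidates then passed on to a verification step.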
