MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics
Alexander Melekhin, Dmitry Yudin, Ilia Petryashin, Vitaly Bezuglyj

TL;DR
MSSPlace introduces a multimodal approach combining images, LiDAR, semantics, and text for improved place recognition in autonomous navigation, demonstrating state-of-the-art results on benchmark datasets.
Contribution
The paper presents MSSPlace, a novel multi-sensor framework that integrates visual, LiDAR, semantic, and textual data for enhanced place recognition performance.
Findings
Combining multiple sensor modalities improves recognition accuracy.
Used on their own, visual or textual semantics can achieve promising results.
Multi-sensor fusion outperforms single modality approaches.
Abstract
Place recognition is a challenging task in computer vision, crucial for enabling autonomous vehicles and robots to navigate previously visited environments. While significant progress has been made in learnable multimodal methods that combine onboard camera images and LiDAR point clouds, the full potential of these methods remains largely unexplored in localization applications. In this paper, we study the impact of leveraging a multi-camera setup and integrating diverse data sources for multimodal place recognition, incorporating explicit visual semantics and text descriptions. Our proposed method named MSSPlace utilizes images from multiple cameras, LiDAR point clouds, semantic segmentation masks, and text annotations to generate comprehensive place descriptors. We employ a late fusion approach to integrate these modalities, providing a unified representation. Through extensive…
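The late fusion step described in the abstract can be illustrated with a minimal sketch. It assumes each modality (camera, LiDAR, semantic masks, text) has already been encoded into its own descriptor by a separate network, and it fuses them by concatenating L2-normalized vectors. The concatenation rule, the function names `l2_normalize`, `late_fusion`, and `retrieve`, and the toy vectors are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def late_fusion(descriptors):
    """Fuse per-modality descriptors by concatenation (an assumed fusion rule).

    `descriptors` maps a modality name to a descriptor that a separate
    encoder has already produced.
    """
    fused = []
    for name in sorted(descriptors):  # fixed order so fused vectors are comparable
        fused.extend(l2_normalize(descriptors[name]))
    return l2_normalize(fused)


def retrieve(query, database):
    """Return the index of the database descriptor closest to the query."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(database)), key=lambda i: sq_dist(query, database[i]))


# Toy descriptors standing in for encoder outputs (hypothetical values).
place_a = late_fusion({"camera": [1.0, 0.0], "lidar": [0.0, 1.0], "text": [1.0, 1.0]})
place_b = late_fusion({"camera": [0.0, 1.0], "lidar": [1.0, 0.0], "text": [1.0, -1.0]})
query = late_fusion({"camera": [0.9, 0.1], "lidar": [0.1, 0.9], "text": [1.0, 0.8]})

# Place recognition as nearest-neighbor search over fused descriptors.
best = retrieve(query, [place_a, place_b])
```

Normalizing each modality before concatenation keeps any single sensor from dominating the distance purely because its encoder produces larger values. A learned fusion layer or element-wise addition would be equally valid choices under the same late fusion idea.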
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies · Multimodal Machine Learning Applications
