MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics

Alexander Melekhin; Dmitry Yudin; Ilia Petryashin; Vitaly Bezuglyj

arXiv:2407.15663·cs.CV·March 3, 2026

TL;DR

MSSPlace introduces a multimodal approach combining images, LiDAR, semantics, and text for improved place recognition in autonomous navigation, demonstrating state-of-the-art results on benchmark datasets.

Contribution

The paper presents MSSPlace, a novel multi-sensor framework that integrates visual, LiDAR, semantic, and textual data for enhanced place recognition performance.

Findings

1. Combining multiple sensor modalities improves recognition accuracy.
2. Separate visual and textual semantics can achieve promising results.
3. Multi-sensor fusion outperforms single-modality approaches.

Abstract

Place recognition is a challenging task in computer vision, crucial for enabling autonomous vehicles and robots to navigate previously visited environments. While significant progress has been made in learnable multimodal methods that combine onboard camera images and LiDAR point clouds, the full potential of these methods remains largely unexplored in localization applications. In this paper, we study the impact of leveraging a multi-camera setup and integrating diverse data sources for multimodal place recognition, incorporating explicit visual semantics and text descriptions. Our proposed method named MSSPlace utilizes images from multiple cameras, LiDAR point clouds, semantic segmentation masks, and text annotations to generate comprehensive place descriptors. We employ a late fusion approach to integrate these modalities, providing a unified representation. Through extensive…
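The late fusion idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only states that modalities are fused late, so the specific choices here (L2-normalizing each modality's descriptor, concatenating them, and retrieving by cosine similarity) are assumptions, and the names `late_fusion`, `recognize`, and the toy two-dimensional descriptors are hypothetical. In a real system each descriptor would come from that modality's own encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def late_fusion(descriptors):
    """Fuse per-modality descriptors into one place descriptor.

    `descriptors` maps a modality name to the vector produced by that
    modality's encoder. Concatenation is an assumed fusion rule, chosen
    for illustration only.
    """
    fused = []
    for name in sorted(descriptors):  # fixed order keeps places comparable
        fused.extend(l2_normalize(descriptors[name]))
    return l2_normalize(fused)


def cosine_similarity(a, b):
    """Dot product of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recognize(query, database):
    """Return the id of the database place most similar to the query."""
    q = late_fusion(query)
    return max(
        database,
        key=lambda place_id: cosine_similarity(q, late_fusion(database[place_id])),
    )


# Toy descriptors (hypothetical values) for two previously visited places.
database = {
    "corridor": {"camera": [1.0, 0.0], "lidar": [0.9, 0.1],
                 "semantic": [1.0, 0.0], "text": [0.0, 1.0]},
    "parking": {"camera": [0.0, 1.0], "lidar": [0.1, 0.9],
                "semantic": [0.0, 1.0], "text": [1.0, 0.0]},
}
query = {"camera": [0.9, 0.2], "lidar": [1.0, 0.0],
         "semantic": [0.8, 0.1], "text": [0.1, 0.9]}

best_match = recognize(query, database)
print(best_match)
```

Because every modality is encoded independently and combined only at the descriptor level, a modality can be added or dropped by changing the dictionary keys, which is what makes this style of fusion convenient for comparing sensor combinations.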

Code & Models

Repositories

alexmelekhin/mssplace (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies · Multimodal Machine Learning Applications