Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching

Gongxin Yao; Xinyang Li; Luowei Fu; Yu Pan

arXiv:2410.06285·cs.CV·October 10, 2024

TL;DR

This paper presents a novel cross-modal descriptor learning framework for monocular camera localization within LiDAR maps, leveraging multi-view matching and contrastive learning to improve place recognition accuracy.

Contribution

It introduces a new multi-view, cross-modal descriptor learning approach using a visual state space model and contrastive training for improved LiDAR map-based localization.

Findings

01 Effective on KITTI datasets

02 Generalizes well across different scenes

03 Reduces computational overhead compared to SLAM

Abstract

Achieving monocular camera localization within pre-built LiDAR maps can bypass the simultaneous mapping process of visual SLAM systems, potentially reducing the computational overhead of autonomous localization. To this end, one of the key challenges is cross-modal place recognition, which involves retrieving 3D scenes (point clouds) from a LiDAR map according to online RGB images. In this paper, we introduce an efficient framework to learn descriptors for both RGB images and point clouds. It takes the visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy for cross-modal contrastive learning. To address the field-of-view differences, independent descriptors are generated from multiple evenly distributed viewpoints for point clouds. A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and…
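The multi-view retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes each map place already has one descriptor per evenly spaced viewpoint and that matching uses cosine similarity; the function names (`view_yaws`, `cosine`, `retrieve`) and the toy 2D descriptors are hypothetical and are not taken from the paper, whose descriptors would come from the learned network.

```python
import math

def view_yaws(num_views):
    """Evenly spaced yaw angles (radians) covering a full 360-degree sweep."""
    return [2.0 * math.pi * k / num_views for k in range(num_views)]

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(image_desc, map_descs):
    """Find the best-matching (place, view) for an image descriptor.

    map_descs is a list of places; each place is a list of per-view
    descriptors (assumption: one descriptor per viewpoint in view_yaws).
    Returns (place_index, view_index, score).
    """
    best = (-1, -1, -math.inf)
    for p, views in enumerate(map_descs):
        for v, desc in enumerate(views):
            score = cosine(image_desc, desc)
            if score > best[2]:
                best = (p, v, score)
    return best

# Toy map: two places, two views each, hand-written 2D descriptors.
map_descs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
query = [0.1, -0.9]
place, view, score = retrieve(query, map_descs)
```

Scoring every view separately, rather than one descriptor per place, is what lets a narrow camera field of view match only the part of the 360-degree scan it actually sees.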

Taxonomy

Topics: Remote Sensing and Land Use · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques