VGGT-MPR: VGGT-Enhanced Multimodal Place Recognition in Autonomous Driving Environments

Jingyi Xu, Zhangshuo Qi, Zhongmiao Yan, Xuyu Gao, Qianyun Jiao, Songpengcheng Xia, Xieyuanli Chen, and Ling Pei

arXiv:2602.19735 · cs.CV · February 24, 2026

TL;DR

VGGT-MPR introduces a unified transformer-based framework for multimodal place recognition in autonomous driving, improving robustness and accuracy through geometric-aware features and a novel re-ranking method.

Contribution

The paper presents VGGT-MPR, a multimodal place recognition framework that uses the Visual Geometry Grounded Transformer (VGGT) for geometric feature extraction and adds a training-free re-ranking mechanism, with the aim of improving robustness and efficiency.

Findings

1. Achieves state-of-the-art performance on large-scale benchmarks.
2. Demonstrates robustness to environmental changes and occlusions.
3. Outperforms existing methods in accuracy and computational efficiency.

Abstract

In autonomous driving, robust place recognition is critical for global localization and loop closure detection. While fusing camera and LiDAR data in multimodal place recognition (MPR) has shown promise in overcoming the limitations of unimodal methods, existing MPR methods largely rely on hand-crafted fusion strategies and heavily parameterized backbones that require costly retraining. To address this, we propose VGGT-MPR, a multimodal place recognition framework that adopts the Visual Geometry Grounded Transformer (VGGT) as a unified geometric engine for both global retrieval and re-ranking. In the global retrieval stage, VGGT extracts geometrically rich visual embeddings through prior depth-aware and point map supervision, and densifies sparse LiDAR point clouds with predicted depth maps to improve structural representation. This enhances the discriminative…
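The densification step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pinhole camera with a known intrinsic matrix `K`, LiDAR points already expressed in the camera frame, and a plain concatenation of the two point sets. The function names `backproject_depth` and `densify_lidar`, and the `stride` subsampling parameter, are hypothetical.

```python
import numpy as np


def backproject_depth(depth, K, stride=4):
    """Lift a depth map of shape (H, W) to 3D points in the camera frame.

    Assumes a pinhole model with intrinsics K (3x3). Pixels with
    non-positive depth are treated as invalid and skipped.
    """
    h, w = depth.shape
    # Subsample the pixel grid; v indexes rows, u indexes columns.
    v, u = np.mgrid[0:h:stride, 0:w:stride]
    z = depth[v, u]
    valid = z > 0
    u, v, z = u[valid], v[valid], z[valid]
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def densify_lidar(lidar_cam, depth, K, stride=4):
    """Append back-projected depth points to a sparse LiDAR cloud.

    lidar_cam: (N, 3) LiDAR points, assumed already in the camera frame.
    Returns an (N + M, 3) array; simple concatenation is an assumption here.
    """
    extra = backproject_depth(depth, K, stride)
    return np.concatenate([lidar_cam, extra], axis=0)


if __name__ == "__main__":
    K = np.array([[4.0, 0.0, 4.0], [0.0, 4.0, 4.0], [0.0, 0.0, 1.0]])
    depth = np.full((8, 8), 2.0)   # stand-in for a predicted depth map
    lidar = np.zeros((3, 3))       # stand-in for a sparse LiDAR scan
    print(densify_lidar(lidar, depth, K).shape)
```

In practice the predicted depth would come from the network and the LiDAR points would first be transformed with the sensor extrinsics; both are left out of this sketch.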

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications