UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer

Tianchen Deng; Xun Chen; Ziming Li; Hongming Shen; Danwei Wang; Javier Civera; Hesheng Wang

arXiv:2512.21078·cs.CV·December 30, 2025

TL;DR

UniPR-3D is a multi-view visual place recognition architecture that combines 3D and 2D features from a geometry-grounded transformer (VGGT), improving generalization across environments and achieving state-of-the-art performance.

Contribution

UniPR-3D is presented as the first VPR method to effectively integrate multi-view 3D representations, using dedicated aggregation modules for 2D and 3D features.

Findings

01  Sets new state-of-the-art performance on VPR benchmarks.

02  Outperforms single- and multi-view baselines.

03  Demonstrates strong generalization across diverse environments.

Abstract

Visual Place Recognition (VPR) has been traditionally formulated as a single-image retrieval task. Using multiple views offers clear advantages, yet this setting remains relatively underexplored and existing methods often struggle to generalize across diverse environments. In this work we introduce UniPR-3D, the first VPR architecture that effectively integrates information from multiple views. UniPR-3D builds on a VGGT backbone capable of encoding multi-view 3D representations, which we adapt by designing feature aggregators and fine-tune for the place recognition task. To construct our descriptor, we jointly leverage the 3D tokens and intermediate 2D tokens produced by VGGT. Based on their distinct characteristics, we design dedicated aggregation modules for 2D and 3D features, allowing our descriptor to capture fine-grained texture cues while also reasoning across viewpoints. To…
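The descriptor-and-retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's aggregation modules are not described on this page, so generalized-mean (GeM) pooling stands in for the 2D aggregator and plain mean pooling for the 3D aggregator. All function names (`gem_pool`, `mean_pool`, `build_descriptor`, `retrieve`) and the toy token values are hypothetical, and real tokens would come from a VGGT backbone rather than hand-written lists.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def gem_pool(tokens, p=3.0, eps=1e-6):
    """Generalized-mean pooling over token vectors.

    Assumption: a stand-in for a 2D (texture-oriented) aggregator.
    """
    dim = len(tokens[0])
    pooled = []
    for d in range(dim):
        mean_pow = sum(max(t[d], eps) ** p for t in tokens) / len(tokens)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


def mean_pool(tokens):
    """Average token vectors.

    Assumption: a stand-in for a 3D (cross-view) aggregator.
    """
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def build_descriptor(tokens_2d, tokens_3d):
    """Aggregate each token type separately, then concatenate and normalize."""
    desc_2d = l2_normalize(gem_pool(tokens_2d))
    desc_3d = l2_normalize(mean_pool(tokens_3d))
    return l2_normalize(desc_2d + desc_3d)


def retrieve(query, database):
    """Return the index of the database descriptor with the highest dot product.

    For unit-length descriptors the dot product equals cosine similarity.
    """
    scores = [sum(q * d for q, d in zip(query, ref)) for ref in database]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy tokens for two reference places and one query taken near place A.
place_a = build_descriptor([[1.0, 0.1], [0.9, 0.2]], [[1.0, 0.0]])
place_b = build_descriptor([[0.1, 1.0], [0.2, 0.9]], [[0.0, 1.0]])
query = build_descriptor([[0.95, 0.15]], [[0.9, 0.1]])

best = retrieve(query, [place_a, place_b])
print(best)  # 0, i.e. place A is the closest match
```

Normalizing each branch before concatenation keeps either token type from dominating the similarity score purely because of its scale. That is a design choice of this sketch, not something the abstract specifies.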

Code & Models

Models

Hugging Face model: dtc111/UniPR-3D

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications