SciceVPR: Stable Cross-Image Correlation Enhanced Model for Visual Place Recognition

Shanshan Wan; Yingmei Wei; Lai Kang; Tianrui Shen; Haixuan Wang; Yee-Hong Yang

arXiv:2502.20676 · cs.CV · January 1, 2026

TL;DR

SciceVPR introduces a novel model that enhances global feature stability for visual place recognition by leveraging multi-layer feature fusion and cross-image correlation distillation, outperforming current state-of-the-art methods across various datasets.

Contribution

The paper proposes a stable cross-image correlation approach with multi-layer feature fusion and correlation distillation, improving robustness and accuracy in VPR tasks.

Findings

01. SciceVPR-B outperforms single-input SOTA methods on multiple datasets.

02. SciceVPR-L achieves results comparable to two-stage models, surpassing SOTA by over 3% in Recall@1.

03. The model remains robust under domain shifts such as illumination and viewpoint changes.
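
For readers unfamiliar with the Recall@1 figure quoted in the findings, the sketch below shows how Recall@K is commonly computed in descriptor-based place retrieval: each query descriptor is ranked against the database descriptors, and a query counts as a hit if any of its top-K matches is a true positive. This is an illustrative sketch, not the paper's evaluation code. The function names (`rank_database`, `recall_at_k`), the dot-product similarity, and the toy descriptors are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Set


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two descriptors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def rank_database(query: Sequence[float],
                  database: Sequence[Sequence[float]]) -> List[int]:
    """Return database indices sorted from most to least similar to the query."""
    scores = [dot(query, d) for d in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])


def recall_at_k(queries: Sequence[Sequence[float]],
                database: Sequence[Sequence[float]],
                positives: Sequence[Set[int]],
                ks: Sequence[int] = (1, 5)) -> Dict[int, float]:
    """Fraction of queries with at least one true positive in the top-k results.

    positives[i] holds the database indices regarded as the same place as
    query i (hypothetical ground truth for this sketch).
    """
    hits = {k: 0 for k in ks}
    for query, pos in zip(queries, positives):
        ranking = rank_database(query, database)
        for k in ks:
            if pos.intersection(ranking[:k]):
                hits[k] += 1
    return {k: hits[k] / len(queries) for k in ks}


# Toy data: three database descriptors and two queries (made up for illustration).
database = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
queries = [[0.9, 0.1], [0.1, 0.9]]
positives = [{0}, {2}]

result = recall_at_k(queries, database, positives, ks=(1, 2))
print(result)
```

In this toy setup the first query retrieves its true match at rank 1, while the second only finds it at rank 2, so Recall@1 is lower than Recall@2. Real benchmarks typically define positives by a distance threshold around the query's ground-truth position.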

Abstract

Visual Place Recognition (VPR) is a major challenge for robotics and autonomous systems, with the goal of predicting the location of an image based solely on its visual features. State-of-the-art (SOTA) models extract global descriptors using the powerful foundation model DINOv2 as the backbone. These models either explore cross-image correlation or propose a time-consuming two-stage re-ranking strategy to achieve better performance. However, existing works only utilize the final output of DINOv2, and current cross-image correlation causes unstable retrieval results. To produce both discriminative and constant global descriptors, this paper proposes a stable cross-image correlation enhanced model for VPR called SciceVPR. This model explores the full potential of DINOv2 in providing useful feature representations that implicitly encode valuable contextual knowledge. Specifically,…
