DC-VLAQ: Query-Residual Aggregation for Robust Visual Place Recognition

Hanyu Zhu; Zhihao Zhan; Yuhang Ming; Liang Li; Dibo Hou; Javier Civera; Wanzeng Kong

arXiv:2601.12729 · cs.CV · January 21, 2026

TL;DR

DC-VLAQ introduces a novel fusion and global aggregation framework for visual place recognition, enhancing robustness against viewpoint, illumination, and domain shifts by leveraging complementary features and residual query schemes.

Contribution

The paper presents DC-VLAQ, a representation-centric approach that combines residual-guided fusion of visual foundation models (VFMs) with a residual global aggregation scheme to improve visual place recognition (VPR) performance.

Findings

01. Outperforms existing methods on multiple benchmarks

02. Achieves state-of-the-art results under domain shifts

03. Demonstrates robustness to appearance changes

Abstract

One of the central challenges in visual place recognition (VPR) is learning a robust global representation that remains discriminative under large viewpoint changes, illumination variations, and severe domain shifts. While visual foundation models (VFMs) provide strong local features, most existing methods rely on a single model, overlooking the complementary cues offered by different VFMs. However, exploiting such complementary information inevitably alters token distributions, which challenges the stability of existing query-based global aggregation schemes. To address these challenges, we propose DC-VLAQ, a representation-centric framework that integrates the fusion of complementary VFMs and robust global aggregation. Specifically, we first introduce a lightweight residual-guided complementary fusion that anchors representations in the DINOv2 feature space while injecting…
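
The idea of anchoring a fused representation in one model's feature space can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before it gives the details, so the function names (`linear`, `residual_fuse`), the single projection matrix, the scalar `gate`, and the token dimensions are all assumptions chosen for illustration. The sketch only shows the general residual pattern, in which tokens from an anchor model are kept as-is and a projected, scaled contribution from a second model is added on top.

```python
import random


def linear(tokens, weight):
    """Project each token with a weight matrix of shape (d_in, d_out)."""
    d_out = len(weight[0])
    return [
        [sum(tok[i] * weight[i][j] for i in range(len(tok))) for j in range(d_out)]
        for tok in tokens
    ]


def residual_fuse(anchor_tokens, aux_tokens, weight, gate):
    """Hypothetical residual fusion: anchor + gate * projection(aux).

    The anchor tokens define the output feature space; the auxiliary
    tokens only contribute an additive correction scaled by `gate`.
    """
    projected = linear(aux_tokens, weight)
    return [
        [a + gate * p for a, p in zip(a_tok, p_tok)]
        for a_tok, p_tok in zip(anchor_tokens, projected)
    ]


# Illustrative sizes only (assumed, not taken from the paper).
random.seed(0)
n_tokens, d_anchor, d_aux = 4, 8, 6
anchor = [[random.gauss(0, 1) for _ in range(d_anchor)] for _ in range(n_tokens)]
aux = [[random.gauss(0, 1) for _ in range(d_aux)] for _ in range(n_tokens)]
weight = [[random.gauss(0, 0.1) for _ in range(d_anchor)] for _ in range(d_aux)]

fused_off = residual_fuse(anchor, aux, weight, gate=0.0)  # reduces to the anchor tokens
fused_on = residual_fuse(anchor, aux, weight, gate=0.5)   # anchor plus a scaled correction
```

With `gate=0.0` the output equals the anchor tokens exactly, which is the sense in which a residual design keeps the representation tied to the anchor feature space; a nonzero gate shifts the tokens by a bounded amount, which is the kind of distribution change the abstract says the aggregation stage must tolerate.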

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications