Faster or Stronger: Towards Flexible Visual Place Recognition via Weighted Aggregation and Token Pruning
Zichao Zeng, June Moh Goo, Junwei Zheng, Weijia Fan, Jiaming Zhang, Rainer Stiefelhagen, Jan Boehm

TL;DR
This paper introduces two methods for visual place recognition: WeiAD, which weights clusters during feature aggregation to improve retrieval accuracy, and WeiToP, which prunes tokens to cut feature extraction cost. Together they target large-scale and resource-limited applications.
Contribution
The paper proposes WeiAD for discriminative feature aggregation and WeiToP for efficient token pruning, addressing limitations of uniform pooling and high computational costs in ViT-based VPR.
Findings
WeiAD improves retrieval accuracy by weighting cluster contributions.
WeiToP reduces feature extraction cost with minimal accuracy loss.
The combined approach outperforms existing methods in accuracy and efficiency.
Abstract
Visual Place Recognition (VPR) aims to match a query image to reference images of the same place in a large-scale database. Recent state-of-the-art methods employ Vision Transformers (ViTs) as backbone foundation models to extract patch-level features that are robust to viewpoint, illumination, and seasonal variations, which are then aggregated into a compact global descriptor for retrieval. Most existing aggregation methods uniformly pool patch tokens into learned clusters, despite the fact that different clusters often encode distinct spatial or semantic patterns and contribute unequally to VPR performance. To address this limitation, we propose Weighted Aggregated Descriptor (WeiAD), which assigns weights to clusters during aggregation, producing more discriminative global representations. Beyond accuracy, retrieval latency is a critical concern for large-scale deployments and…
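The contrast the abstract draws between uniform pooling and weighted aggregation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the soft-assignment pooling, the function name `weighted_cluster_descriptor`, and the per-cluster weight vector `cluster_weights` are all assumptions chosen for illustration, and in a real system the weights would presumably be learned rather than supplied by hand. Setting every weight to 1 recovers the uniform case.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_cluster_descriptor(tokens, assign_logits, cluster_weights):
    """Hypothetical sketch of cluster-weighted aggregation.

    tokens:          (N, D) patch-level features from a backbone
    assign_logits:   (N, K) per-token scores over K clusters
    cluster_weights: (K,)   assumed per-cluster importance weights
    Returns an L2-normalised global descriptor of length K * D.
    """
    eps = 1e-12
    # Soft-assign every token to the K clusters.
    assign = softmax(assign_logits, axis=1)            # (N, K)
    # Pool tokens into one vector per cluster.
    clusters = assign.T @ tokens                       # (K, D)
    # Normalise each cluster vector so the weights alone set its scale.
    clusters = clusters / (np.linalg.norm(clusters, axis=1, keepdims=True) + eps)
    # Scale clusters by their weights; all-ones weights give uniform pooling.
    clusters = clusters * cluster_weights[:, None]
    # Flatten and normalise into a single global descriptor.
    desc = clusters.reshape(-1)
    return desc / (np.linalg.norm(desc) + eps)


# Example usage with random stand-in data (not real image features).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
assign_logits = rng.normal(size=(16, 4))
uniform = weighted_cluster_descriptor(tokens, assign_logits, np.ones(4))
weighted = weighted_cluster_descriptor(
    tokens, assign_logits, np.array([1.0, 0.5, 0.0, 2.0])
)
```

In this sketch a cluster with weight 0 contributes nothing to the descriptor, while a cluster with a larger weight takes up a larger share of the final normalised vector and so has more influence on nearest-neighbour retrieval.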
