ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer
Yifan Xu, Pourya Shamsolmoali, Jie Yang

TL;DR
ClusVPR introduces a clustering-based weighted transformer with an optimized VLAD layer and a pyramid self-supervised strategy, significantly improving visual place recognition accuracy and efficiency in complex scenes.
Contribution
The paper proposes a novel Clustering-based Weighted Transformer Network (CWTNet) and an optimized VLAD layer, enhancing VPR performance while reducing model complexity.
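The summary does not describe what the "optimized VLAD layer" changes. For orientation, the sketch below implements the standard NetVLAD aggregation (soft-assignment VLAD from Arandjelović et al.), which such layers typically build on. It is not ClusVPR's optimized variant. The function name `netvlad` and the `alpha` default are assumptions made for illustration.

```python
import numpy as np

def netvlad(features, centroids, alpha=100.0):
    """Standard NetVLAD aggregation (baseline, not ClusVPR's optimized layer).

    features:  (N, D) local descriptors, e.g. one per spatial location
    centroids: (K, D) cluster centres
    alpha:     sharpness of the soft assignment (hypothetical default)
    Returns a (K * D,) L2-normalised global descriptor.
    """
    # L2-normalise each local descriptor
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Soft assignment: softmax over clusters of -alpha * squared distance
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (N, K)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)
    # Residual aggregation: V[k] = sum_i a_ik * (x_i - c_k)
    residuals = x[:, None, :] - centroids[None, :, :]  # (N, K, D)
    v = (a[:, :, None] * residuals).sum(axis=0)  # (K, D)
    # Intra-normalisation per cluster, then global L2 normalisation
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12
    v = v.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12)
```

For example, 200 random 64-dimensional descriptors aggregated against 8 centroids yield a 512-dimensional unit-norm vector; the output size grows as K × D, which is one reason VLAD-style layers are a natural target when reducing model complexity.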
Findings
Outperforms existing models on four VPR datasets.
Achieves higher accuracy with fewer parameters.
Effectively handles duplicate regions and small objects.
Abstract
Visual place recognition (VPR) is a highly challenging task that has a wide range of applications, including robot navigation and self-driving vehicles. VPR is particularly difficult due to the presence of duplicate regions and the lack of attention to small objects in complex scenes, resulting in recognition deviations. In this paper, we present ClusVPR, a novel approach that tackles two specific issues: redundant information in duplicate regions and the poor representation of small objects. Unlike existing methods that rely on Convolutional Neural Networks (CNNs) for feature map generation, ClusVPR introduces a unique paradigm called Clustering-based Weighted Transformer Network (CWTNet). CWTNet leverages the power of clustering-based weighted feature maps and integrates global dependencies to effectively address visual deviations encountered in large-scale VPR problems. We also…
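The abstract does not say how the clustering-based weights are computed. Purely as a hypothetical illustration of the idea, the sketch below softly clusters patch tokens, down-weights tokens that fall in heavily populated (repetitive) clusters, and adds the log-weights to the key logits of a single-head self-attention so that every token still attends globally. The function name, the inverse-cluster-mass weighting rule, and the `tau` temperature are all assumptions, not CWTNet's actual design.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cluster_weighted_attention(tokens, centroids, wq, wk, wv, tau=1.0):
    """Hypothetical cluster-weighted single-head self-attention.

    tokens:     (N, D) patch tokens
    centroids:  (K, D) cluster centres
    wq, wk, wv: (D, D) query/key/value projections
    Returns (output of shape (N, D), per-token weights of shape (N,)).
    """
    # Soft assignment of each token to each cluster: (N, K)
    d2 = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = softmax(-d2 / tau, axis=1)
    # Cluster "mass": how many tokens (softly) fall into each cluster: (K,)
    mass = assign.sum(axis=0)
    # Token weight = inverse mass of its clusters: tokens in large, repetitive
    # clusters get small weights; tokens in small clusters get large ones.
    # The weights sum to K because each cluster contributes exactly 1.
    w = (assign / mass[None, :]).sum(axis=1)  # (N,)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    # Global attention over all tokens, biased by log key weights
    logits = q @ k.T / np.sqrt(q.shape[1]) + np.log(w)[None, :]
    attn = softmax(logits, axis=1)
    return attn @ v, w
```

Adding `log(w)` to the logits is equivalent to multiplying each key's unnormalised attention score by its weight before the softmax normalisation, which is one simple way to make redundant regions contribute less without discarding them.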
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization · Linear Layer · Multi-Head Attention
