ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer

Yifan Xu; Pourya Shamsolmoali; Jie Yang

arXiv:2310.04099 · cs.CV · October 13, 2023

TL;DR

ClusVPR introduces a clustering-based weighted transformer with an optimized VLAD layer and a pyramid self-supervised strategy, significantly improving visual place recognition accuracy and efficiency in complex scenes.

Contribution

The paper proposes a novel Clustering-based Weighted Transformer Network (CWTNet) and an optimized VLAD layer, enhancing VPR performance while reducing model complexity.
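This summary does not describe what makes ClusVPR's VLAD layer "optimized." As a point of reference, the sketch below shows the standard NetVLAD aggregation that such a layer builds on. It uses the distance-based form of NetVLAD's soft assignment, a_k(x) = softmax_k(-alpha * ||x - c_k||^2). Several details are illustrative assumptions rather than the paper's choices: the function name `netvlad`, the value `alpha = 10.0`, the random centroids, and the feature sizes.

```python
import numpy as np

def netvlad(features: np.ndarray, centroids: np.ndarray, alpha: float = 10.0) -> np.ndarray:
    """Aggregate N local descriptors (N, D) against K centroids (K, D) into one K*D vector.

    Standard NetVLAD-style soft assignment, not ClusVPR's optimized variant:
    a_k(x) = softmax_k(-alpha * ||x - c_k||^2).
    """
    # Residuals between every descriptor and every centroid: (N, K, D)
    diff = features[:, None, :] - centroids[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)                  # (N, K)
    # Numerically stable softmax over clusters gives soft assignments
    logits = -alpha * sq_dist
    logits -= logits.max(axis=1, keepdims=True)
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)           # (N, K)
    # Assignment-weighted sum of residuals per cluster: (K, D)
    vlad = np.einsum("nk,nkd->kd", assign, diff)
    # Intra-normalization (per cluster), then global L2 normalization
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + 1e-12)

rng = np.random.default_rng(0)
feats = rng.normal(size=(196, 64))   # illustrative: 14x14 patch tokens, 64-dim
cents = rng.normal(size=(8, 64))     # illustrative: 8 clusters (learned in practice)
desc = netvlad(feats, cents)
print(desc.shape)  # (512,)
```

The output length is K * D, so the number of clusters directly controls descriptor size. This is one place where a VLAD layer can trade accuracy against model and descriptor complexity.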

Findings

01. Outperforms existing models on four VPR datasets.

02. Achieves higher accuracy with fewer parameters.

03. Effectively handles duplicate regions and small objects.

Abstract

Visual place recognition (VPR) is a highly challenging task with a wide range of applications, including robot navigation and self-driving vehicles. VPR is particularly difficult because complex scenes contain duplicate regions and because small objects receive little attention, both of which lead to recognition deviations. In this paper, we present ClusVPR, a novel approach that tackles two specific issues: redundant information in duplicate regions and the poor representation of small objects. Unlike existing methods that rely on Convolutional Neural Networks (CNNs) for feature map generation, ClusVPR introduces a unique paradigm called Clustering-based Weighted Transformer Network (CWTNet). CWTNet uses clustering-based weighted feature maps and integrates global dependencies to effectively address visual deviations encountered in large-scale VPR problems. We also…
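The abstract does not say how CWTNet derives its clustering-based weights. The sketch below is therefore only a hypothetical illustration of the general idea: down-weight redundant (duplicate) regions, then model global dependencies with self-attention. The following choices are assumptions, not the authors' method:

- k-means clustering of patch tokens.
- Inverse-cluster-size weights.
- Injecting those weights as a log-bias on the attention logits.
- A single head with random projections.
- The names `kmeans_labels`, `cluster_weights` and `weighted_self_attention`.

```python
import numpy as np

def kmeans_labels(x: np.ndarray, k: int, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Plain k-means on token features x (N, D); returns one cluster label per token."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (N, k)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):              # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def cluster_weights(labels: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical weighting: tokens in large clusters (repeated content) get less weight."""
    counts = np.bincount(labels, minlength=k)   # tokens per cluster
    w = 1.0 / counts[labels]                    # inverse cluster size per token
    return w / w.sum()                          # normalize so weights sum to 1

def weighted_self_attention(x: np.ndarray, w: np.ndarray, seed: int = 0) -> np.ndarray:
    """Single-head self-attention whose keys are re-weighted by w (random projections)."""
    n, d = x.shape
    rng = np.random.default_rng(seed)
    wq, wk, wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    # Adding log(w) to the logits scales each key's unnormalized attention by w
    scores = q @ k.T / np.sqrt(d) + np.log(w)[None, :]
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return x + attn @ v                         # residual connection

rng = np.random.default_rng(1)
tokens = rng.normal(size=(49, 32))   # illustrative: 7x7 grid of 32-dim patch tokens
labels = kmeans_labels(tokens, k=4)
w = cluster_weights(labels, k=4)
out = weighted_self_attention(tokens, w)
print(out.shape)  # (49, 32)
```

With this weighting, a token from a cluster of size 3 gets one third of the weight of a token that sits alone in its cluster. Repetitive regions therefore contribute less to every attention row, while rare content, which may include small objects, keeps more influence.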

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization · Linear Layer · Multi-Head Attention