TransVPR: Transformer-based place recognition with multi-level attention aggregation
Ruotong Wang, Yanqing Shen, Weiliang Zuo, Sanping Zhou, Nanning Zheng

TL;DR
TransVPR is a novel place recognition model using vision Transformers that effectively integrates multi-level attention to focus on task-relevant regions, achieving state-of-the-art results with efficient computation.
Contribution
It introduces a Transformer-based holistic place recognition approach that combines multi-level attention for improved accuracy and efficiency in complex scenes.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Maintains low computational time and storage.
Effectively filters task-relevant features using attention masks.
Abstract
Visual place recognition is a challenging task for applications such as autonomous driving navigation and mobile robot localization. Distracting elements present in complex scenes often lead to deviations in the perception of visual place. To address this problem, it is crucial to integrate information from only task-relevant regions into image representations. In this paper, we introduce a novel holistic place recognition model, TransVPR, based on vision Transformers. It benefits from the desirable property of the self-attention operation in Transformers, which can naturally aggregate task-relevant features. Attentions from multiple levels of the Transformer, which focus on different regions of interest, are further combined to generate a global image representation. In addition, the output tokens from Transformer layers filtered by the fused attention mask are considered as…
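The aggregation idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`aggregate`, `select_key_patches`), the linear scoring vectors, the element-wise max used to fuse the per-level masks, and the `keep_ratio` parameter are all assumptions made for illustration. The sketch takes patch tokens from several Transformer levels, computes one attention map per level, pools each level's tokens with its map, concatenates the results into a global descriptor, and uses a fused mask to keep only the highest-scoring patch tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate(tokens_per_level, score_weights):
    """Pool patch tokens from several levels into one global descriptor.

    tokens_per_level: list of (N, D) arrays, one per Transformer level.
    score_weights: list of (D,) vectors (hypothetical learned scoring weights).
    """
    parts, masks = [], []
    for tokens, w in zip(tokens_per_level, score_weights):
        attn = softmax(tokens @ w)      # (N,) attention over patches
        masks.append(attn)
        parts.append(attn @ tokens)     # (D,) attention-weighted sum
    descriptor = np.concatenate(parts)
    descriptor /= np.linalg.norm(descriptor) + 1e-12  # L2-normalize
    # Assumption: fuse the per-level masks with an element-wise max.
    fused = np.max(np.stack(masks), axis=0)
    return descriptor, fused

def select_key_patches(tokens, fused_mask, keep_ratio=0.25):
    # Keep the patch tokens with the highest fused attention.
    k = max(1, int(round(keep_ratio * len(fused_mask))))
    idx = np.argsort(fused_mask)[::-1][:k]
    return tokens[idx], idx

# Toy input: 3 levels, 16 patches, 8-dimensional tokens.
rng = np.random.default_rng(0)
levels = [rng.standard_normal((16, 8)) for _ in range(3)]
weights = [rng.standard_normal(8) for _ in range(3)]
descriptor, fused = aggregate(levels, weights)
key_tokens, key_idx = select_key_patches(levels[-1], fused)
```

In this toy setup the global descriptor has one pooled vector per level, and the fused mask plays the filtering role the abstract attributes to the fused attention mask. The actual fusion rule and the downstream use of the filtered tokens should be taken from the paper itself.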
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Indoor and Outdoor Localization Technologies
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Residual Connection · Softmax · Adam · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization · Label Smoothing
