TransVPR: Transformer-based place recognition with multi-level attention aggregation

Ruotong Wang; Yanqing Shen; Weiliang Zuo; Sanping Zhou; Nanning Zheng

arXiv:2201.02001·cs.CV·April 14, 2022·5 cites

TL;DR

TransVPR is a novel place recognition model using vision Transformers that effectively integrates multi-level attention to focus on task-relevant regions, achieving state-of-the-art results with efficient computation.

Contribution

It introduces a Transformer-based holistic place recognition approach that combines multi-level attention for improved accuracy and efficiency in complex scenes.

Findings

01. Achieves state-of-the-art performance on multiple benchmarks.

02. Maintains low computational time and storage.

03. Effectively filters task-relevant features using attention masks.

Abstract

Visual place recognition is a challenging task for applications such as autonomous driving navigation and mobile robot localization. Distracting elements present in complex scenes often lead to deviations in the perception of visual place. To address this problem, it is crucial to integrate information from only task-relevant regions into image representations. In this paper, we introduce a novel holistic place recognition model, TransVPR, based on vision Transformers. It benefits from the desirable property of the self-attention operation in Transformers, which can naturally aggregate task-relevant features. Attentions from multiple levels of the Transformer, which focus on different regions of interest, are further combined to generate a global image representation. In addition, the output tokens from Transformer layers filtered by the fused attention mask are considered as…
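
The aggregation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the per-level attentions are combined, so everything below is an assumption made for illustration: the function names (`attention_pool`, `global_descriptor`, `fused_mask`), the use of softmax-normalized scores as attention weights, concatenation of per-level pooled vectors as the global representation, and an element-wise maximum as the mask fusion rule. It is not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens, scores):
    """Weighted sum of patch tokens, weighted by softmax-normalized scores."""
    weights = softmax(scores)
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def global_descriptor(levels):
    """Concatenate one attention-pooled vector per level (assumed combination rule).

    levels: list of (tokens, scores) pairs, one per Transformer level, where
    tokens is a list of patch feature vectors and scores has one value per patch.
    """
    parts = []
    for tokens, scores in levels:
        parts.extend(l2_normalize(attention_pool(tokens, scores)))
    return l2_normalize(parts)


def fused_mask(levels):
    """Fuse per-level attention weights with an element-wise max (assumed rule)."""
    per_level = [softmax(scores) for _, scores in levels]
    return [max(ws) for ws in zip(*per_level)]


# Toy input: 3 levels, 6 patches, 4-dimensional tokens (random stand-in data).
random.seed(0)
num_levels, num_patches, dim = 3, 6, 4
levels = [
    (
        [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)],
        [random.gauss(0, 1) for _ in range(num_patches)],
    )
    for _ in range(num_levels)
]

desc = global_descriptor(levels)  # one unit-length vector per image
mask = fused_mask(levels)         # one weight per patch
```

In this sketch, `desc` plays the role of the global image representation, and `mask` could be thresholded to select which patch tokens to keep; the threshold and how the kept tokens are used downstream are left open because the abstract is truncated at that point.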

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Indoor and Outdoor Localization Technologies

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Residual Connection · Softmax · Adam · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization · Label Smoothing