Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition
Junqiao Zhao, Fenglin Zhang, Yingfeng Cai, Gengxuan Tian, Wenjie Mu, Chen Ye, Tiantian Feng

TL;DR
This paper introduces a novel spatio-temporal attention-based sequence descriptor for visual place recognition (VPR). The descriptor improves robustness by capturing the intrinsic dynamics of frame sequences and outperforms existing methods on benchmark datasets.
Contribution
It proposes a new sequence descriptor that integrates spatial and temporal attention with relative positional encoding for enhanced VPR performance.
Findings
Outperforms recent state-of-the-art methods on benchmark datasets
Effectively captures spatio-temporal dynamics in frame sequences
Utilizes a sliding window and relative positional encoding for sequence modeling
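The findings pair a sliding window with relative positional encoding. The minimal pure-Python sketch below shows one way those two ideas can fit together; it is not the authors' implementation. It assumes per-frame feature vectors are already extracted, models only attention across time (the paper's spatial attention inside each frame is omitted), and replaces the learned relative positional encoding with a hypothetical fixed bias table `rel_bias` indexed by frame offset. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def sliding_windows(frames: List[Vector], window: int, stride: int = 1) -> List[List[Vector]]:
    """Split a stream of per-frame features into overlapping fixed-length windows."""
    return [frames[i:i + window] for i in range(0, len(frames) - window + 1, stride)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(window: List[Vector], rel_bias: List[float]) -> List[Vector]:
    """Single-head self-attention across the frames of one window.

    rel_bias[k] is a hypothetical fixed bias added when two frames are k steps
    apart; it stands in for a learned relative positional encoding and must
    have at least len(window) entries.
    """
    n, d = len(window), len(window[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        # Content similarity plus a bias that depends only on the frame offset |i - j|.
        scores = [dot(window[i], window[j]) * scale + rel_bias[abs(i - j)] for j in range(n)]
        weights = softmax(scores)
        out.append([sum(w * window[j][c] for j, w in enumerate(weights)) for c in range(d)])
    return out


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def sequence_descriptor(window: List[Vector], rel_bias: List[float]) -> Vector:
    """Attend across time, mean-pool the attended frames, then L2-normalize."""
    attended = temporal_attention(window, rel_bias)
    d = len(attended[0])
    pooled = [sum(f[c] for f in attended) / len(attended) for c in range(d)]
    return l2_normalize(pooled)


# Usage with toy 2-D frame features (hypothetical values).
frames = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0], [0.5, 0.5]]
rel_bias = [0.5, 0.0, -0.5]  # hypothetical: nearer frames get a larger bias
descs = [sequence_descriptor(w, rel_bias) for w in sliding_windows(frames, window=3)]
print(len(descs))  # 3 windows of length 3 from 5 frames
```

Because the bias depends only on the offset between frames, the same table applies to every window the slider produces, which is the usual motivation for relative rather than absolute positions.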
Abstract
Visual Place Recognition (VPR) aims to retrieve frames from a geotagged database that were captured at the same place as the query frame. To improve the robustness of VPR in perceptually aliased scenarios, sequence-based VPR methods have been proposed. These methods either match frame sequences against each other or extract sequence descriptors for direct retrieval. However, the former usually assumes constant velocity, which rarely holds in practice, and is computationally expensive and sensitive to sequence length. Although the latter overcomes these problems, existing sequence descriptors are constructed only by aggregating the features of multiple frames, without any interaction on temporal information, and thus cannot yield descriptors with spatio-temporal discrimination. In this paper, we propose a sequence descriptor that effectively incorporates…
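The abstract contrasts sequence descriptors built by aggregating frame features alone with descriptors that model temporal interaction, and notes that descriptors allow direct retrieval. The sketch below is illustrative and not taken from the paper: it assumes precomputed frame features and hypothetical place IDs, implements the aggregation-only baseline as mean pooling, and performs retrieval as a cosine-similarity nearest-neighbour search. The order check shows why pure aggregation discards temporal information.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Baseline sequence descriptor: average frame features, no temporal interaction."""
    d = len(frames[0])
    return [sum(f[c] for f in frames) / len(frames) for c in range(d)]


def cosine(a: Vector, b: Vector) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def retrieve(query: Vector, database: List[Tuple[str, Vector]], k: int = 1) -> List[str]:
    """Return the place IDs of the k database descriptors most similar to the query."""
    ranked = sorted(database, key=lambda item: cosine(query, item[1]), reverse=True)
    return [place_id for place_id, _ in ranked[:k]]


# Mean pooling gives the same descriptor for a sequence and its reversal,
# so it cannot tell the two traversal orders apart.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(mean_pool(seq) == mean_pool(seq[::-1]))  # True

# Direct retrieval: one descriptor comparison per database entry,
# with no frame-by-frame sequence alignment.
db = [("place_A", [1.0, 0.1]), ("place_B", [0.0, 1.0])]  # hypothetical database
print(retrieve([0.9, 0.2], db))  # ['place_A']
```

A brute-force scan is used here for clarity; a real system would typically hand the descriptors to an approximate nearest-neighbour index.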
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
