Learning Sequential Descriptors for Sequence-based Visual Place Recognition
Riccardo Mereu, Gabriele Trivigno, Gabriele Berton, Carlo Masone, Barbara Caputo

TL;DR
This paper introduces a new sequence-level descriptor, SeqVLAD, and evaluates various fusion techniques, including Transformers, for robust and scalable visual place recognition in robotics, demonstrating state-of-the-art performance.
Contribution
It proposes a novel sequence-level aggregator called SeqVLAD and compares Transformer-based architectures with CNNs for improved place recognition.
Findings
SeqVLAD outperforms previous methods on multiple datasets.
Transformers are viable alternatives to CNNs for this task.
The benchmark highlights strengths and weaknesses of different architectural choices.
Abstract
In robotics, Visual Place Recognition is a continuous process that receives as input a video stream to produce a hypothesis of the robot's current position within a map of known places. This task requires robust, scalable, and efficient techniques for real applications. This work proposes a detailed taxonomy of techniques using sequential descriptors, highlighting different mechanisms to fuse the information from the individual images. This categorization is supported by a complete benchmark of experimental results that provides evidence on the strengths and weaknesses of these different architectural choices. In comparison to existing sequential descriptor methods, we further investigate the viability of Transformers instead of CNN backbones, and we propose a new ad-hoc sequence-level aggregator called SeqVLAD, which outperforms prior state of the art on different datasets. The code is…
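Illustrative sketch
The abstract refers to different mechanisms for fusing per-frame information into a single sequence descriptor. As a rough illustration only, and not the authors' implementation, the NumPy sketch below contrasts three generic options: concatenating per-frame descriptors, averaging them, and a VLAD-style aggregation computed over the local features of all frames at once. The function names, array shapes, the hard cluster assignment, and the random inputs are all assumptions made for this example; this page does not describe how SeqVLAD itself is built.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale x to unit L2 norm along `axis` (all-zero vectors stay zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def fuse_concat(frame_descs):
    """Concatenate per-frame descriptors (L, D) into one (L*D,) vector."""
    return l2_normalize(frame_descs.reshape(-1))


def fuse_mean(frame_descs):
    """Average per-frame descriptors (L, D) into one (D,) vector."""
    return l2_normalize(frame_descs.mean(axis=0))


def vlad_over_sequence(local_feats, centroids):
    """VLAD-style aggregation over every local feature in the sequence.

    local_feats: (L, N, D) array, N local features for each of L frames.
    centroids:   (K, D) array of cluster centres (hypothetical, e.g. k-means).
    Returns a (K*D,) descriptor. Hard assignment is an assumption made for
    simplicity; learned aggregators typically use a soft assignment.
    """
    feats = local_feats.reshape(-1, local_feats.shape[-1])  # (L*N, D)
    # Squared distance from every feature to every centroid: (L*N, K)
    dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assign = dists.argmin(axis=1)

    num_clusters, dim = centroids.shape
    vlad = np.zeros((num_clusters, dim))
    for k in range(num_clusters):
        members = feats[assign == k]
        if len(members):
            # Sum of residuals between the members and their centroid
            vlad[k] = (members - centroids[k]).sum(axis=0)

    vlad = l2_normalize(vlad, axis=1)      # per-cluster normalization
    return l2_normalize(vlad.reshape(-1))  # final global normalization


# Toy inputs: a sequence of 5 frames (all values are random placeholders).
rng = np.random.default_rng(0)
frame_descs = rng.normal(size=(5, 8))      # one 8-D descriptor per frame
local_feats = rng.normal(size=(5, 20, 8))  # 20 local 8-D features per frame
centroids = rng.normal(size=(4, 8))        # 4 hypothetical cluster centres

print(fuse_concat(frame_descs).shape)                   # (40,)
print(fuse_mean(frame_descs).shape)                     # (8,)
print(vlad_over_sequence(local_feats, centroids).shape) # (32,)
```

In this sketch, concatenation grows with sequence length and depends on frame order, while averaging and the VLAD-style variant yield a fixed-size descriptor regardless of how many frames are in the sequence.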
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
