Learning Sequential Descriptors for Sequence-based Visual Place Recognition

Riccardo Mereu; Gabriele Trivigno; Gabriele Berton; Carlo Masone; Barbara Caputo

arXiv:2207.03868·cs.CV·July 11, 2022

TL;DR

This paper introduces a new sequence-level descriptor, SeqVLAD, and evaluates various fusion techniques, including Transformers, for robust and scalable visual place recognition in robotics, demonstrating state-of-the-art performance.

Contribution

It proposes a novel sequence-level aggregator called SeqVLAD and compares Transformer-based architectures with CNNs for improved place recognition.

Findings

01. SeqVLAD outperforms previous methods on multiple datasets.
02. Transformers are viable alternatives to CNNs for this task.
03. The benchmark highlights strengths and weaknesses of different architectural choices.

Abstract

In robotics, Visual Place Recognition is a continuous process that receives as input a video stream to produce a hypothesis of the robot's current position within a map of known places. This task requires robust, scalable, and efficient techniques for real applications. This work proposes a detailed taxonomy of techniques using sequential descriptors, highlighting different mechanisms to fuse the information from the individual images. This categorization is supported by a complete benchmark of experimental results that provides evidence on the strengths and weaknesses of these different architectural choices. In comparison to existing sequential descriptor methods, we further investigate the viability of Transformers instead of CNN backbones, and we propose a new ad-hoc sequence-level aggregator called SeqVLAD, which outperforms prior state of the art on different datasets. The code is…
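
The abstract does not spell out how SeqVLAD fuses the frames of a sequence. As a rough illustration of what a sequence-level aggregator could look like, the sketch below applies classic hard-assignment VLAD to local features pooled from every frame of a sequence, producing one descriptor per sequence. This is an assumption made for illustration and not the authors' implementation: the function name `vlad_sequence_descriptor`, the toy features, and the fixed centroids are all hypothetical, and a learned aggregator would typically use soft assignment and trained cluster centres.

```python
import math


def vlad_sequence_descriptor(frames, centroids):
    """Aggregate a sequence of frames into one VLAD-style descriptor.

    frames:    list of frames; each frame is a list of local feature
               vectors (lists of floats of dimension D).
    centroids: list of K cluster centres, each of dimension D
               (hypothetical; in practice these would be learned).
    Returns a flat, L2-normalised list of length K * D.
    """
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]

    # Treat local features from all frames as a single pooled set.
    for frame in frames:
        for feat in frame:
            # Hard-assign the feature to its nearest centroid.
            nearest = min(
                range(k),
                key=lambda i: sum((f - c) ** 2 for f, c in zip(feat, centroids[i])),
            )
            # Accumulate the residual (feature minus centroid).
            for j in range(d):
                residuals[nearest][j] += feat[j] - centroids[nearest][j]

    # Intra-normalise each cluster's residual, then flatten.
    flat = []
    for row in residuals:
        norm = math.sqrt(sum(v * v for v in row))
        flat.extend(v / norm if norm > 0 else 0.0 for v in row)

    # Final L2 normalisation of the whole descriptor.
    total = math.sqrt(sum(v * v for v in flat))
    return [v / total if total > 0 else 0.0 for v in flat]


# Toy sequence: two frames with 2-D local features (made-up values).
frames = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.2]]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
desc = vlad_sequence_descriptor(frames, centroids)
```

Because the features are pooled before aggregation, this particular sketch is invariant to frame order; whether that property is desirable depends on the fusion mechanism being compared.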

Code & Models

Repositories

vandal-vpr/vg-transformers
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications