TL;DR
This paper introduces MultiRes-NetVLAD, which augments NetVLAD representation learning with low-resolution image pyramid encoding, yielding richer global place descriptors and state-of-the-art retrieval performance.
Contribution
It proposes a multi-resolution feature pyramid encoding method for NetVLAD that improves place recognition accuracy and can be combined with existing multi-scale techniques.
Findings
Achieves state-of-the-art Recall@N on 15 benchmarking datasets.
Outperforms 11 existing place recognition methods.
Enables richer global descriptors through low-resolution image pyramids.
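For readers unfamiliar with the metric, Recall@N in place recognition is commonly taken to be the fraction of queries for which at least one of the top N retrieved references is a correct match. The sketch below illustrates that definition only. The function name, the dictionary layout, and the toy data are assumptions made for illustration, and the paper's own evaluation protocol (for example, how a correct match is decided) is not reproduced here.

```python
def recall_at_n(ranked_results, ground_truth, n):
    """Fraction of queries with at least one correct match in the top n.

    ranked_results: dict mapping query id -> list of reference ids, best first.
    ground_truth:   dict mapping query id -> set of correct reference ids.
    (Both layouts are hypothetical, chosen for this illustration.)
    """
    hits = 0
    for query_id, ranking in ranked_results.items():
        correct = ground_truth[query_id]
        if any(ref in correct for ref in ranking[:n]):
            hits += 1
    return hits / len(ranked_results)


# Toy data, not taken from the paper.
ranked = {
    "q1": ["a", "b", "c"],   # correct match at rank 2
    "q2": ["d", "e", "f"],   # no correct match retrieved
    "q3": ["g", "h", "i"],   # correct match at rank 1
}
truth = {"q1": {"b"}, "q2": {"x"}, "q3": {"g"}}

r1 = recall_at_n(ranked, truth, 1)
r2 = recall_at_n(ranked, truth, 2)
```

Because a query counts as a hit as soon as any correct reference appears in the top N, Recall@N can only stay the same or increase as N grows.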
Abstract
Visual Place Recognition (VPR) is a crucial component of 6-DoF localization, visual SLAM and structure-from-motion pipelines, tasked to generate an initial list of place match hypotheses by matching global place descriptors. However, commonly-used CNN-based methods either process multiple image resolutions after training or use a single resolution and limit multi-scale feature extraction to the last convolutional layer during training. In this paper, we augment NetVLAD representation learning with low-resolution image pyramid encoding which leads to richer place representations. The resultant multi-resolution feature pyramid can be conveniently aggregated through VLAD into a single compact representation, avoiding the need for concatenation or summation of multiple patches in recent multi-scale approaches. Furthermore, we show that the underlying learnt feature tensor can be combined…
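The aggregation idea in the abstract can be sketched roughly as follows: local features from every level of a low-resolution image pyramid are pooled into one set, and a single VLAD step turns that set into one fixed-size descriptor, so nothing is concatenated per scale. This is a minimal NumPy sketch under strong assumptions, not the authors' implementation. A random linear projection of image patches stands in for the CNN, the cluster centers are random rather than learnt, and every function name and parameter below is hypothetical.

```python
import numpy as np


def image_pyramid(image, scales=(1, 2, 4)):
    """Return average-pooled copies of a 2D image, one per downscale factor."""
    levels = []
    for s in scales:
        h, w = image.shape[0] // s * s, image.shape[1] // s * s
        pooled = image[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        levels.append(pooled)
    return levels


def local_features(level, projection, patch=4):
    """Stand-in for a CNN (assumption): project non-overlapping patches to D dims."""
    h, w = level.shape[0] // patch, level.shape[1] // patch
    patches = (
        level[: h * patch, : w * patch]
        .reshape(h, patch, w, patch)
        .transpose(0, 2, 1, 3)
        .reshape(h * w, patch * patch)
    )
    return patches @ projection  # shape (h * w, D)


def vlad(descriptors, centers, alpha=10.0):
    """Soft-assignment VLAD over N local descriptors and K centers."""
    residuals = descriptors[:, None, :] - centers[None, :, :]  # (N, K, D)
    logits = -alpha * (residuals ** 2).sum(axis=2)             # (N, K)
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)              # softmax over centers
    v = (weights[:, :, None] * residuals).sum(axis=0)          # (K, D)
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12      # per-center L2 norm
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)                     # global L2 norm


def multires_descriptor(image, projection, centers, scales=(1, 2, 4)):
    """Pool local features from all pyramid levels, then aggregate once."""
    feats = [local_features(lvl, projection) for lvl in image_pyramid(image, scales)]
    return vlad(np.concatenate(feats, axis=0), centers)


rng = np.random.default_rng(0)
image = rng.random((32, 32))              # toy grayscale image
projection = rng.standard_normal((16, 8))  # 4x4 patch -> 8-dim feature
centers = rng.standard_normal((4, 8))      # K = 4 centers, D = 8
desc = multires_descriptor(image, projection, centers)
single = multires_descriptor(image, projection, centers, scales=(1,))
```

In this sketch the descriptor length is K × D whether one resolution or three are used, which is the property that makes a single compact representation possible: adding pyramid levels changes which local features are aggregated, not the size of the output.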
