TL;DR
This paper introduces GSV-Cities, a large-scale, accurately labeled dataset for visual place recognition, and shows that training on it substantially improves the performance of deep learning models, yielding new state-of-the-art results.
Contribution
The paper presents GSV-Cities, the dataset with the widest geographic coverage to date (more than 40 cities across all continents over a 14-year period) and highly accurate ground truth. It also proposes a new convolutional aggregation layer that outperforms existing methods.
Findings
Training on GSV-Cities substantially improves the performance of existing methods.
The new aggregation layer outperforms GeM, NetVLAD, and CosPlace.
Achieved state-of-the-art results on multiple large-scale benchmarks.
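GeM, one of the baselines named above, is a generalized-mean pooling layer. It turns a CNN feature map into a global descriptor by computing, per channel, (mean of x^p)^(1/p) over all spatial positions. The sketch below illustrates the standard GeM formula only. It is not the paper's new aggregation layer. It assumes a feature map given as C channels, each flattened to a list of H*W non-negative activations, and uses a fixed exponent `p`, whereas GeM is commonly trained with a learnable `p`.

```python
import math
from typing import List


def gem_pool(feature_map: List[List[float]], p: float = 3.0, eps: float = 1e-6) -> List[float]:
    """Generalized-mean (GeM) pooling over spatial positions.

    feature_map: C channels, each a flat list of H*W activations (assumed layout).
    Returns a C-dimensional descriptor with (mean(x^p))^(1/p) per channel.
    """
    descriptor = []
    for channel in feature_map:
        # Clamp to eps so the p-th power and p-th root stay well defined.
        clamped = [max(x, eps) for x in channel]
        mean_pow = sum(x ** p for x in clamped) / len(clamped)
        descriptor.append(mean_pow ** (1.0 / p))
    return descriptor


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a descriptor to unit length, as is usual before similarity search."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)
```

With `p = 1` GeM reduces to average pooling. As `p` grows it approaches max pooling, so the exponent interpolates between the two. For example, `gem_pool([[1.0, 2.0, 3.0]], p=1)` gives about `[2.0]`, while `p=100` gives a value close to the maximum, 3.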
Abstract
This paper aims to investigate representation learning for large-scale visual place recognition, which consists of determining the location depicted in a query image by referring to a database of reference images. This is a challenging task due to the large-scale environmental changes that can occur over time (e.g., weather, illumination, season, traffic, occlusion). Progress is currently challenged by the lack of large databases with accurate ground truth. To address this challenge, we introduce GSV-Cities, a new image dataset providing the widest geographic coverage to date with highly accurate ground truth, covering more than 40 cities across all continents over a 14-year period. We subsequently explore the full potential of recent advances in deep metric learning to train networks specifically for place recognition, and evaluate how different loss functions influence performance. In…
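The abstract frames place recognition as retrieval: a query image is located by finding the most similar reference images in a database. The sketch below shows that retrieval step and a recall@K score. It is not the paper's evaluation code and rests on several assumptions. Images are already encoded as descriptor vectors. Similarity is cosine similarity. Positions are planar coordinates in meters. A retrieval counts as correct if any of the top K results lies within `threshold_m` meters of the query. The 25 m default is a common convention in place-recognition benchmarks and is assumed here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Point = Tuple[float, float]  # assumed planar (x, y) position in meters


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def top_k(query: Vector, database: List[Vector], k: int) -> List[int]:
    """Indices of the k database descriptors most similar to the query."""
    scores = [(cosine(query, d), i) for i, d in enumerate(database)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scores[:k]]


def recall_at_k(queries: List[Vector], query_xy: List[Point],
                database: List[Vector], db_xy: List[Point],
                k: int, threshold_m: float = 25.0) -> float:
    """Fraction of queries with at least one top-k match within threshold_m."""
    hits = 0
    for q, (qx, qy) in zip(queries, query_xy):
        for i in top_k(q, database, k):
            dx, dy = db_xy[i][0] - qx, db_xy[i][1] - qy
            if math.hypot(dx, dy) <= threshold_m:
                hits += 1
                break  # one correct match is enough for this query
    return hits / len(queries)
```

Brute-force search like this is fine for illustration. Large reference databases typically rely on an approximate nearest-neighbor index instead.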
