Divide&Classify: Fine-Grained Classification for City-Wide Visual Place Recognition
Gabriele Trivigno, Gabriele Berton, Juan Aragon, Barbara Caputo, Carlo Masone

TL;DR
Divide&Classify introduces a classification-based approach to city-wide visual place recognition that achieves fast inference and high accuracy. Paired with existing retrieval methods, it speeds them up by over 20 times.
Contribution
The paper proposes a novel scheme for partitioning the map into classes, together with an ensemble of classifiers whose prototypes are learned with an angular margin loss, for fine-grained city-wide recognition.
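To make "prototypes learned via angular margin loss" concrete, here is a minimal sketch of one common formulation, the additive angular margin (ArcFace-style) applied to cosine logits. The summary does not state which variant or hyperparameters the paper uses, so the function names, `scale=30.0`, and `margin=0.5` are illustrative assumptions, not the authors' implementation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def angular_margin_logits(embedding, prototypes, label, scale=30.0, margin=0.5):
    """Scaled cosine logits; the true class gets cos(theta + margin).

    Assumption: additive angular margin (ArcFace-style). `scale` and
    `margin` are illustrative values, not taken from the paper.
    """
    e = l2_normalize(embedding)
    logits = []
    for idx, proto in enumerate(prototypes):
        p = l2_normalize(proto)
        cos = sum(a * b for a, b in zip(e, p))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if idx == label:
            # Penalise the true class by widening its angle to the prototype.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    return logits

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def angular_margin_loss(embedding, prototypes, label, scale=30.0, margin=0.5):
    logits = angular_margin_logits(embedding, prototypes, label, scale, margin)
    return cross_entropy(logits, label)
```

Each row of `prototypes` plays the role of one class's learned prototype. The margin makes the true class harder to satisfy during training, which pushes embeddings of the same class closer to their prototype and away from neighbouring ones.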
Findings
D&C achieves accuracy competitive with retrieval methods.
Pairing D&C with retrieval pipelines speeds up computation by over 20 times.
D&C improves recall in large-scale visual place recognition.
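One plausible reading of the speed-up finding is that the classifier shortlists a small part of the database, so the similarity search only runs over that shortlist. The sketch below illustrates the idea under that assumption. The `database` layout and the function name `nearest_in_class` are hypothetical, and the classifier's prediction is taken as given.

```python
def nearest_in_class(query, predicted_class, database):
    """Return the id of the closest database image within one class.

    Assumed layout: `database` maps class_id -> list of
    (image_id, descriptor) pairs. Only the predicted class is searched,
    which is where the saving over a full-database search comes from.
    """
    candidates = database.get(predicted_class, [])
    best_id, best_dist = None, float("inf")
    for image_id, descriptor in candidates:
        # Squared Euclidean distance between query and database descriptor.
        dist = sum((q - d) ** 2 for q, d in zip(query, descriptor))
        if dist < best_dist:
            best_id, best_dist = image_id, dist
    return best_id

# Usage: the classifier predicts class 0, so only class 0 is searched.
database = {
    0: [("a", [0.0, 0.0]), ("b", [1.0, 1.0])],
    1: [("c", [0.9, 0.9])],
}
match = nearest_in_class([0.9, 0.9], 0, database)
```

The trade-off in this sketch is that a wrong class prediction cannot be recovered by the search: image "c" is the closest overall but is never considered when class 0 is predicted.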
Abstract
Visual place recognition is commonly addressed as an image retrieval problem. However, retrieval methods are impractical to scale to large datasets densely sampled from city-wide maps, since the size of such datasets negatively impacts inference time. Using approximate nearest neighbour search for retrieval helps to mitigate this issue, at the cost of a performance drop. In this paper we investigate whether we can effectively approach this task as a classification problem, thus bypassing the need for a similarity search. We find that existing classification methods for coarse, planet-wide localization are not suitable for the fine-grained, city-wide setting. This is largely due to how the dataset is split into classes, because these methods are designed to handle a sparse distribution of photos and as such do not consider the visual aliasing problem across neighbouring classes that…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Remote-Sensing Image Classification · Video Surveillance and Tracking Methods
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
