Breaking the Frame: Visual Place Recognition by Overlap Prediction
Tong Wei, Philipp Lindenberger, Jiri Matas, Daniel Barath

TL;DR
This paper introduces VOP, a novel visual place recognition method that predicts image overlap using patch-level embeddings, improving localization accuracy in challenging scenarios with occlusions and partial overlaps.
Contribution
VOP shifts from traditional global similarity metrics to overlap prediction using Vision Transformers, enabling more accurate place recognition without expensive feature matching.
Findings
VOP yields more accurate localization on the retrieved image pairs than state-of-the-art baselines.
VOP effectively handles occlusions and partial overlaps in large-scale benchmarks.
The approach improves relative pose estimation in challenging environments.
Abstract
Visual place recognition methods struggle with occlusions and partial visual overlaps. We propose a novel visual place recognition approach based on overlap prediction, called VOP, shifting from traditional reliance on global image similarities and local features to image overlap prediction. VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone and establishing patch-to-patch correspondences without requiring expensive feature detection and matching. Our approach uses a voting mechanism to assess overlap scores for potential database images, which provides a nuanced image retrieval metric in challenging scenarios. Experimental results show that VOP leads to more accurate relative pose estimation and localization results on the retrieved image pairs than state-of-the-art baselines on a number of large-scale, real-world indoor and…
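The idea of scoring overlap by voting over patch-to-patch correspondences can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`overlap_score`, `rank_database`), the cosine-similarity matching rule, and the `sim_threshold` value are all assumptions made for illustration, and the patch embeddings are taken as given arrays rather than computed by a Vision Transformer.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row (one patch embedding) to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def overlap_score(query_patches, db_patches, sim_threshold=0.8):
    """Hypothetical overlap score: fraction of query patches that find
    a sufficiently similar patch in the database image.

    query_patches: (Nq, D) array of patch embeddings
    db_patches:    (Nd, D) array of patch embeddings
    sim_threshold: assumed cosine-similarity cutoff for a "match"
    """
    q = l2_normalize(np.asarray(query_patches, dtype=float))
    d = l2_normalize(np.asarray(db_patches, dtype=float))
    sim = q @ d.T                     # (Nq, Nd) cosine similarities
    best = sim.max(axis=1)            # best database match per query patch
    votes = best >= sim_threshold     # each matched query patch casts one vote
    return float(votes.mean())

def rank_database(query_patches, database, sim_threshold=0.8):
    """Rank database images (a dict of name -> patch embeddings) by score."""
    scores = {
        name: overlap_score(query_patches, patches, sim_threshold)
        for name, patches in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: one-hot "embeddings" so that matches are unambiguous.
basis = np.eye(8)
query = basis[[0, 1, 2, 3]]           # four query patches
database = {
    "partial_overlap": basis[[0, 1, 4, 5]],   # shares two patches with the query
    "no_overlap": basis[[4, 5, 6, 7]],        # shares none
}
ranking = rank_database(query, database)
```

In this toy setup the partially overlapping image ranks first because two of the four query patches find a match, whereas a single global descriptor would have to summarize the non-overlapping regions as well.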
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings
