Breaking the Frame: Visual Place Recognition by Overlap Prediction

Tong Wei; Philipp Lindenberger; Jiri Matas; Daniel Barath

arXiv:2406.16204 · cs.CV · December 5, 2024 · 1 citation

TL;DR

This paper introduces VOP, a novel visual place recognition method that predicts image overlap using patch-level embeddings, improving localization accuracy in challenging scenarios with occlusions and partial overlaps.

Contribution

VOP shifts from traditional global similarity metrics to overlap prediction using Vision Transformers, enabling more accurate place recognition without expensive feature matching.

Findings

1. VOP achieves higher localization accuracy than state-of-the-art methods.

2. VOP effectively handles occlusions and partial overlaps in large-scale benchmarks.

3. The approach improves relative pose estimation in challenging environments.

Abstract

Visual place recognition methods struggle with occlusions and partial visual overlaps. We propose a novel visual place recognition approach based on overlap prediction, called VOP, shifting from traditional reliance on global image similarities and local features to image overlap prediction. VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone and establishing patch-to-patch correspondences without requiring expensive feature detection and matching. Our approach uses a voting mechanism to assess overlap scores for potential database images, providing a nuanced image retrieval metric in challenging scenarios. Experimental results show that VOP leads to more accurate relative pose estimation and localization results on the retrieved image pairs than state-of-the-art baselines on a number of large-scale, real-world indoor and…
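The patch-to-patch voting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`overlap_score`, `rank_database`), the cosine-similarity nearest-neighbour matching, and the `sim_threshold` value are all assumptions made for illustration, and the patch embeddings are taken as given arrays rather than produced by a Vision Transformer.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def overlap_score(query_patches, db_patches, sim_threshold=0.8):
    """Hypothetical overlap score between a query and one database image.

    query_patches: (N, D) array of patch embeddings for the query image.
    db_patches:    (M, D) array of patch embeddings for a database image.
    Each query patch casts one vote if its best-matching database patch
    has cosine similarity >= sim_threshold (an assumed cut-off).
    Returns the fraction of query patches that voted.
    """
    q = l2_normalize(np.asarray(query_patches, dtype=float))
    d = l2_normalize(np.asarray(db_patches, dtype=float))
    similarity = q @ d.T                 # (N, M) cosine similarities
    best_match = similarity.max(axis=1)  # best database patch per query patch
    votes = best_match >= sim_threshold
    return float(votes.mean())


def rank_database(query_patches, database, sim_threshold=0.8):
    """Rank database images by the assumed overlap score, highest first.

    database: dict mapping an image name to its (M, D) patch embeddings.
    """
    scores = {
        name: overlap_score(query_patches, patches, sim_threshold)
        for name, patches in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=(16, 32))
    database = {
        # Shares the first 8 query patches, the rest are unrelated.
        "half_overlap": np.vstack([query[:8], rng.normal(size=(8, 32))]),
        # Entirely unrelated patches.
        "no_overlap": rng.normal(size=(16, 32)),
    }
    for name, score in rank_database(query, database):
        print(name, score)
```

In this sketch a database image is scored by how many query patches find a confident match, so an image that shares only part of the scene can still rank highly. A single global descriptor would average over the whole frame instead, which is the contrast the abstract draws.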

Code & Models

Repositories

weitong8591/vop (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings