Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics
Dennis Haitz, Athradi Shritish Shetty, Michael Weinmann, Markus Ulrich

TL;DR
This paper evaluates various visual place recognition methods as image pair retrieval tools for 3D scene registration and SLAM, highlighting their strengths and weaknesses across diverse challenging datasets.
Contribution
It provides a comprehensive comparison of state-of-the-art VPR methods for image pair retrieval in 3D vision and robotics applications, emphasizing their domain-dependent performance.
Findings
Global descriptor methods are effective for image pair retrieval in challenging scenarios.
Modern VPR methods show domain-dependent strengths and weaknesses.
Evaluation across diverse datasets highlights practical considerations for VPR selection.
Abstract
Visual Place Recognition (VPR) is a core component in computer vision, typically formulated as an image retrieval task for localization, mapping, and navigation. In this work, we instead study VPR as an image pair retrieval front-end for registration pipelines, where the goal is to find top-matching image pairs between two disjoint image sets for downstream tasks such as scene registration, SLAM, and Structure-from-Motion. We comparatively evaluate state-of-the-art VPR families - NetVLAD-style baselines, classification-based global descriptors (CosPlace, EigenPlaces), feature-mixing (MixVPR), and foundation-model-driven methods (AnyLoc, SALAD, MegaLoc) - on three challenging datasets: object-centric outdoor scenes (Tanks and Temples), indoor RGB-D scans (ScanNet-GS), and autonomous-driving sequences (KITTI). We show that modern global descriptor approaches are increasingly suitable as…
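The pair retrieval setting described in the abstract can be illustrated with a minimal sketch. It assumes each image has already been reduced to one global descriptor vector by some VPR model, and that pairs are ranked by cosine similarity. The function names `l2_normalize` and `top_k_pairs` and the toy descriptor values are hypothetical and not taken from the paper, whose actual scoring and selection procedure may differ.

```python
import heapq
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_pairs(desc_a, desc_b, k):
    """Return the k highest-scoring (i, j, similarity) pairs across two image sets.

    desc_a, desc_b: lists of equal-length global descriptors, one per image.
    Similarity is cosine similarity, i.e. the dot product of unit vectors.
    """
    a = [l2_normalize(d) for d in desc_a]
    b = [l2_normalize(d) for d in desc_b]
    # Score every cross-set pair; images within the same set are never compared.
    scored = (
        (sum(x * y for x, y in zip(da, db)), i, j)
        for i, da in enumerate(a)
        for j, db in enumerate(b)
    )
    best = heapq.nlargest(k, scored)
    return [(i, j, s) for s, i, j in best]


# Toy 3-D descriptors standing in for real VPR outputs (hypothetical values).
set_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
set_b = [[0.0, 2.0, 0.0], [3.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
pairs = top_k_pairs(set_a, set_b, k=2)
```

The exhaustive scoring here is quadratic in the number of images, which is fine for illustration. For larger image sets, a matrix product or an approximate nearest-neighbor index would be the more usual choice.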
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
