TL;DR
This paper benchmarks and analyzes how image retrieval techniques affect visual localization, revealing gaps and opportunities for specialized retrieval methods to improve camera pose estimation accuracy.
Contribution
It introduces a new benchmark setup for evaluating retrieval in localization, compares state-of-the-art representations, and analyzes the influence of different ground truth definitions and scene conditions.
Findings
How strongly retrieval performance correlates with localization accuracy varies across localization paradigms.
Significant room for improvement exists in retrieval methods tailored for localization.
Scene conditions like blur and dynamics affect retrieval and localization performance.
Abstract
Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) to provide an approximate pose estimate or (2) to determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for both purposes. These algorithms are often trained to retrieve the same landmark under a large range of viewpoint changes, a goal that often differs from the requirements of visual localization. To investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval for multiple visual localization paradigms. First, we introduce a novel benchmark setup and compare…
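The two retrieval roles named in the abstract can be illustrated with a minimal sketch. It is not the paper's benchmark code. It assumes each image is already summarized by a global descriptor vector and that database images have known camera positions. The function names (`retrieve_top_k`, `approximate_position`, `covisible_candidates`) and the similarity-weighted averaging are hypothetical choices made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_desc, db_descs, k):
    """Return [(db_index, similarity)] for the k most similar database images."""
    scored = [(cosine_similarity(query_desc, d), i) for i, d in enumerate(db_descs)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(i, sim) for sim, i in scored[:k]]


def approximate_position(retrieved, db_positions):
    """Role (1): approximate the query position from retrieved images.

    Assumption: a similarity-weighted mean of the retrieved camera positions.
    Negative similarities are clamped to zero; if all weights are zero,
    fall back to a plain average.
    """
    weights = [max(sim, 0.0) for _, sim in retrieved]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0] * len(retrieved)
        total = float(len(retrieved))
    dims = len(db_positions[0])
    return [
        sum(w * db_positions[i][d] for w, (i, _) in zip(weights, retrieved)) / total
        for d in range(dims)
    ]


def covisible_candidates(retrieved):
    """Role (2): database images whose scene content is likely visible in the
    query, to be passed on to local feature matching and pose estimation."""
    return [i for i, _ in retrieved]
```

In role (1) the retrieved poses are the final answer, so retrieval errors translate directly into pose errors. In role (2) retrieval only narrows the search, and a later geometric step can still recover an accurate pose as long as some retrieved images overlap with the query.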
