Robust Scene Coordinate Regression via Geometrically-Consistent Global Descriptors
Son Tung Nguyen, Alejandro Fontan, Michael Milford, Tobias Fischer

TL;DR
This paper introduces a novel global descriptor learning method that combines geometric and visual cues to improve robustness and accuracy in visual localization, especially in noisy or ambiguous environments.
Contribution
It proposes an aggregator module that learns geometrically consistent global descriptors without manual labels, enhancing localization performance across diverse environments.
Findings
Significant localization improvements on challenging benchmarks
Robustness to noisy geometric constraints and ambiguous scenes
Maintains computational efficiency in large-scale environments
Abstract
Recent learning-based visual localization methods use global descriptors to disambiguate visually similar places, but existing approaches often derive these descriptors from geometric cues alone (e.g., covisibility graphs), limiting their discriminative power and reducing robustness in the presence of noisy geometric constraints. We propose an aggregator module that learns global descriptors consistent with both geometrical structure and visual similarity, ensuring that images are close in descriptor space only when they are visually similar and spatially connected. This corrects erroneous associations caused by unreliable overlap scores. Using a batch-mining strategy based solely on the overlap scores and a modified contrastive loss, our method trains without manual place labels and generalizes across diverse environments. Experiments on challenging benchmarks show substantial…
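To make the training signal described in the abstract more concrete, the following is a minimal sketch of a generic pairwise contrastive loss in which pair labels come only from overlap scores, with no manual place labels. It is not the paper's modified loss or its batch-mining strategy, whose details are not given here. The function name `contrastive_loss_from_overlap`, the threshold `pos_thresh`, and the `margin` value are all illustrative assumptions.

```python
import numpy as np

def contrastive_loss_from_overlap(desc, overlap, pos_thresh=0.2, margin=1.0):
    """Generic contrastive loss over all image pairs in a batch.

    desc:    (N, D) array of global descriptors (hypothetical aggregator output).
    overlap: (N, N) symmetric array of pairwise overlap scores in [0, 1].
    Pairs with overlap >= pos_thresh are treated as positives, the rest as
    negatives. Both the threshold and the margin are assumed values.
    """
    # L2-normalise so distances are comparable across descriptors.
    desc = desc / np.linalg.norm(desc, axis=1, keepdims=True)

    # Pairwise Euclidean distances via broadcasting.
    diff = desc[:, None, :] - desc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Use each unordered pair once (upper triangle, diagonal excluded).
    iu = np.triu_indices(len(desc), k=1)
    d = dist[iu]
    is_pos = overlap[iu] >= pos_thresh

    pos_term = d[is_pos] ** 2                             # pull positives together
    neg_term = np.maximum(0.0, margin - d[~is_pos]) ** 2  # push negatives past the margin
    return (pos_term.sum() + neg_term.sum()) / len(d)

# Toy batch: images 0 and 1 share a descriptor, image 2 is orthogonal to both.
desc = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
overlap_ok = np.array([[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]])
overlap_noisy = np.array([[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [0.9, 0.0, 1.0]])
loss_ok = contrastive_loss_from_overlap(desc, overlap_ok)
loss_noisy = contrastive_loss_from_overlap(desc, overlap_noisy)
```

In this toy batch the loss is zero when the overlap scores agree with the descriptors and positive when they conflict, which illustrates why unreliable overlap scores on their own can pull visually dissimilar images together.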
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
