Robust Scene Coordinate Regression via Geometrically-Consistent Global Descriptors

Son Tung Nguyen; Alejandro Fontan; Michael Milford; Tobias Fischer

arXiv:2512.17226 · cs.CV · January 9, 2026

TL;DR

This paper introduces a novel global descriptor learning method that combines geometric and visual cues to improve robustness and accuracy in visual localization, especially in noisy or ambiguous environments.

Contribution

It proposes an aggregator module that learns geometrically consistent global descriptors without manual labels, enhancing localization performance across diverse environments.

Findings

01 Significant localization improvements on challenging benchmarks
02 Robustness to noisy geometric constraints and ambiguous scenes
03 Maintains computational efficiency in large-scale environments

Abstract

Recent learning-based visual localization methods use global descriptors to disambiguate visually similar places, but existing approaches often derive these descriptors from geometric cues alone (e.g., covisibility graphs), limiting their discriminative power and reducing robustness in the presence of noisy geometric constraints. We propose an aggregator module that learns global descriptors consistent with both geometrical structure and visual similarity, ensuring that images are close in descriptor space only when they are visually similar and spatially connected. This corrects erroneous associations caused by unreliable overlap scores. Using a batch-mining strategy based solely on the overlap scores and a modified contrastive loss, our method trains without manual place labels and generalizes across diverse environments. Experiments on challenging benchmarks show substantial…
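
To make the training signal described in the abstract concrete, the following is a minimal sketch of overlap-based batch mining followed by a standard pairwise contrastive loss. It is not the authors' implementation: the paper's aggregator module and its specific modification to the loss are not detailed in this summary, so the function names (`mine_pairs`, `contrastive_loss`), the thresholds (`pos_thresh`, `neg_thresh`), and the `margin` value are all hypothetical choices made for illustration.

```python
import numpy as np

def mine_pairs(overlap, pos_thresh=0.5, neg_thresh=0.1):
    """Label image pairs in a batch from overlap scores alone (no place labels).

    overlap: (N, N) symmetric matrix of pairwise overlap scores in [0, 1].
    Thresholds are illustrative assumptions, not values from the paper.
    """
    n = overlap.shape[0]
    off_diag = ~np.eye(n, dtype=bool)          # ignore self-pairs
    pos = (overlap >= pos_thresh) & off_diag   # spatially connected pairs
    neg = (overlap <= neg_thresh) & off_diag   # pairs with little or no overlap
    return pos, neg

def contrastive_loss(desc, pos, neg, margin=0.5):
    """Standard pairwise contrastive loss on L2-normalised global descriptors.

    Pulls positive pairs together and pushes negative pairs at least
    `margin` apart. This is the textbook form, not the paper's modified loss.
    """
    desc = desc / np.linalg.norm(desc, axis=1, keepdims=True)
    diff = desc[:, None, :] - desc[None, :, :]        # (N, N, D) pairwise differences
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)       # (N, N) Euclidean distances
    pos_term = (dist ** 2)[pos]
    neg_term = (np.maximum(0.0, margin - dist) ** 2)[neg]
    n_pairs = max(pos_term.size + neg_term.size, 1)   # avoid division by zero
    return (pos_term.sum() + neg_term.sum()) / n_pairs

# Toy batch: images 0 and 1 overlap strongly, image 2 overlaps with neither.
overlap = np.array([[1.0, 0.8, 0.0],
                    [0.8, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
pos, neg = mine_pairs(overlap)

good_desc = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # agrees with overlap
bad_desc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # contradicts overlap
loss_good = contrastive_loss(good_desc, pos, neg)
loss_bad = contrastive_loss(bad_desc, pos, neg)
print(loss_good, loss_bad)
```

In this toy batch, descriptors that agree with the overlap structure give a loss near zero, while descriptors that contradict it are penalised. A loss driven by overlap alone inherits any errors in the overlap scores, which is the weakness the abstract says the proposed aggregator addresses by also requiring visual similarity.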

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications