TL;DR
GeoBridge is a multi-view, multi-modal foundation model that enhances geo-localization by bridging images and text through a semantic-anchor mechanism, supported by a large-scale, cross-modal dataset.
Contribution
The paper introduces a semantic-anchor mechanism for robust, flexible localization across views and modalities, and provides GeoLoc, the first large-scale, cross-modal, multi-view geo-localization dataset.
Findings
GeoBridge improves geo-localization accuracy over traditional satellite-centric methods.
Pre-training on GeoLoc improves cross-domain generalization.
The model supports bidirectional view matching and language-to-image retrieval.
Abstract
Cross-view geo-localization infers a location by retrieving geo-tagged reference images that visually correspond to a query image. However, the traditional satellite-centric paradigm limits robustness when high-resolution or up-to-date satellite imagery is unavailable. It further underexploits complementary cues across views (e.g., drone, satellite, and street) and modalities (e.g., language and image). To address these challenges, we propose GeoBridge, a novel model that performs bidirectional matching across views and supports language-to-image retrieval. Going beyond traditional satellite-centric formulations, GeoBridge builds on a novel semantic-anchor mechanism that bridges multi-view features through textual descriptions for robust, flexible localization. In support of this task, we construct GeoLoc, the first large-scale, cross-modal, and multi-view aligned dataset comprising over…
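The retrieval formulation in the abstract can be illustrated with a minimal sketch. It assumes that queries (an image from any view, or a text description) and geo-tagged references have already been encoded into one shared embedding space, and it ranks references by cosine similarity. This is a generic retrieval sketch, not GeoBridge's semantic-anchor mechanism; the function names, the toy three-dimensional embeddings, and the coordinates are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def retrieve(query_emb, gallery, top_k=1):
    """Rank geo-tagged gallery entries by similarity to the query embedding.

    gallery is a list of (embedding, (lat, lon)) pairs. Returns the top_k
    (score, (lat, lon)) pairs, best match first.
    """
    scored = [(cosine(query_emb, emb), loc) for emb, loc in gallery]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hand-made toy embeddings (assumption: a real system would use learned encoders).
gallery = [
    ([0.9, 0.1, 0.0], (48.8584, 2.2945)),
    ([0.0, 1.0, 0.1], (40.6892, -74.0445)),
    ([0.1, 0.0, 1.0], (35.6586, 139.7454)),
]
text_query = [0.05, 0.95, 0.2]  # stands in for an encoded text description
drone_query = [0.8, 0.2, 0.1]   # stands in for an encoded drone image

best_text = retrieve(text_query, gallery)[0]
best_drone = retrieve(drone_query, gallery)[0]
```

Because both query types live in the same space, the same `retrieve` call serves language-to-image and view-to-view matching; swapping the roles of query and gallery gives the reverse direction.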
