GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization

Zixuan Song; Jing Zhang; Di Wang; Zidie Zhou; Wenbin Liu; Haonan Guo; En Wang; Bo Du

arXiv:2512.02697 · cs.CV · April 16, 2026

TL;DR

GeoBridge is a multi-view, multi-modal foundation model that enhances geo-localization by bridging images and text through a semantic-anchor mechanism, supported by a large-scale, cross-modal dataset.

Contribution

It introduces a novel semantic-anchor mechanism for robust, flexible localization across views and modalities, and provides the first large-scale cross-modal, multi-view geo-localization dataset.

Findings

1. GeoBridge improves geo-localization accuracy over traditional methods.
2. Pre-training on GeoLoc enhances cross-domain generalization.
3. The model supports bidirectional view matching and language-to-image retrieval.

Abstract

Cross-view geo-localization infers a location by retrieving geo-tagged reference images that visually correspond to a query image. However, the traditional satellite-centric paradigm limits robustness when high-resolution or up-to-date satellite imagery is unavailable. It further underexploits complementary cues across views (e.g., drone, satellite, and street) and modalities (e.g., language and image). To address these challenges, we propose GeoBridge, a novel model that performs bidirectional matching across views and supports language-to-image retrieval. Going beyond traditional satellite-centric formulations, GeoBridge builds on a novel semantic-anchor mechanism that bridges multi-view features through textual descriptions for robust, flexible localization. In support of this task, we construct GeoLoc, the first large-scale, cross-modal, and multi-view aligned dataset comprising over…
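To make the retrieval setting in the abstract concrete, here is a minimal sketch of embedding-based geo-localization: a query (from any view, or from text) is embedded into a shared space and matched against geo-tagged reference embeddings by cosine similarity. This is not GeoBridge's implementation. The function names, the toy 3-dimensional vectors, and the location IDs are all hypothetical, and the encoders that would produce the embeddings are assumed rather than shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve(query_embedding, gallery, top_k=1):
    """Rank geo-tagged gallery entries by similarity to the query.

    gallery is a list of (location_id, embedding) pairs. The query may come
    from any view or from text, as long as it lives in the same space.
    """
    scored = [
        (loc_id, cosine_similarity(query_embedding, emb))
        for loc_id, emb in gallery
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for encoder outputs.
satellite_gallery = [
    ("loc_a", [0.9, 0.1, 0.0]),
    ("loc_b", [0.0, 1.0, 0.2]),
    ("loc_c", [0.1, 0.0, 1.0]),
]
drone_query = [0.8, 0.2, 0.1]  # assumed output of a drone-view image encoder
text_query = [0.0, 0.1, 0.9]   # assumed output of a text encoder

best_for_drone = retrieve(drone_query, satellite_gallery)[0][0]
best_for_text = retrieve(text_query, satellite_gallery)[0][0]
```

Because the gallery is only a list of embeddings, the same `retrieve` call works in either direction (for example, a satellite query against a drone gallery), which is one simple way to read "bidirectional matching" in a shared embedding space.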

Code & Models

Repositories

MiliLab/GeoBridge (GitHub)
