TL;DR
DualGeo is a two-stage worldwide image geo-localization framework that combines feature fusion, contrastive learning, and large multimodal models to improve localization accuracy across various scales.
Contribution
DualGeo introduces a dual-view framework that fuses image and semantic segmentation features, then applies geo-cognitive refinement with large multimodal models to improve localization accuracy.
Findings
Outperforms state-of-the-art methods on the IM2GPS, IM2GPS3k, and YFCC4k datasets.
Improves street-level localization accuracy by up to 16.58%.
Enhances city-level localization accuracy by up to 8.77%.
Abstract
Worldwide image geo-localization aims to infer the geographic location of an image captured anywhere on Earth, spanning street, city, regional, national, and continental scales. Existing methods rely on visual features that are sensitive to environmental variations (e.g., lighting, season, and weather) and lack effective post-processing to filter outlier candidates, limiting localization accuracy. To address these limitations, we propose DualGeo, a two-stage framework for worldwide image geo-localization. First, it establishes a geo-representational foundation by fusing image and semantic segmentation features via bidirectional cross-attention. The fused features are then aligned with GPS coordinates through dual-view contrastive learning to build a global retrieval database. Second, it performs geo-cognitive refinement by re-ranking retrieved candidates using geographic clustering. It…
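The abstract names geographic clustering as the re-ranking step but does not give its details here. As a rough illustration only, the sketch below shows one way such a step could filter outlier candidates: group retrieved GPS candidates by great-circle distance, keep the cluster with the highest total similarity, and rank its members first. The greedy clustering rule, the 200 km radius, the similarity-sum score, and all names (`haversine_km`, `cluster_candidates`, `rerank`, `EXAMPLE_CANDIDATES`) are assumptions made for this example, not the authors' method.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_candidates(coords, radius_km):
    """Greedy single-pass clustering (an assumed rule, chosen for brevity).

    Each point joins the first cluster whose seed lies within radius_km;
    otherwise it becomes the seed of a new cluster. Returns one label per point.
    """
    seeds = []
    labels = []
    for point in coords:
        for idx, seed in enumerate(seeds):
            if haversine_km(point, seed) <= radius_km:
                labels.append(idx)
                break
        else:
            seeds.append(point)
            labels.append(len(seeds) - 1)
    return labels


def rerank(candidates, radius_km=200.0):
    """Re-rank (lat, lon, similarity) candidates by geographic consensus.

    Members of the cluster with the largest summed similarity come first,
    sorted by similarity; all other candidates follow, also sorted.
    """
    coords = [(lat, lon) for lat, lon, _ in candidates]
    labels = cluster_candidates(coords, radius_km)

    mass = defaultdict(float)
    for label, (_, _, sim) in zip(labels, candidates):
        mass[label] += sim
    best = max(mass, key=mass.get)

    inliers = [c for c, lab in zip(candidates, labels) if lab == best]
    outliers = [c for c, lab in zip(candidates, labels) if lab != best]
    by_similarity = lambda c: -c[2]
    return sorted(inliers, key=by_similarity) + sorted(outliers, key=by_similarity)


# Hypothetical retrieval output: three candidates near Paris and one outlier
# in Sydney that happens to have the highest raw similarity.
EXAMPLE_CANDIDATES = [
    (48.8566, 2.3522, 0.80),     # Paris
    (-33.8688, 151.2093, 0.85),  # Sydney (visually similar outlier)
    (48.8049, 2.1204, 0.78),     # Versailles
    (48.8606, 2.3376, 0.75),     # Paris, near the Louvre
]
```

Calling `rerank(EXAMPLE_CANDIDATES)` moves the three Paris-area candidates ahead of the Sydney one, even though Sydney has the highest individual similarity. Summing similarity per cluster is one simple consensus score; a density-based method such as DBSCAN would be a reasonable alternative under the same idea.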
