TL;DR
GeoSearch enhances worldwide image geolocalization by integrating web-scale reverse image search with retrieval-augmented generation, improving accuracy on standard benchmarks.
Contribution
The paper introduces an open-world geolocation framework that combines web-scale reverse image search with RAG, and adds a two-layer filtering mechanism to reduce noise from irrelevant web content.
Findings
GeoSearch outperforms existing methods on the Im2GPS3k and YFCC4k benchmarks.
The two-layer filter (image matching followed by confidence-based gating) mitigates noise from irrelevant web content.
Code and data are publicly available for reproducibility.
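For context on how results on benchmarks such as Im2GPS3k and YFCC4k are typically scored: a common convention in the geolocalization literature is to report the fraction of predictions that fall within fixed great-circle distance thresholds of the ground truth. The sketch below illustrates that convention only. The threshold values, the function names, and the Earth radius constant are assumptions of this example and are not taken from the GeoSearch paper or its code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_thresholds(preds, truths, thresholds_km=(1, 25, 200, 750, 2500)):
    """Fraction of predictions within each distance threshold (hypothetical helper).

    preds and truths are equal-length sequences of (lat, lon) pairs in degrees.
    """
    if not preds or len(preds) != len(truths):
        raise ValueError("preds and truths must be non-empty and the same length")
    dists = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(preds, truths)]
    return {thr: sum(d <= thr for d in dists) / len(dists) for thr in thresholds_km}


if __name__ == "__main__":
    preds = [(48.8584, 2.2945), (0.0, 0.0)]
    truths = [(48.8584, 2.2945), (0.0, 10.0)]
    print(accuracy_at_thresholds(preds, truths))
```

Reporting accuracy at several radii rather than a single mean error keeps a few very distant misses from dominating the score.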
Abstract
Worldwide image geolocalization, which aims to predict the GPS coordinates of any image on Earth, remains challenging due to global visual diversity. Recent generative approaches based on Retrieval-Augmented Generation (RAG) and Large Multimodal Models (LMMs) leverage candidates retrieved from fixed databases for reasoning, but often struggle with scenes that are absent from the reference set. In this work, we propose GeoSearch, an open-world geolocation framework that integrates web-scale reverse image search into the RAG pipeline. GeoSearch augments LMM prompts with database-retrieved coordinates and textual evidence extracted from web pages. To mitigate noise from irrelevant content, we introduce a two-layer filtering mechanism consisting of image matching, followed by confidence-based gating. Experiments on standard benchmarks Im2GPS3k and YFCC4k demonstrate the superiority of…
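The pipeline described in the abstract (filter web evidence in two stages, then add the surviving text and the database-retrieved coordinates to the LMM prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the image-matching score or the confidence score is computed, what thresholds are used, or how the prompt is worded. The `WebEvidence` class, both functions, the score ranges, and the threshold defaults are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WebEvidence:
    """One web page returned by reverse image search (hypothetical structure)."""
    url: str
    text: str
    match_score: float  # assumed image-matching score in [0, 1]
    confidence: float   # assumed confidence score in [0, 1]


def two_layer_filter(evidence, match_threshold=0.5, confidence_threshold=0.7):
    """Layer 1 keeps pages whose image matches the query; layer 2 gates on confidence.

    Both thresholds are placeholder values chosen for this sketch.
    """
    matched = [e for e in evidence if e.match_score >= match_threshold]
    return [e for e in matched if e.confidence >= confidence_threshold]


def build_prompt(candidate_coords, evidence):
    """Assemble an augmented text prompt from retrieved coordinates and filtered evidence."""
    lines = ["Estimate the GPS coordinates (latitude, longitude) of the query image."]
    if candidate_coords:
        lines.append("Candidate coordinates from the reference database:")
        lines += [f"- ({lat:.4f}, {lon:.4f})" for lat, lon in candidate_coords]
    if evidence:
        lines.append("Textual evidence from web pages:")
        lines += [f"- {e.text}" for e in evidence]
    return "\n".join(lines)


if __name__ == "__main__":
    pages = [
        WebEvidence("https://example.com/a", "Photo taken near the Eiffel Tower, Paris.", 0.9, 0.8),
        WebEvidence("https://example.com/b", "Stock photo collection.", 0.2, 0.9),
        WebEvidence("https://example.com/c", "Possibly somewhere in Europe.", 0.8, 0.3),
    ]
    kept = two_layer_filter(pages)
    print(build_prompt([(48.8584, 2.2945)], kept))
```

Running the filter before prompt construction means that when no web evidence passes both layers, the prompt falls back to the database-retrieved candidates alone.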
