GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
Yikun Wang, Zuyan Liu, Ziyi Wang, Han Hu, Pengfei Liu, Yongming Rao

TL;DR
GeoVista is a novel agentic model that combines visual reasoning and web search to improve geolocalization accuracy, supported by a new benchmark GeoBench with diverse high-resolution imagery.
Contribution
The paper introduces GeoVista, a new agentic model with integrated tool use for geolocalization, and curates GeoBench, a comprehensive benchmark for evaluating such models.
Findings
GeoVista outperforms other open-source models on GeoBench.
GeoVista achieves results comparable to those of closed-source models such as Gemini-2.5-flash and GPT-5.
The hierarchical reward improves geolocalization performance.
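This summary does not spell out how the hierarchical reward is computed. As a minimal sketch of the general idea only, the Python below assumes the reward grants partial credit at successively finer administrative levels (country, then region, then city) and stops at the first mismatch. The `Location` type, the level names, and the weights are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical three-level location label (not the paper's schema)."""
    country: str
    region: str
    city: str


# Assumed weights, coarse to fine. The values are illustrative only.
LEVEL_REWARDS = (("country", 0.2), ("region", 0.3), ("city", 0.5))


def _norm(name: str) -> str:
    # Compare names case-insensitively and ignore surrounding whitespace.
    return name.strip().casefold()


def hierarchical_reward(pred: Location, truth: Location) -> float:
    """Sum the weights of matching levels, stopping at the first mismatch.

    A correct city therefore earns nothing unless the country and region
    above it are also correct.
    """
    total = 0.0
    for level, weight in LEVEL_REWARDS:
        if _norm(getattr(pred, level)) != _norm(getattr(truth, level)):
            break
        total += weight
    return total


if __name__ == "__main__":
    truth = Location("France", "Ile-de-France", "Paris")
    guess = Location("France", "Occitanie", "Toulouse")
    print(hierarchical_reward(guess, truth))
```

Under these assumptions, a prediction that gets only the country right still receives a small reward, which gives a learner a gradient toward finer-grained answers instead of an all-or-nothing signal.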
Abstract
Current research on agentic visual reasoning enables deep multimodal understanding but primarily focuses on image manipulation tools, leaving a gap toward more general-purpose agentic models. In this work, we revisit the geolocalization task, which requires not only nuanced visual grounding but also web search to confirm or refine hypotheses during reasoning. Since existing geolocalization benchmarks fail to meet the need for high-resolution imagery and the localization challenge for deep agentic reasoning, we curate GeoBench, a benchmark that includes photos and panoramas from around the world, along with a subset of satellite images of different cities to rigorously evaluate the geolocalization ability of agentic models. We also propose GeoVista, an agentic model that seamlessly integrates tool invocation within the reasoning loop, including an image-zoom-in tool to magnify regions of…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
