Learning to Wander: Improving the Global Image Geolocation Ability of LMMs via Actionable Reasoning
Yushuo Zheng, Huiyu Duan, Zicheng Zhang, Xiaohong Liu, and Xiongkuo Min

TL;DR
This paper introduces WanderBench, a new geolocation benchmark with interactive panoramas, and GeoAoT, a reasoning framework that actively explores environments to improve image geolocation accuracy.
Contribution
It presents WanderBench for embodied geolocation reasoning and GeoAoT, a novel framework coupling reasoning with physical actions for improved localization.
Findings
GeoAoT outperforms existing models in fine-grained localization.
WanderBench enables evaluation of geolocation in interactive, embodied scenarios.
Models show better generalization in dynamic environments.
Abstract
Geolocation, the task of identifying the geographic location of an image, requires abundant world knowledge and complex reasoning abilities. Although advanced large multimodal models (LMMs) have demonstrated these capabilities, their performance on the geolocation task remains unexplored. To this end, we introduce WanderBench, the first open-access global geolocation benchmark designed for actionable geolocation reasoning in embodied scenarios. WanderBench contains over 32K panoramas across six continents, organized as navigable graphs that enable physical actions such as rotation and movement, transforming geolocation from static recognition into interactive exploration. Building on this foundation, we propose GeoAoT, a Geolocation framework with Action of Thought, which couples reasoning with embodied…
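The abstract's idea of panoramas organized as a navigable graph with rotation and movement actions can be illustrated with a minimal Python sketch. This is not WanderBench's data format or API: the `Panorama` and `Agent` classes, their field names, and the 45-degree movement tolerance are all assumptions made here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Panorama:
    """One node of the graph (hypothetical schema, not the benchmark's)."""
    pano_id: str
    lat: float
    lon: float
    # Maps the compass heading of each link (degrees) to a neighbouring pano id.
    neighbors: dict[int, str] = field(default_factory=dict)


def angular_diff(a: int, b: int) -> int:
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


@dataclass
class Agent:
    """Holds the current node and view heading; exposes the two actions."""
    graph: dict[str, Panorama]
    pano_id: str
    heading: int = 0

    def rotate(self, degrees: int) -> None:
        # Turn in place; headings wrap around at 360.
        self.heading = (self.heading + degrees) % 360

    def move(self, tolerance: int = 45) -> bool:
        # Step along the link best aligned with the current heading,
        # if one lies within the (assumed) tolerance. Returns success.
        links = self.graph[self.pano_id].neighbors
        if not links:
            return False
        best = min(links, key=lambda h: angular_diff(h, self.heading))
        if angular_diff(best, self.heading) > tolerance:
            return False
        self.pano_id = links[best]
        return True


# Two panoramas linked east-west (coordinates are made up).
graph = {
    "a": Panorama("a", 48.8580, 2.2940, {90: "b"}),
    "b": Panorama("b", 48.8580, 2.2950, {270: "a"}),
}
agent = Agent(graph, "a")
moved_before_turn = agent.move()   # facing north, no link in that direction
agent.rotate(90)                   # face east
moved_after_turn = agent.move()    # follows the link to "b"
```

In a setup like this, a model would alternate between reasoning over the current view and issuing `rotate` or `move` actions to gather more evidence before committing to a location estimate.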
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
