GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction
Narges Ghasemi, Amir Ziashahabi, Salman Avestimehr, Cyrus Shahabi

TL;DR
GeoToken introduces a hierarchical, autoregressive approach for image geolocalization that predicts geographic regions in a sequence, leveraging techniques from language modeling to improve accuracy and uncertainty management.
Contribution
The paper presents a novel hierarchical sequence prediction model for geolocalization, inspired by language models, and explores inference strategies like beam search for improved performance.
Findings
Outperforms baselines without MLLMs on multiple metrics.
Achieves state-of-the-art accuracy, with gains of up to 13.9% over those baselines.
Further improves results when combined with Multimodal Large Language Models.
Abstract
Image geolocalization, the task of determining an image's geographic origin, poses significant challenges, largely due to visual similarities across disparate locations and the large search space. To address these issues, we propose a hierarchical sequence prediction approach inspired by how humans narrow down locations from broad regions to specific addresses. Analogously, our model predicts geographic tokens hierarchically, first identifying a general region and then sequentially refining predictions to increasingly precise locations. Rather than relying on explicit semantic partitions, our method uses S2 cells, a nested, multiresolution global grid, and sequentially predicts finer-level cells conditioned on visual inputs and previous predictions. This procedure mirrors autoregressive text generation in large language models. Much like in language modeling, final performance depends…
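The coarse-to-fine decoding described above can be illustrated with a minimal sketch of beam search over a cell hierarchy. It is not the authors' implementation. It assumes one token per hierarchy level, with 6 choices at the top (the S2 face cells) and 4 children per cell below that, which may differ from the paper's actual tokenization. The names `beam_search` and `toy_log_probs` are hypothetical, and `toy_log_probs` is a hand-written stand-in for a model that would condition on the image and on previously predicted tokens.

```python
import math
from typing import Callable, List, Sequence, Tuple

Prefix = Tuple[int, ...]  # sequence of cell tokens, coarse to fine


def beam_search(
    log_probs: Callable[[Prefix], Sequence[float]],
    depth: int,
    beam_width: int,
) -> List[Tuple[Prefix, float]]:
    """Return up to `beam_width` (prefix, log-probability) pairs, best first."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(depth):
        candidates: List[Tuple[Prefix, float]] = []
        for prefix, score in beams:
            # Extend every surviving prefix by each possible child token.
            for token, lp in enumerate(log_probs(prefix)):
                candidates.append((prefix + (token,), score + lp))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_log_probs(prefix: Prefix) -> List[float]:
    """Hypothetical stand-in for a trained model (assumption, not from the paper)."""
    if not prefix:
        probs = [0.4, 0.35, 0.1, 0.05, 0.05, 0.05]  # 6 top-level face cells
    elif prefix == (0,):
        probs = [0.4, 0.3, 0.2, 0.1]  # face 0: mass spread over children
    elif prefix == (1,):
        probs = [0.9, 0.05, 0.03, 0.02]  # face 1: one dominant child
    else:
        probs = [0.25, 0.25, 0.25, 0.25]
    return [math.log(p) for p in probs]


greedy = beam_search(toy_log_probs, depth=2, beam_width=1)
wider = beam_search(toy_log_probs, depth=2, beam_width=2)
print(greedy[0][0], round(math.exp(greedy[0][1]), 3))
print(wider[0][0], round(math.exp(wider[0][1]), 3))
```

In this toy distribution, greedy decoding (beam width 1) commits to face 0 and ends at `(0, 0)` with probability 0.16, while a beam of width 2 keeps face 1 alive and finds `(1, 0)` with probability 0.315. This is the kind of case where the choice of inference strategy changes the final prediction.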
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies
