GaGA: Towards Interactive Global Geolocation Assistant
Zhiyang Dou, Zipeng Wang, Xumeng Han, Guorong Li, Zhipei Huang, Zhenjun Han

TL;DR
GaGA introduces an interactive, large vision-language model-based system for global image geolocation, leveraging user interaction and a new dataset to achieve state-of-the-art accuracy and explainability.
Contribution
The paper presents GaGA, an interactive geolocation assistant built on large vision-language models (LVLMs) and a newly proposed dataset. It reports higher accuracy than static inference approaches and lets users intervene in, correct, or add clues to its predictions.
Findings
Achieves 4.57% higher accuracy at the country level
Improves accuracy by 2.92% at the city level
Sets a new benchmark on the GWS15k dataset
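As a rough illustration of how level-wise geolocation accuracy like the figures above is often computed, the sketch below scores predictions by great-circle distance to the ground truth and counts the fraction that fall within a per-level radius. The 25 km (city) and 750 km (country) radii are an assumption taken from a common convention in the image geolocation literature; this summary does not state which thresholds the paper uses, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Assumed radii (a common convention, not confirmed by this summary).
THRESHOLDS_KM = {"city": 25.0, "country": 750.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_at_thresholds(preds, truths, thresholds=THRESHOLDS_KM):
    """Fraction of predictions within each radius of the corresponding ground truth."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("preds and truths must be non-empty and the same length")
    dists = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(preds, truths)]
    return {
        name: sum(d <= km for d in dists) / len(dists)
        for name, km in thresholds.items()
    }


# Toy (lat, lon) pairs for illustration only; not data from the paper.
example_preds = [(48.8566, 2.3522), (48.8566, 2.3522), (40.7128, -74.0060)]
example_truths = [(48.8600, 2.3400), (45.7640, 4.8357), (34.0522, -118.2437)]
example_scores = accuracy_at_thresholds(example_preds, example_truths)
```

Under this metric, a reported gain at a given level means more test images were placed within that level's radius of their true location.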
Abstract
Global geolocation, which seeks to predict the geographical location of images captured anywhere in the world, is one of the most challenging tasks in the field of computer vision. In this paper, we introduce an innovative interactive global geolocation assistant named GaGA, built upon the flourishing large vision-language models (LVLMs). GaGA uncovers geographical clues within images and combines them with the extensive world knowledge embedded in LVLMs to determine the geolocations while also providing justifications and explanations for the prediction results. We further designed a novel interactive geolocation method that surpasses traditional static inference approaches. It allows users to intervene, correct, or provide clues for the predictions, making the model more flexible and practical. The development of GaGA relies on the newly proposed Multi-modal Global Geolocation…
Taxonomy
Topics: Mobile Agent-Based Network Management · Mobile and Web Applications · Robotics and Automated Systems
