LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild
Zhiqiang Wang, Dejia Xu, Rana Muhammad Shahroz Khan, Yanbin Lin, Zhiwen Fan, Xingquan Zhu

TL;DR
This paper evaluates the geolocation capabilities of large multimodal language models on a new in-the-wild image dataset, revealing that closed-source models excel while open-source models can perform well after fine-tuning.
Contribution
It introduces a novel dataset and comprehensive evaluation framework for assessing large language models' geolocation abilities on real-world images.
Findings
Closed-source models outperform open-source models in zero-shot geolocation.
Open-source models can reach comparable performance after fine-tuning.
The study provides insights into the strengths and limitations of multimodal models for geolocation.
Abstract
Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images from various countries via Google Street View. Then, we conduct both training-free and training-based evaluations on closed-source and open-source multimodal language models. Our findings indicate that closed-source models demonstrate superior geolocation abilities, while open-source models can achieve comparable performance through fine-tuning.
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
