LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild

Zhiqiang Wang; Dejia Xu; Rana Muhammad Shahroz Khan; Yanbin Lin; Zhiwen Fan; Xingquan Zhu

arXiv:2405.20363·cs.CV·June 3, 2024

TL;DR

This paper evaluates the geolocation capabilities of large multimodal language models on a new in-the-wild image dataset, revealing that closed-source models excel while open-source models can perform well after fine-tuning.

Contribution

It introduces a novel dataset and comprehensive evaluation framework for assessing large language models' geolocation abilities on real-world images.

Findings

01

Closed-source models outperform open-source models in zero-shot geolocation.

02

Open-source models can reach performance comparable to closed-source models after fine-tuning.

03

The study provides insights into the strengths and limitations of multimodal models for geolocation.

Abstract

Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images from various countries via Google Street View. Then, we conduct training-free and training-based evaluations on closed-source and open-source multimodal language models. Our findings indicate that closed-source models demonstrate superior geolocation abilities, while open-source models can achieve comparable performance through fine-tuning.
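The abstract does not state how predictions are scored. As an illustration only, one simple way to score a model that names a country for each image is country-level accuracy. The sketch below assumes that setup; the function names `normalize` and `country_accuracy` and the sample values are hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences do not count as errors."""
    return " ".join(name.lower().split())


def country_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Return the fraction of images whose predicted country matches the label.

    Assumption: each prediction is a single country name already extracted
    from the model's free-text answer. This is an illustrative metric, not
    necessarily the one used in the paper.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(t) for p, t in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Made-up example values, for demonstration only.
    preds = ["France", "japan ", "Brazil"]
    truth = ["France", "Japan", "Peru"]
    print(f"accuracy = {country_accuracy(preds, truth):.3f}")
```

The same function could score both the training-free and the fine-tuned setting, so the two can be compared on identical images.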

Code & Models

Repositories

yeyimilk/llmgeo
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques