Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

Jonathan Roberts; Timo Lüddecke; Rehan Sheikh; Kai Han; Samuel Albanie

arXiv:2311.14656·cs.CV·January 17, 2024·1 cites

TL;DR

This paper evaluates the geographic and geospatial capabilities of multimodal large language models, especially GPT-4V, through a new benchmark of visual tasks, revealing their strengths and weaknesses in these domains.

Contribution

It introduces a geographic benchmark for MLLMs, assesses GPT-4V and open-source models, and provides insights into their performance in geographic and geospatial tasks.

Findings

01 GPT-4V outperforms some open-source models in certain tasks.

02 Models excel in some visual tasks but struggle with complex geographic reasoning.

03 Benchmark will be publicly released for future evaluations.

Abstract

Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response. We conduct a series of experiments exploring various vision capabilities of MLLMs within these domains, particularly focusing on the frontier model GPT-4V, and benchmark its performance against open-source counterparts. Our methodology involves challenging these models with a small-scale geographic benchmark consisting of a suite of visual tasks, testing their abilities across a spectrum of complexity. The analysis uncovers not only where such models excel, including instances where they outperform humans, but also where they falter, providing a balanced view of…

Code & Models

Repositories

jonathan-roberts1/charting-new-territories
Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling