Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, Samuel Albanie

TL;DR
This paper evaluates the geographic and geospatial capabilities of multimodal large language models, especially GPT-4V, through a new benchmark of visual tasks, revealing their strengths and weaknesses in these domains.
Contribution
It introduces a geographic benchmark for MLLMs, assesses GPT-4V and open-source models, and provides insights into their performance in geographic and geospatial tasks.
Findings
GPT-4V outperforms some open-source models in certain tasks.
Models excel in some visual tasks but struggle with complex geographic reasoning.
The benchmark will be publicly released for future evaluations.
Abstract
Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response. We conduct a series of experiments exploring various vision capabilities of MLLMs within these domains, particularly focusing on the frontier model GPT-4V, and benchmark its performance against open-source counterparts. Our methodology involves challenging these models with a small-scale geographic benchmark consisting of a suite of visual tasks, testing their abilities across a spectrum of complexity. The analysis uncovers not only where such models excel, including instances where they outperform humans, but also where they falter, providing a balanced view of…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
