Correctness Comparison of ChatGPT-4, Gemini, Claude-3, and Copilot for Spatial Tasks
Hartwig H. Hochmair, Levente Juhasz, Takoda Kemp

TL;DR
This study evaluates and compares the correctness of ChatGPT-4, Gemini, Claude-3, and Copilot on 76 diverse spatial tasks, revealing the strengths, weaknesses, and response consistency of these prominent LLMs in geospatial applications.
Contribution
The study provides a zero-shot evaluation of four major chatbots across 76 spatial tasks in seven categories, highlighting their relative performance and response consistency.
Findings
ChatGPT-4, Gemini, Claude-3, and Copilot perform well on spatial literacy and GIS theory tasks.
Weaknesses are observed in mapping, code writing, and spatial reasoning tasks.
Response consistency exceeds 80% for repeated tasks in most categories.
Abstract
Generative AI, including large language models (LLMs), has recently gained significant interest in the geoscience community through its versatile task-solving capabilities, including programming, arithmetic reasoning, generation of sample data, time-series forecasting, toponym recognition, and image classification. Existing performance assessments of LLMs for spatial tasks have focused primarily on ChatGPT, while other chatbots have received less attention. To narrow this research gap, this study conducts a zero-shot correctness evaluation of four prominent chatbots, i.e., ChatGPT-4, Gemini, Claude-3, and Copilot, on a set of 76 spatial tasks across seven task categories. The chatbots generally performed well on tasks related to spatial literacy, GIS theory, and interpretation of programming code and functions, but revealed weaknesses in mapping, code writing, and spatial…
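The evaluation design described above (graded responses per chatbot, grouped by task category, with some tasks posed repeatedly) can be tallied into correctness and consistency rates. This is a minimal sketch, not the authors' procedure. It assumes each response is graded simply as correct or incorrect, and it assumes "consistency" means that all runs of a repeated task receive the same grade. The record layout, function names, and sample rows are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict

# Hypothetical record layout: (chatbot, category, task_id, run_number, is_correct).
# Binary grading is an assumption made for this sketch.
Grade = tuple[str, str, str, int, bool]


def correctness_by_category(grades: list[Grade]) -> dict[tuple[str, str], float]:
    """Share of correct responses for each (chatbot, category) pair."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    correct: dict[tuple[str, str], int] = defaultdict(int)
    for bot, category, _task, _run, ok in grades:
        totals[(bot, category)] += 1
        correct[(bot, category)] += int(ok)
    return {key: correct[key] / totals[key] for key in totals}


def consistency_rate(grades: list[Grade]) -> float:
    """Share of repeated tasks whose runs all received the same grade (assumed definition)."""
    runs: dict[tuple[str, str, str], list[bool]] = defaultdict(list)
    for bot, category, task, _run, ok in grades:
        runs[(bot, category, task)].append(ok)
    repeated = [oks for oks in runs.values() if len(oks) > 1]
    if not repeated:
        return float("nan")  # no task was asked more than once
    agreeing = sum(1 for oks in repeated if len(set(oks)) == 1)
    return agreeing / len(repeated)


# Illustrative placeholder rows for one hypothetical chatbot.
sample: list[Grade] = [
    ("chatbot_a", "GIS theory", "t1", 1, True),
    ("chatbot_a", "GIS theory", "t1", 2, True),
    ("chatbot_a", "mapping", "t2", 1, False),
    ("chatbot_a", "mapping", "t2", 2, True),
]

print(correctness_by_category(sample))
print(consistency_rate(sample))
```

With these placeholder rows, the GIS theory task is answered correctly in both runs (correctness 1.0) while the mapping task is correct in one of two runs (0.5). Only one of the two repeated tasks keeps the same grade across runs, so the consistency rate is 0.5. Grouping by category first mirrors how the findings are reported per task category.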
Taxonomy
Topics: Geographic Information Systems Studies · Human Mobility and Location-Based Analysis · Topic Modeling
