Correctness Comparison of ChatGPT-4, Gemini, Claude-3, and Copilot for Spatial Tasks

Hartwig H. Hochmair; Levente Juhasz; Takoda Kemp

arXiv:2401.02404 · cs.CY · August 14, 2024 · 1 citation



TL;DR

This study evaluates and compares the correctness of ChatGPT-4, Gemini, Claude-3, and Copilot across 76 diverse spatial tasks, revealing the strengths, weaknesses, and response consistency of these prominent LLMs in geospatial applications.

Contribution

It provides a zero-shot evaluation of four major chatbots on 76 spatial tasks spanning seven categories, comparing their relative correctness and response consistency.

Findings

1. ChatGPT-4, Gemini, Claude-3, and Copilot perform well on spatial literacy and GIS theory tasks.
2. Weaknesses are observed in mapping, code writing, and spatial reasoning tasks.
3. Response consistency exceeds 80% for repeated tasks in most categories.
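The third finding's consistency figure can be made concrete with a small calculation. The sketch below is illustrative only and assumes that each task was answered twice, that each answer was graded correct or incorrect, and that consistency means the share of repeated tasks whose two runs received the same grade; the paper's exact grading scheme and consistency definition may differ. The `GRADES` records and both helper functions are hypothetical, and the values are invented, not taken from the study.

```python
from collections import defaultdict

# Hypothetical graded outcomes: (category, task_id, run1_correct, run2_correct).
# These values are invented for illustration and are NOT the study's data.
GRADES = [
    ("spatial literacy", "t1", True, True),
    ("spatial literacy", "t2", True, True),
    ("mapping", "t3", False, True),
    ("mapping", "t4", False, False),
    ("code writing", "t5", True, False),
]


def correctness_by_category(grades):
    """Share of first-run answers graded correct, per task category."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, _task, first, _second in grades:
        totals[category] += 1
        correct[category] += int(first)
    return {c: correct[c] / totals[c] for c in totals}


def consistency_by_category(grades):
    """Share of repeated tasks whose two runs got the same grade (assumed definition)."""
    totals = defaultdict(int)
    agree = defaultdict(int)
    for category, _task, first, second in grades:
        totals[category] += 1
        agree[category] += int(first == second)
    return {c: agree[c] / totals[c] for c in totals}


if __name__ == "__main__":
    print(correctness_by_category(GRADES))
    print(consistency_by_category(GRADES))
```

Keeping correctness and consistency as separate measures reflects that a chatbot can be consistently wrong: in the toy data, task `t4` counts toward mapping consistency even though both runs fail.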

Abstract

Generative AI, including large language models (LLMs), has recently gained significant interest in the geoscience community through its versatile task-solving capabilities, including programming, arithmetic reasoning, generation of sample data, time-series forecasting, toponym recognition, and image classification. Most existing performance assessments of LLMs for spatial tasks have primarily focused on ChatGPT, whereas other chatbots have received less attention. To narrow this research gap, this study conducts a zero-shot correctness evaluation of four prominent chatbots, namely ChatGPT-4, Gemini, Claude-3, and Copilot, on a set of 76 spatial tasks across seven task categories. The chatbots generally performed well on tasks related to spatial literacy, GIS theory, and interpretation of programming code and functions, but revealed weaknesses in mapping, code writing, and spatial…


Taxonomy

Topics: Geographic Information Systems Studies · Human Mobility and Location-Based Analysis · Topic Modeling