STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis
Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

TL;DR
This paper introduces STBench, a comprehensive benchmark dataset with 13 tasks and over 60,000 QA pairs to evaluate large language models' capabilities in understanding and reasoning about spatio-temporal data.
Contribution
The paper develops a new benchmark dataset, STBench, to systematically assess LLMs' spatio-temporal understanding across four key dimensions, addressing limitations of prior evaluations.
Findings
LLMs perform well in knowledge comprehension and reasoning tasks
Potential for improvement with prompting and fine-tuning techniques
Assessment of 13 different large language models
Abstract
The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address this gap, this paper dissects LLMs' capability of spatio-temporal data into four distinct dimensions: knowledge comprehension, spatio-temporal reasoning, accurate computation, and downstream applications. We curate several natural language question-answer tasks for each category and build the benchmark dataset, namely STBench, containing 13 distinct tasks and over 60,000 QA pairs. Moreover, we have assessed the capabilities of 13 LLMs, such as GPT-4o, Gemma and Mistral. Experimental results reveal…
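To make the benchmark structure concrete, here is a minimal sketch of how answers to QA pairs could be scored separately for each of the four dimensions named in the abstract. The record fields (`dimension`, `answer`, `prediction`), the sample records, and the exact-match scoring rule are assumptions for illustration only; they are not STBench's actual data format or evaluation protocol.

```python
from collections import defaultdict

# Hypothetical QA records. Field names and values are illustrative
# assumptions, not taken from the STBench dataset.
qa_pairs = [
    {"dimension": "knowledge comprehension", "answer": "A", "prediction": "A"},
    {"dimension": "spatio-temporal reasoning", "answer": "B", "prediction": "C"},
    {"dimension": "accurate computation", "answer": "D", "prediction": "D"},
    {"dimension": "downstream applications", "answer": "A", "prediction": "B"},
    {"dimension": "knowledge comprehension", "answer": "C", "prediction": "B"},
]


def accuracy_by_dimension(records):
    """Return exact-match accuracy for each dimension found in records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        dim = record["dimension"]
        total[dim] += 1
        # Compare case-insensitively after trimming whitespace.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[dim] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


scores = accuracy_by_dimension(qa_pairs)
for dim, acc in scores.items():
    print(f"{dim}: {acc:.2f}")
```

Reporting one score per dimension, rather than a single aggregate, keeps a model's strength in one category from masking weakness in another.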
Taxonomy
Topics: Geographic Information Systems Studies · Data Mining Algorithms and Applications · Human Mobility and Location-Based Analysis
