Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges
Pengrui Quan, Brian Wang, Kang Yang, Liying Han, Mani Srivastava

TL;DR
This paper introduces STARK, a benchmark for evaluating the spatiotemporal reasoning capabilities of large language models (LLMs) and large reasoning models (LRMs) across diverse tasks, revealing strengths and limitations in geometric and world-knowledge reasoning.
Contribution
The paper presents a comprehensive hierarchical benchmark, STARK, for systematically assessing and comparing LLMs and LRMs in complex spatiotemporal reasoning tasks.
Findings
LLMs show limited success in geometric reasoning tasks, particularly as task complexity increases
LRMs demonstrate robust performance, often surpassing traditional methods
Performance gap narrows in world-knowledge reasoning, with some LLMs outperforming LRMs
Abstract
Spatiotemporal reasoning plays a key role in Cyber-Physical Systems (CPS). Despite advances in Large Language Models (LLMs) and Large Reasoning Models (LRMs), their capacity to reason about complex spatiotemporal signals remains underexplored. This paper proposes a hierarchical SpatioTemporal reAsoning benchmaRK, STARK, to systematically evaluate LLMs across three levels of reasoning complexity: state estimation (e.g., predicting field variables, localizing and tracking events in space and time), spatiotemporal reasoning over states (e.g., inferring spatial-temporal relationships), and world-knowledge-aware reasoning that integrates contextual and domain knowledge (e.g., intent prediction, landmark-aware navigation). We curate 26 distinct spatiotemporal tasks with diverse sensor modalities, comprising 14,552 challenges in which models answer either directly or via a Python code interpreter.…
Taxonomy
Topics: Constraint Satisfaction and Optimization · Multimodal Machine Learning Applications · Human Mobility and Location-Based Analysis
