REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories
Jacob Thompson, Emiliano Garcia-Lopez, Yonatan Bisk

TL;DR
This paper introduces REM, a benchmark for evaluating the spatial reasoning abilities of multimodal large language models in 3D environments, revealing current limitations and guiding future improvements.
Contribution
The paper presents REM, a novel benchmark for assessing embodied spatial reasoning in MLLMs, with detailed metrics and diagnostics for systematic evaluation.
Findings
Current models perform poorly on complex spatial tasks.
Models show promising but unreliable overall performance.
REM highlights key challenges in developing robust spatial representations.
Abstract
Humans build viewpoint-independent cognitive maps through navigation, enabling intuitive reasoning about object permanence and spatial relations. We argue that multimodal large language models (MLLMs), despite extensive video training, lack this fundamental spatial reasoning capability, a critical limitation for embodied applications. To demonstrate these limitations and drive research, we introduce REM (Reasoning over Embodied Multi-Frame Trajectories), a benchmark using controllable 3D environments for long-horizon embodied spatial reasoning. REM systematically evaluates key aspects like object permanence/distinction, spatial relationships, and numerical tracking across dynamic embodied viewpoints. Our evaluation shows that the best-performing current models exhibit promising overall performance, but become increasingly unreliable at even moderate complexity levels easily handled by…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpatial Cognition and Navigation · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization
