TL;DR
This paper introduces ESI-BENCH, a comprehensive benchmark for embodied spatial intelligence emphasizing active perception and action, revealing that active exploration enhances spatial reasoning in agents.
Contribution
It recasts spatial intelligence as an active perception-action loop, introduces a new benchmark, and analyzes how active exploration and action choices impact spatial reasoning performance.
Findings
Active exploration outperforms passive sensing in spatial tasks.
Emergent spatial strategies arise without explicit instructions.
Poor action choices lead to cascading errors in spatial reasoning.
Abstract
Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen: occluded structure, dynamics, containment, and functionality that cannot be resolved from passive sensing alone. We move beyond prior formulations of spatial intelligence that assume oracle observations by recasting the observer as an actor. We introduce ESI-BENCH, a comprehensive benchmark for embodied spatial intelligence spanning 10 task categories and 29 subcategories built on OmniGibson, grounded in Spelke's core knowledge systems. Agents must decide what abilities to deploy (perception, locomotion, and manipulation) and how to sequence them to actively accumulate task-relevant evidence. We conduct extensive experiments on…
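The perception-action loop described in the abstract can be illustrated with a toy sketch. This is not ESI-BENCH or the OmniGibson API: the `ToyScene` class, the `("open", name)` action, and both answer functions are hypothetical names invented for illustration. The sketch assumes a containment question ("which container holds the target?") in which closed containers hide their contents, so a passive observer cannot answer while an acting agent can.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScene:
    """Hypothetical scene: closed containers, one of which hides the target."""
    contents: dict  # container name -> object inside (or None)
    opened: set = field(default_factory=set)

    def observe(self):
        # Only the contents of already-opened containers are visible.
        return {name: self.contents[name] for name in self.opened}

    def act(self, action):
        # The only action in this toy is ("open", container_name).
        kind, name = action
        if kind == "open" and name in self.contents:
            self.opened.add(name)


def passive_answer(scene, target):
    """Answer from the current observation alone, taking no actions."""
    for name, obj in scene.observe().items():
        if obj == target:
            return name
    return None  # occluded: cannot be resolved without acting


def active_answer(scene, target, max_steps=10):
    """Perception-action loop: observe, stop if evidence suffices, else act."""
    for _ in range(max_steps):
        for name, obj in scene.observe().items():
            if obj == target:
                return name
        unopened = sorted(set(scene.contents) - scene.opened)
        if not unopened:
            return None  # everything explored, target absent
        scene.act(("open", unopened[0]))
    return None


scene = ToyScene({"box": None, "cabinet": "mug", "drawer": "key"})
print(passive_answer(scene, "key"))
print(active_answer(scene, "key"))
```

The exploration policy here (open containers in alphabetical order) is deliberately naive; the benchmark's point, per the abstract, is that agents must choose and sequence such actions themselves.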
