ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

Yining Hong; Jiageng Liu; Han Yin; Manling Li; Leonidas Guibas; Li Fei-Fei; Jiajun Wu; and Yejin Choi

arXiv:2605.18746·cs.CV·May 19, 2026

TL;DR

This paper introduces ESI-Bench, a comprehensive benchmark for embodied spatial intelligence in which agents must actively perceive and act, and reports that active exploration improves agents' spatial reasoning over passive sensing.

Contribution

The paper recasts spatial intelligence as an active perception-action loop, introduces the ESI-Bench benchmark, and analyzes how active exploration and action choices affect spatial reasoning performance.

Findings

01. Active exploration outperforms passive sensing in spatial tasks.

02. Emergent spatial strategies arise without explicit instructions.

03. Poor action choices lead to cascading errors in spatial reasoning.

Abstract

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen: occluded structure, dynamics, containment, and functionality that cannot be resolved from passive sensing alone. We move beyond prior formulations of spatial intelligence that assume oracle observations by recasting the observer as an actor. We introduce ESI-Bench, a comprehensive benchmark for embodied spatial intelligence built on OmniGibson and grounded in Spelke's core knowledge systems, spanning 10 task categories and 29 subcategories. Agents must decide which abilities to deploy (perception, locomotion, and manipulation) and how to sequence them to actively accumulate task-relevant evidence. We conduct extensive experiments on…
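
The perception-action loop described in the abstract can be illustrated with a small toy sketch. This is not the benchmark's code and does not use OmniGibson. The `ToyWorld` class, its `perceive`/`move`/`open` methods, and the two agent functions are hypothetical names invented here. The sketch assumes a one-dimensional world of closed containers, one of which hides a target, so a passive observer cannot resolve containment while an agent that sequences locomotion, manipulation, and perception can.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorld:
    """Hypothetical 1-D world (an assumption for illustration only).

    `containers` maps an integer position to the container's contents
    (a string, or None if the container is empty).
    """
    containers: dict
    agent_pos: int = 0
    opened: set = field(default_factory=set)

    def perceive(self):
        """Perception: report what is visible at the agent's position."""
        if self.agent_pos not in self.containers:
            return None  # nothing to see here
        if self.agent_pos in self.opened:
            return ("open", self.containers[self.agent_pos])
        return ("closed", None)  # contents stay occluded until opened

    def move(self, pos):
        """Locomotion: go to another position."""
        self.agent_pos = pos

    def open(self):
        """Manipulation: open the container at the current position."""
        if self.agent_pos in self.containers:
            self.opened.add(self.agent_pos)


def passive_answer(world, target):
    """Answer from the initial observation only, with no actions taken."""
    obs = world.perceive()
    if obs is not None and obs[1] == target:
        return world.agent_pos
    return None  # containment cannot be resolved passively


def active_answer(world, target):
    """Act to gather evidence: move, open, then perceive again."""
    for pos in sorted(world.containers):
        world.move(pos)
        state, contents = world.perceive()
        if state == "closed":
            world.open()
            state, contents = world.perceive()
        if contents == target:
            return pos
    return None


if __name__ == "__main__":
    world = ToyWorld(containers={1: None, 2: "key", 3: None})
    print("passive:", passive_answer(world, "key"))
    print("active:", active_answer(world, "key"))
```

In this toy setting the passive agent returns no answer because the target is never visible without acting, while the active agent finds it after opening two containers. The fixed left-to-right sweep is a placeholder policy. The benchmark instead asks agents to choose which ability to deploy and in what order.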

Code & Models

Repositories

esi-bench/ESI-Bench (GitHub)
