Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration

Zhongyi Cai; Yi Du; Chen Wang; Yu Kong

arXiv:2512.02458·cs.CV·March 19, 2026

TL;DR

This paper introduces 3DSPMR, a 3D spatial memory framework that lets embodied agents reuse spatial knowledge gathered in earlier tasks to guide later reasoning and exploration, and SEER-Bench, a new benchmark for evaluating sequential embodied tasks.

Contribution

The paper proposes 3DSPMR, a 3D SPatial Memory Reasoning framework that uses Field-of-View (FoV) coverage as an explicit geometric prior, and introduces SEER-Bench, a benchmark for sequential embodied reasoning tasks.

Findings

01. 3DSPMR significantly improves performance on sequential EQA and EMN tasks.

02. Incorporating FoV-based constraints enhances spatial reasoning and exploration.

03. SEER-Bench provides a rigorous platform for evaluating sequential embodied AI.
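
To make the idea of FoV coverage as a reusable spatial memory concrete, here is a minimal sketch. It is not the paper's implementation: the 2D grid, the occlusion-free viewing cone, and the names `fov_cells` and `CoverageMemory` are all assumptions chosen for illustration. The sketch only shows how an agent could record which cells it has already observed and carry that record over to a later task.

```python
import math


def fov_cells(pose, fov_deg, max_range, grid_size):
    """Return the (row, col) cells whose centres lie inside a 2D viewing cone.

    Hypothetical helper: pose is (x, y, heading_rad) and cell (r, c) has its
    centre at (c + 0.5, r + 0.5). Occlusion by walls is ignored here.
    """
    x, y, heading = pose
    half_angle = math.radians(fov_deg) / 2.0
    rows, cols = grid_size
    seen = set()
    for r in range(rows):
        for c in range(cols):
            dx, dy = c + 0.5 - x, r + 0.5 - y
            dist = math.hypot(dx, dy)
            if dist > max_range:
                continue
            if dist == 0.0:
                seen.add((r, c))
                continue
            # Signed angle between the heading and the bearing to the cell,
            # wrapped into [-pi, pi].
            diff = math.atan2(dy, dx) - heading
            diff = math.atan2(math.sin(diff), math.cos(diff))
            if abs(diff) <= half_angle:
                seen.add((r, c))
    return seen


class CoverageMemory:
    """Hypothetical memory of observed cells that persists across tasks."""

    def __init__(self, grid_size):
        self.grid_size = grid_size
        self.observed = set()

    def update(self, pose, fov_deg=90.0, max_range=3.0):
        """Add the cells visible from pose; return how many were new."""
        visible = fov_cells(pose, fov_deg, max_range, self.grid_size)
        new_cells = visible - self.observed
        self.observed |= new_cells
        return len(new_cells)

    def coverage(self):
        """Fraction of grid cells observed so far."""
        rows, cols = self.grid_size
        return len(self.observed) / (rows * cols)


memory = CoverageMemory((4, 4))
first = memory.update((2.0, 2.0, 0.0))        # look along +x
repeat = memory.update((2.0, 2.0, 0.0))       # same view, nothing new
turned = memory.update((2.0, 2.0, math.pi))   # turn around
```

Under these assumptions, a view that adds no new cells signals that re-observing from that pose is redundant, which is one simple way accumulated coverage could steer exploration in a follow-up task.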

Abstract

Embodied agents are expected to assist humans by actively exploring unknown environments and reasoning about spatial contexts. When deployed in real life, agents often face sequential tasks where each new task follows the completion of the previous one and may include infeasible objectives, such as searching for non-existent objects. However, most existing research focuses on isolated goals, overlooking the core challenge of sequential tasks: the ability to reuse spatial knowledge accumulated from previous explorations to guide subsequent reasoning and exploration. In this work, we investigate this underexplored yet practically significant embodied AI challenge. Specifically, we propose 3DSPMR, a 3D SPatial Memory Reasoning framework that utilizes Field-of-View (FoV) coverage as an explicit geometric prior. By integrating FoV-based constraints, 3DSPMR significantly enhances an agent's…

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Spatial Cognition and Navigation