Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning

Yijun Liu; Yuwei Liu; Yuan Meng; Jieheng Zhang; Yuwei Zhou; Ye Li; Jiacheng Jiang; Kangye Ji; Shijia Ge; Zhi Wang; Wenwu Zhu

arXiv:2508.15874 · cs.RO · November 19, 2025

TL;DR

This paper introduces Spatial Policy (SP), a spatial-aware visuomotor framework for robotic manipulation. SP improves task performance through explicit spatial modeling and reasoning, which bridge visual plans to executable control in complex environments.

Contribution

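The paper presents a unified spatial-aware visuomotor framework with three components: a spatial-conditioned embodied video generation module guided by a spatial plan table, a flow-based action prediction module that infers executable actions, and a spatial reasoning feedback policy that refines the spatial plan table via dual-stage replanning.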

Findings

01. Over 33% improvement on Meta-World tasks

02. Over 25% improvement on iTHOR tasks

03. Effective in real-world robotic experiments

Abstract

Vision-centric hierarchical embodied models have demonstrated strong potential. However, existing methods lack spatial awareness capabilities, limiting their effectiveness in bridging visual plans to actionable control in complex environments. To address this problem, we propose Spatial Policy (SP), a unified spatial-aware visuomotor robotic manipulation framework via explicit spatial modeling and reasoning. Specifically, we first design a spatial-conditioned embodied video generation module to model spatially guided predictions through the spatial plan table. Then, we propose a flow-based action prediction module to infer executable actions with coordination. Finally, we propose a spatial reasoning feedback policy to refine the spatial plan table via dual-stage replanning. Extensive experiments show that SP substantially outperforms state-of-the-art baselines, achieving over 33%…
