Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Zhenyang Liu; Yikai Wang; Kuanning Wang; Longfei Liang; Xiangyang Xue; Yanwei Fu

arXiv:2507.06710 · cs.RO · July 15, 2025

TL;DR

This paper introduces DP4, a novel diffusion-based visual imitation learning method that incorporates 3D spatial and 4D spatiotemporal awareness, enabling robots to better understand and predict complex environments for improved task success.

Contribution

The paper presents DP4, a diffusion policy that models 3D spatial and 4D spatiotemporal perceptions from a single RGB-D view, advancing beyond trajectory cloning methods.

Findings

01. Outperforms baseline methods in simulation tasks with up to 16.4% success rate improvement.
02. Achieves 8.6% higher success rate in real-world robotic tasks.
03. Effectively models future 3D scenes from single-view observations.

Abstract

Visual imitation learning is effective for robots to learn versatile tasks. However, many existing methods rely on behavior cloning with supervised historical trajectories, limiting their 3D spatial and 4D spatiotemporal awareness. Consequently, these methods struggle to capture the 3D structures and 4D spatiotemporal relationships necessary for real-world deployment. In this work, we propose 4D Diffusion Policy (DP4), a novel visual imitation learning method that incorporates spatiotemporal awareness into diffusion-based policies. Unlike traditional approaches that rely on trajectory cloning, DP4 leverages a dynamic Gaussian world model to guide the learning of 3D spatial and 4D spatiotemporal perceptions from interactive environments. Our method constructs the current 3D scene from a single-view RGB-D observation and predicts the future 3D scene, optimizing trajectory generation by…
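The abstract says the current 3D scene is built from a single-view RGB-D observation. As a rough illustration of the first geometric step such a pipeline typically needs, the sketch below lifts a depth map to a camera-frame point cloud with the standard pinhole model. This is not the authors' implementation: the function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth values are all assumptions made for the example, and DP4's Gaussian world model is not reproduced here.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (N, 3) point cloud in the camera frame.

    Hypothetical helper for illustration only. Pixels with non-positive
    depth are treated as missing and dropped.
    """
    h, w = depth.shape
    # u holds column indices, v holds row indices, both with shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    # Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Toy input: a flat surface 2 units from the camera with one missing pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
points = backproject_depth(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

With real data, the intrinsics would come from the camera's calibration, and the per-pixel RGB values could be attached to each point using the same `valid` mask.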

Taxonomy

Topics: Stroke Rehabilitation and Recovery · Motor Control and Adaptation · Reinforcement Learning in Robotics