GaussianDream: A Feed-Forward 3D Gaussian World Model for Robotic Manipulation

Zijian Zhang; Yuqing Jiang; Qian Cheng; Si Liu; Ding Zhao; Ping Luo; Weitao Zhou; Haibao Yu

arXiv:2605.20752 · cs.RO · May 21, 2026

TL;DR

GaussianDream is a feed-forward 3D Gaussian world-model plug-in for vision-language-action policies. It adds dense RGB, depth, and pseudo 3D scene-flow supervision during training, targeting precise, physically grounded action generation without test-time decoding.

Contribution

The paper presents a feed-forward 3D Gaussian world model that couples current Gaussian reconstruction with horizon-conditioned future Gaussian prediction, providing dense supervision for robotic manipulation policies.

Findings

01. Achieved 98.4% success on the LIBERO benchmark.

02. Demonstrated strong performance on RoboCasa Human-50.

03. Secured a 50% success rate in real-world robot experiments.

Abstract

Vision-language-action (VLA) policies have advanced language-conditioned robotic manipulation by transferring semantic priors from pretrained vision-language models to action generation. Yet, standard action-imitation training often provides limited explicit supervision for 3D geometry, dense visual structure, and short-horizon environment evolution, which are critical for physically precise manipulation. We introduce GaussianDream, a feed-forward 3D Gaussian world-model plug-in that turns robot trajectories into structured spatial-temporal supervision. The key idea is to couple current Gaussian reconstruction with horizon-conditioned future Gaussian prediction during training, forcing a compact spatio-temporal prefix to be decodable into renderable 3D Gaussian states. This enables dense RGB rendering, depth, and pseudo 3D scene-flow supervision without requiring test-time…
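
To make the coupling idea in the abstract concrete, here is a minimal sketch of how a current-state term and a future-state term might be combined into one dense training loss. It is not the authors' implementation: the function names, the L1 penalty, the loss weights, and the dictionary layout are all hypothetical, and the rendered RGB, depth, and scene-flow maps are assumed to come from some Gaussian rasterizer that is not shown.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def dense_supervision_loss(pred, target, w_rgb=1.0, w_depth=1.0, w_flow=1.0):
    """Weighted sum of RGB, depth, and scene-flow L1 terms for one time step.

    `pred` and `target` are dicts with keys "rgb" (H, W, 3), "depth" (H, W),
    and "flow" (H, W, 3). The keys and weights are illustrative assumptions.
    """
    return (
        w_rgb * l1(pred["rgb"], target["rgb"])
        + w_depth * l1(pred["depth"], target["depth"])
        + w_flow * l1(pred["flow"], target["flow"])
    )


def coupled_loss(pred_now, target_now, pred_future, target_future, w_future=1.0):
    """Add the current-reconstruction term to the weighted future-prediction term."""
    return dense_supervision_loss(pred_now, target_now) + w_future * (
        dense_supervision_loss(pred_future, target_future)
    )


def constant_frame(rgb, depth, flow, h=4, w=4):
    """Build a toy frame filled with constant values (for demonstration only)."""
    return {
        "rgb": np.full((h, w, 3), rgb),
        "depth": np.full((h, w), depth),
        "flow": np.full((h, w, 3), flow),
    }


# Toy usage: ground truth is all zeros; predictions are off by fixed amounts.
target = constant_frame(0.0, 0.0, 0.0)
pred = constant_frame(0.5, 1.0, 0.0)
loss = coupled_loss(pred, target, pred, target, w_future=2.0)
```

In this sketch both time steps share the same per-pixel terms, so the only coupling is the summed objective; the abstract's "compact spatio-temporal prefix" that both decoders read from is outside what this toy example models.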
