GaussianDream: A Feed-Forward 3D Gaussian World Model for Robotic Manipulation
Zijian Zhang, Yuqing Jiang, Qian Cheng, Si Liu, Ding Zhao, Ping Luo, Weitao Zhou, Haibao Yu

TL;DR
GaussianDream introduces a 3D Gaussian world model that enhances robotic manipulation by providing dense supervision and enabling precise, physically grounded action generation without test-time decoding.
Contribution
It presents a novel feed-forward 3D Gaussian world model that couples current and future Gaussian predictions for dense supervision in robotic manipulation tasks.
Findings
Achieved 98.4% success on the LIBERO benchmark.
Demonstrated strong performance on RoboCasa Human-50.
Achieved a 50% success rate in real-world robot experiments.
Abstract
Vision-language-action (VLA) policies have advanced language-conditioned robotic manipulation by transferring semantic priors from pretrained vision-language models to action generation. Yet, standard action-imitation training often provides limited explicit supervision for 3D geometry, dense visual structure, and short-horizon environment evolution, which are critical for physically precise manipulation. We introduce GaussianDream, a feed-forward 3D Gaussian world-model plug-in that turns robot trajectories into structured spatio-temporal supervision. The key idea is to couple current Gaussian reconstruction with horizon-conditioned future Gaussian prediction during training, forcing a compact spatio-temporal prefix to be decodable into renderable 3D Gaussian states. This enables dense RGB rendering, depth, and pseudo 3D scene-flow supervision without requiring test-time…
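To make the coupling described in the abstract concrete, here is a minimal sketch of how a training objective might combine a current-state reconstruction term with a future-state prediction term. It is not the authors' implementation. The names `RenderedState`, `l1`, `dense_loss`, and `coupled_loss`, the use of L1 error, and the weights `w_depth`, `w_flow`, and `w_future` are all assumptions made for illustration, and rendered outputs are stood in for by flat lists of floats.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RenderedState:
    """Hypothetical container for what a decoded Gaussian state renders to."""
    rgb: Sequence[float]          # flattened rendered colors
    depth: Sequence[float]        # flattened rendered depth
    flow: Sequence[float] = ()    # flattened pseudo 3D scene flow (may be empty)


def l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error; returns 0.0 when there is nothing to supervise."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    if len(pred) == 0:
        return 0.0
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def dense_loss(pred: RenderedState, target: RenderedState,
               w_depth: float = 1.0, w_flow: float = 1.0) -> float:
    """RGB + depth + scene-flow terms for one state (weights are assumed)."""
    return (l1(pred.rgb, target.rgb)
            + w_depth * l1(pred.depth, target.depth)
            + w_flow * l1(pred.flow, target.flow))


def coupled_loss(cur_pred: RenderedState, cur_target: RenderedState,
                 fut_pred: RenderedState, fut_target: RenderedState,
                 w_future: float = 1.0) -> float:
    """Sum the current-reconstruction and future-prediction terms."""
    return (dense_loss(cur_pred, cur_target)
            + w_future * dense_loss(fut_pred, fut_target))
```

In this sketch the scene-flow term is left empty for the current state and supplied only for the future state, which is one plausible reading of "pseudo 3D scene-flow supervision"; the source does not specify where that term is applied.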
