GWM: Towards Scalable Gaussian World Models for Robotic Manipulation

Guanxing Lu; Baoxiong Jia; Puhao Li; Yixin Chen; Ziwei Wang; Yansong Tang; Siyuan Huang

arXiv:2508.17600·cs.RO·September 18, 2025

TL;DR

This paper introduces GWM, a 3D Gaussian world model for robotic manipulation that pairs a latent Diffusion Transformer with a 3D variational autoencoder to predict future scenes conditioned on robot actions. Policies trained with GWM are reported to outperform state-of-the-art methods.

Contribution

The paper presents GWM, a Gaussian world model that reconstructs future states as Gaussian primitives. It combines a latent Diffusion Transformer (DiT) with a 3D variational autoencoder to enable fine-grained, scene-level future state reconstruction with Gaussian Splatting.
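
To make "Gaussian primitives" concrete, here is a minimal sketch of how a single primitive is commonly parameterized in 3D Gaussian Splatting: a mean, a rotation quaternion, per-axis scales, an opacity, and a color, with the covariance built as Σ = R·diag(s²)·Rᵀ. The `GaussianPrimitive` class and the helper names are hypothetical and chosen for illustration only; the summary above does not state the exact representation GWM uses, so this should be read as the standard parameterization rather than the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianPrimitive:
    """Hypothetical container for one 3D Gaussian (standard 3DGS-style fields)."""
    mean: tuple      # (x, y, z) center of the Gaussian
    quat: tuple      # (w, x, y, z) rotation quaternion, normalized on use
    scale: tuple     # per-axis standard deviations along the local axes
    opacity: float   # blending weight in [0, 1]
    color: tuple     # RGB color


def quat_to_rot(q):
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g):
    """Build the 3x3 covariance Sigma = R diag(s^2) R^T for one primitive."""
    R = quat_to_rot(g.quat)
    s2 = [s * s for s in g.scale]
    return [
        [sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
```

Factoring the covariance into a rotation and a scale keeps it symmetric and positive semi-definite by construction, which is why this form is widely used when Gaussian parameters are optimized or predicted by a network.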

Findings

01. GWM accurately predicts future scenes conditioned on robot actions.

02. Policies trained with GWM outperform state-of-the-art methods.

03. GWM demonstrates strong data scaling potential for 3D world modeling.

Abstract

Training robot policies within a learned world model is trending due to the inefficiency of real-world interactions. Established image-based world models and policies have shown prior success, but they lack the robust geometric information needed for a consistent spatial and physical understanding of the three-dimensional world, even when pre-trained on internet-scale video sources. To this end, we propose a novel branch of world model named Gaussian World Model (GWM) for robotic manipulation, which reconstructs the future state by inferring the propagation of Gaussian primitives under the effect of robot actions. At its core is a latent Diffusion Transformer (DiT) combined with a 3D variational autoencoder, enabling fine-grained scene-level future state reconstruction with Gaussian Splatting. GWM can not only enhance the visual representation for imitation learning agents by self-supervised…
