DSG-World: Learning a 3D Gaussian World Model from Dual State Videos

Wenhao Hu; Xuexiang Wen; Xi Li; Gaoang Wang

arXiv:2506.05217·cs.CV·June 6, 2025

TL;DR

DSG-World is an end-to-end framework that constructs a 3D Gaussian world model from dual-state videos, improving occlusion handling and physical consistency and enabling high-fidelity scene manipulation.

Contribution

The paper presents an explicit 3D Gaussian model built from dual-state observations, with bidirectional consistency and collaborative refinement, advancing 3D scene reconstruction from limited data.

Findings

01 Strong generalization to novel views and states

02 Effective occlusion handling and scene completeness

03 Supports high-fidelity rendering and manipulation

Abstract

Building an efficient and physically consistent world model from limited observations is a long-standing challenge in vision and robotics. Many existing world modeling pipelines are based on implicit generative models, which are hard to train and often lack 3D or physical consistency. On the other hand, explicit 3D methods built from a single state often require multi-stage processing (such as segmentation, background completion, and inpainting) due to occlusions. To address this, we leverage two perturbed observations of the same scene under different object configurations. These dual states offer complementary visibility, alleviating occlusion issues during state transitions and enabling more stable and complete reconstruction. In this paper, we present DSG-World, a novel end-to-end framework that explicitly constructs a 3D Gaussian World model from Dual State observations. Our approach…
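The complementary-visibility argument in the abstract can be illustrated with a toy sketch. This is not the paper's method: it assumes a hypothetical 1D background of ten cells and a single object that hides a contiguous span of them, and every name in it (`visible_background`, `state_a`, `state_b`) is invented for illustration. It only shows why two states with the object in different places can together observe background that either state alone misses.

```python
# Toy illustration of complementary visibility (hypothetical setup, not DSG-World itself).
# The background is a row of cells; an object hides the cells it sits on.

def visible_background(num_cells, object_start, object_width):
    """Return the set of background cells not hidden by the object."""
    occluded = set(range(object_start, object_start + object_width))
    return set(range(num_cells)) - occluded

NUM_CELLS = 10

# State A: the object covers cells 2, 3, 4.
state_a = visible_background(NUM_CELLS, object_start=2, object_width=3)
# State B: the same object has been moved and now covers cells 6, 7, 8.
state_b = visible_background(NUM_CELLS, object_start=6, object_width=3)

# Each state alone is missing the background under the object.
missing_a = set(range(NUM_CELLS)) - state_a
missing_b = set(range(NUM_CELLS)) - state_b

# The union of both observations covers every cell, because the two
# occluded spans do not overlap.
combined = state_a | state_b
coverage = len(combined) / NUM_CELLS
```

In this toy setting, full coverage depends on the object moving by at least its own width; if the two occluded spans overlap, the shared cells stay unobserved in both states.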

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging