DSG-World: Learning a 3D Gaussian World Model from Dual State Videos
Wenhao Hu, Xuexiang Wen, Xi Li, Gaoang Wang

TL;DR
DSG-World introduces an end-to-end framework that constructs a 3D Gaussian world model from dual-state videos, improving occlusion handling and physical consistency and enabling high-fidelity scene manipulation.
Contribution
The paper presents a novel explicit 3D Gaussian world model built from two observations of the same scene, using bidirectional consistency and collaborative refinement to advance 3D scene reconstruction from limited data.
Findings
Strong generalization to novel views and states
Effective occlusion handling and scene completeness
Supports high-fidelity rendering and manipulation
Abstract
Building an efficient and physically consistent world model from limited observations is a long-standing challenge in vision and robotics. Many existing world modeling pipelines are based on implicit generative models, which are hard to train and often lack 3D or physical consistency. On the other hand, explicit 3D methods built from a single state often require multi-stage processing (such as segmentation, background completion, and inpainting) due to occlusions. To address this, we leverage two perturbed observations of the same scene under different object configurations. These dual states offer complementary visibility, alleviating occlusion issues during state transitions and enabling more stable and complete reconstruction. In this paper, we present DSG-World, a novel end-to-end framework that explicitly constructs a 3D Gaussian World model from Dual State observations. Our approach…
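The complementary-visibility argument in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a hypothetical one-dimensional "scene" of background cells and a single object that hides a few of them, and all names (`visible_background`, `combined_coverage`, `NUM_CELLS`) are invented for illustration. It only shows why two object configurations can expose more of the background than one.

```python
def visible_background(num_cells, occluded):
    """Return the set of background cells not hidden by the object."""
    return {i for i in range(num_cells) if i not in occluded}


def combined_coverage(num_cells, occluded_a, occluded_b):
    """Fraction of background cells seen in at least one of the two states."""
    seen = visible_background(num_cells, occluded_a) | visible_background(
        num_cells, occluded_b
    )
    return len(seen) / num_cells


# Hypothetical toy scene: 10 background cells, one object covering 3 cells.
NUM_CELLS = 10
state_a = set(range(2, 5))  # object hides cells 2, 3, 4 in the first state
state_b = set(range(6, 9))  # object moved: hides cells 6, 7, 8 in the second

single = len(visible_background(NUM_CELLS, state_a)) / NUM_CELLS
dual = combined_coverage(NUM_CELLS, state_a, state_b)

print(f"coverage from state A alone: {single:.1f}")
print(f"coverage from both states:   {dual:.1f}")
```

In this toy setup the object's two placements do not overlap, so every background cell is observed in at least one state. When the placements overlap, the shared cells stay unobserved in both states.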
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
