Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors

Wen-Hsuan Chu; Lei Ke; Jianmeng Liu; Mingxiao Huo; Pavel Tokmakov; Katerina Fragkiadaki

arXiv:2506.12716 · cs.CV · June 17, 2025

Open Access

TL;DR

GenMOJO generates dynamic 4D scenes from monocular, multi-object videos by decomposing each scene into objects, optimizing a set of deformable 3D Gaussians per object, and using object-centric generative priors to synthesize unobserved views.

Contribution

GenMOJO introduces object-wise scene decomposition with per-object deformable Gaussian optimization, renders all objects jointly so that cross-object occlusions are captured, and integrates generative priors to improve 4D scene reconstruction and view synthesis.

Findings

01. Outperforms existing methods in realistic scene rendering

02. Produces more accurate 2D and 3D point tracks

03. Generates highly realistic novel views

Abstract

We tackle the challenge of generating dynamic 4D scenes from monocular, multi-object videos with heavy occlusions, and introduce GenMOJO, a novel approach that integrates rendering-based deformable 3D Gaussian optimization with generative priors for view synthesis. While existing models perform well on novel view synthesis for isolated objects, they struggle to generalize to complex, cluttered scenes. To address this, GenMOJO decomposes the scene into individual objects, optimizing a differentiable set of deformable Gaussians per object. This object-wise decomposition allows leveraging object-centric diffusion models to infer unobserved regions in novel viewpoints. It performs joint Gaussian splatting to render the full scene, capturing cross-object occlusions, and enabling occlusion-aware supervision. To bridge the gap between object-centric priors and the global frame-centric…
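The idea of joint splatting with cross-object occlusions can be illustrated with a toy example. The sketch below is not GenMOJO's implementation: it assumes standard front-to-back alpha compositing along a single ray, and the names `Sample` and `composite` are hypothetical. Each sample stands in for one Gaussian's contribution at a pixel and is tagged with the object it belongs to.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    depth: float   # distance along the ray
    alpha: float   # opacity contributed at this pixel
    color: float   # grayscale value, for brevity
    obj_id: int    # hypothetical tag: which object's Gaussian set this is


def composite(samples):
    """Front-to-back alpha compositing of samples along one ray.

    Returns the blended color and, per object, the total blending
    weight that object received (how much of it is actually visible).
    """
    transmittance = 1.0
    color = 0.0
    per_object_weight = {}
    for s in sorted(samples, key=lambda s: s.depth):
        weight = s.alpha * transmittance
        color += weight * s.color
        per_object_weight[s.obj_id] = per_object_weight.get(s.obj_id, 0.0) + weight
        transmittance *= 1.0 - s.alpha
    return color, per_object_weight


# Two objects on the same ray: object 0 sits in front of object 1.
front = [Sample(depth=1.0, alpha=0.8, color=1.0, obj_id=0)]
back = [Sample(depth=2.0, alpha=0.9, color=0.5, obj_id=1)]

# Joint rendering: both objects share one depth-sorted pass.
joint_color, joint_weights = composite(front + back)

# Isolated rendering: the back object rendered on its own.
alone_color, alone_weights = composite(back)
```

In this toy setup the back object's weight falls from 0.9 when rendered alone to about 0.18 when rendered jointly, because the front object absorbs most of the transmittance. A loss computed on the joint render therefore does not penalize the occluded object for being invisible in the observed frame, which is the intuition behind occlusion-aware supervision.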

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Remote Sensing and LiDAR Applications

Methods: ALIGN · Diffusion · Sparse Evolutionary Training