Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models

Panwang Pan; Chenguo Lin; Jingjing Zhao; Chenxin Li; Yuchen Lin; Haopeng Li; Honglei Yan; Kairun Wen; Yunlong Lin; Yixuan Yuan; Yadong Mu

arXiv:2511.00503 · cs.CV · April 8, 2026

TL;DR

Diff4Splat is a fast, controllable method for synthesizing 4D scenes from a single image. It combines video diffusion priors with geometry and motion constraints to generate high-quality dynamic scenes.

Contribution

The paper introduces a feed-forward approach that unifies diffusion priors with 4D geometry and motion learning, so that scenes are synthesized in a single forward pass without test-time optimization.

Findings

01. Synthesizes high-quality 4D scenes in 30 seconds

02. Matches or surpasses optimization-based methods in dynamic scene synthesis

03. Effective in video generation, view synthesis, and geometry extraction

Abstract

We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Our approach unifies the generative priors of video diffusion models with geometry and motion constraints learned from large-scale 4D datasets. Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion, all in a single forward pass, without test-time optimization or post-hoc refinement. At the core of our framework lies a video latent transformer, which augments video diffusion models to jointly capture spatio-temporal dependencies and predict time-varying 3D Gaussian primitives. Training is guided by objectives on appearance fidelity, geometric accuracy, and motion consistency, enabling Diff4Splat to synthesize high-quality 4D scenes in 30…
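
To make the phrase "deformable 3D Gaussian field that encodes appearance, geometry, and motion" more concrete, here is a minimal sketch of what such a representation could look like as a data structure. It is an illustration only: the abstract does not specify the parameterization, so the class `DeformableGaussian`, its fields, the per-frame `offsets` motion model, and the helper `field_at` are all hypothetical names and assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]


@dataclass
class DeformableGaussian:
    """One primitive of a hypothetical deformable 3D Gaussian field.

    Assumption: motion is stored as a per-frame displacement of the center.
    The paper may use a different deformation model.
    """
    mean: Vec3            # canonical center (geometry)
    scale: Vec3           # per-axis extent (geometry)
    rotation: Quat        # unit quaternion (w, x, y, z)
    opacity: float        # in [0, 1]
    color: Vec3           # RGB (appearance)
    offsets: List[Vec3]   # one displacement per frame (motion)

    def mean_at(self, frame: int) -> Vec3:
        """Center at a given frame: canonical mean plus that frame's offset."""
        x, y, z = self.mean
        dx, dy, dz = self.offsets[frame]
        return (x + dx, y + dy, z + dz)


def field_at(gaussians: List[DeformableGaussian], frame: int) -> List[Vec3]:
    """Centers of all primitives at one frame, ready to hand to a renderer."""
    return [g.mean_at(frame) for g in gaussians]


# Toy example: a single primitive that shifts along x in the second frame.
g = DeformableGaussian(
    mean=(0.0, 0.0, 1.0),
    scale=(0.05, 0.05, 0.05),
    rotation=(1.0, 0.0, 0.0, 0.0),
    opacity=0.9,
    color=(0.8, 0.2, 0.2),
    offsets=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
)
centers = field_at([g], frame=1)
```

In a feed-forward setting like the one described, a network would predict all of these attributes at once for every primitive and frame. Rendering a novel view at time t then only requires evaluating the field at that frame and rasterizing it from the requested camera pose.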

Code & Models

Repositories

paulpanwang/Diff4Splat (GitHub)