4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei

TL;DR
4DGen is a controllable 4D content creation framework that takes a monocular video as its motion source and represents the scene with dynamic 3D Gaussians, producing grounded 4D content with user-specified motion and spatial-temporal consistency.
Contribution
The paper presents a novel 4D content generation pipeline that leverages monocular videos and dynamic 3D Gaussians for controllable, high-resolution, and consistent 4D synthesis, surpassing prior methods in quality and control.
Findings
Outperforms existing video-to-4D methods in reconstruction fidelity.
Enables user-controlled motion via monocular videos or image-to-video generation.
Produces realistic, high-quality 4D content with spatial-temporal consistency.
Abstract
Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. However, as these pipelines generate 4D content from text or image inputs directly, they are constrained by limited motion capabilities and depend on unreliable prompt engineering for desired results. To address these problems, this work introduces 4DGen, a novel framework for grounded 4D content creation. We identify monocular video sequences as a key component in constructing the 4D content. Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generations, thus offering superior control over content creation. Furthermore, we construct our 4D representation using dynamic 3D Gaussians, which permits efficient, high-resolution…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Video Analysis and Summarization
Methods: Diffusion
