A Unified Approach for Text- and Image-guided 4D Scene Generation

Yufeng Zheng; Xueting Li; Koki Nagano; Sifei Liu; Karsten Kreis; Otmar Hilliges; Shalini De Mello

arXiv:2311.16854 · cs.CV · May 8, 2024 · 1 citation

TL;DR

Dream-in-4D introduces a novel two-stage diffusion-based method for generating high-quality, consistent 4D scenes from text prompts, effectively disentangling static assets and motion for flexible, controllable dynamic scene synthesis.

Contribution

It presents the first unified approach combining static 3D asset learning and motion modeling for text-to-4D scene generation using diffusion guidance.

Findings

01. Significantly improves image and motion quality in 4D generation.
02. Enhances 3D consistency and text fidelity over baseline methods.
03. Enables controllable 4D generation from images or text.

Abstract

Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images. However, the challenging problem of text-to-4D dynamic 3D scene generation with diffusion guidance remains largely unexplored. We propose Dream-in-4D, which features a novel two-stage approach for text-to-4D synthesis, leveraging (1) 3D and 2D diffusion guidance to effectively learn a high-quality static 3D asset in the first stage; (2) a deformable neural radiance field that explicitly disentangles the learned static asset from its deformation, preserving quality during motion learning; and (3) a multi-resolution feature grid for the deformation field with a displacement total variation loss to effectively learn motion with video diffusion guidance in the second stage. Through a user preference study, we demonstrate that our approach…
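The "displacement total variation loss" named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the abstract does not say where the loss is evaluated, so the sketch assumes displacement vectors sampled on a dense regular 3D grid, uses an L1 (absolute difference) penalty, and the function name `displacement_tv_loss` is hypothetical. The idea is that neighboring points should move similarly, so differences between adjacent displacement vectors are penalized.

```python
import numpy as np


def displacement_tv_loss(disp: np.ndarray) -> float:
    """Hypothetical L1 total variation of a displacement field.

    disp has shape (D, H, W, 3): one 3D displacement vector per cell of a
    regular grid (an assumed layout, not taken from the paper).
    """
    # Absolute differences between neighboring cells along each grid axis.
    dz = np.abs(disp[1:, :, :, :] - disp[:-1, :, :, :]).mean()
    dy = np.abs(disp[:, 1:, :, :] - disp[:, :-1, :, :]).mean()
    dx = np.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :]).mean()
    # Sum of the per-axis means; smooth fields give small values.
    return float(dx + dy + dz)


# A rigid translation (same vector everywhere) has zero total variation.
rigid = np.ones((4, 4, 4, 3))

# A noisy field has neighbors that disagree, so the penalty is positive.
rng = np.random.default_rng(0)
noisy = rng.normal(size=(4, 4, 4, 3))

print(displacement_tv_loss(rigid), displacement_tv_loss(noisy))
```

In a training loop, a term like this would typically be added to the diffusion-guidance objective with a small weight, so that motion stays spatially smooth without being forced to zero; the weighting is likewise an assumption here.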

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Image Processing and 3D Reconstruction

Methods: Diffusion