4Dynamic: Text-to-4D Generation with Hybrid Priors

Yu-Jie Yuan; Leif Kobbelt; Jiwen Liu; Yuan Zhang; Pengfei Wan; Yu-Kun Lai; Lin Gao

arXiv:2407.12684·cs.CV·July 18, 2024

Open Access

TL;DR

This paper introduces a text-to-4D generation method that combines hybrid priors, a two-stage process (static generation followed by dynamic generation), and dynamic modeling to produce realistic, temporally consistent 4D outputs from text or monocular videos.

Contribution

The paper combines static and dynamic priors, a prior-switching training strategy, and dynamic modeling networks to improve 4D generation from text and videos.

Findings

1. Outperforms existing methods in realism and consistency.
2. Supports 4D generation from monocular videos.
3. Maintains dynamic continuity while handling topological changes.

Abstract

Due to the fascinating generative performance of text-to-image diffusion models, a growing number of text-to-3D generation works explore distilling 2D generative priors into 3D, using the score distillation sampling (SDS) loss, to bypass the data scarcity problem. Existing text-to-3D methods have achieved promising results in realism and 3D consistency, but text-to-4D generation still faces challenges, including a lack of realism and insufficient dynamic motion. In this paper, we propose a novel method for text-to-4D generation, which ensures the dynamic amplitude and authenticity through direct supervision provided by a video prior. Specifically, we adopt a text-to-video diffusion model to generate a reference video and divide 4D generation into two stages: static generation and dynamic generation. The static 3D generation is achieved under the guidance of the input text and the first frame…
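
The score distillation sampling (SDS) loss mentioned in the abstract can be illustrated with a toy sketch. This is the generic SDS update as commonly described for text-to-3D work, not the 4Dynamic implementation. Several pieces are assumptions made purely for illustration: the "renderer" is the identity (the optimized parameters are the image itself), and the frozen diffusion prior is replaced by a hypothetical closed-form noise predictor, `make_point_mass_denoiser`, that is exact only when all training data equals a single target image. The names `sds_gradient`, `alpha_bar`, and `weight` are likewise illustrative.

```python
import numpy as np

def sds_gradient(x, predict_noise, alpha_bar, weight, rng):
    """Toy SDS gradient with respect to a rendered image x at one noise level."""
    eps = rng.standard_normal(x.shape)  # sample Gaussian noise
    # Forward diffusion: noise the rendered image to the chosen level.
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = predict_noise(x_t, alpha_bar)  # query the frozen prior
    # SDS uses the noise residual directly and skips the prior's Jacobian.
    return weight * (eps_hat - eps)

def make_point_mass_denoiser(target):
    """Hypothetical stand-in prior: exact noise predictor if all data equals `target`."""
    def predict_noise(x_t, alpha_bar):
        return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)
    return predict_noise

rng = np.random.default_rng(0)
target = np.full((4, 4), 0.5)   # what the toy prior "believes" images look like
x = np.zeros((4, 4))            # parameters being optimized (identity renderer)
prior = make_point_mass_denoiser(target)

# Plain gradient descent on the parameters using the SDS gradient.
for _ in range(200):
    x = x - 0.1 * sds_gradient(x, prior, alpha_bar=0.5, weight=1.0, rng=rng)
```

With this stand-in prior the sampled noise cancels and the gradient reduces to a multiple of `x - target`, so the loop pulls `x` toward `target`. With a real text-to-image diffusion model the residual stays noisy, and the gradient is additionally propagated through a differentiable renderer to the 3D parameters.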

Taxonomy

Topics: Multimedia Communication and Technology · Model-Driven Software Engineering Techniques · Natural Language Processing Techniques

Methods: Diffusion