CT4D: Consistent Text-to-4D Generation with Animatable Meshes
Ce Chen, Shaoli Huang, Xuelin Chen, Guangyi Chen, Xiaoguang Han, Kun Zhang, Mingming Gong

TL;DR
CT4D introduces a novel mesh-based framework for consistent text-to-4D generation, ensuring stable motion and geometry over time, and enabling advanced editing capabilities.
Contribution
The paper presents a new mesh-based approach with a Generate-Refine-Animate algorithm and surface continuity techniques for improved text-to-4D generation.
Findings
Outperforms existing methods in interframe consistency
Maintains global geometry effectively
Enables texture editing and combinational 4D generation
Abstract
Text-to-4D generation has recently been demonstrated viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user-supplied prompts. The primary challenges of our mesh-based framework involve stably generating a mesh with details that align with the text prompt while directly driving it and maintaining surface continuity. Our CT4D framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes. To improve surface continuity, we divide a mesh into several smaller regions and implement a uniform driving function within each area. Additionally, we…
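The region-wise driving idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how regions are chosen or what form the driving function takes, so the x-axis slicing in `assign_regions` and the rigid transform (rotation about z plus translation) in `drive_mesh` are assumptions made purely for illustration, and all function names are hypothetical.

```python
import math

def assign_regions(vertices, num_regions):
    """Assign each vertex to a region by slicing the mesh along the x-axis.

    Hypothetical partitioning; the source does not specify how regions are formed.
    """
    xs = [v[0] for v in vertices]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_regions or 1.0  # avoid zero width for degenerate input
    return [min(int((x - lo) / width), num_regions - 1) for x in xs]

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def drive_mesh(vertices, regions, transforms):
    """Apply one shared transform per region (assumed rigid: rotate, then translate)."""
    out = []
    for v, r in zip(vertices, regions):
        angle, (tx, ty, tz) = transforms[r]
        x, y, z = rotate_z(v, angle)
        out.append((x + tx, y + ty, z + tz))
    return out

# Four vertices on a line, split into two regions.
vertices = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.6, 0.0, 0.0), (1.0, 0.0, 0.0)]
regions = assign_regions(vertices, 2)

# Region 0 stays fixed; region 1 is lifted by 0.5 along y.
transforms = {0: (0.0, (0.0, 0.0, 0.0)), 1: (0.0, (0.0, 0.5, 0.0))}
moved = drive_mesh(vertices, regions, transforms)
```

Because every vertex in a region shares the same transform, relative positions within a region are preserved under this assumed rigid motion; only the seams between regions can stretch.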
Taxonomy
Topics: Human Motion and Animation · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
Methods: Diffusion · ALIGN
