TriDiff-4D: Fast 4D Generation through Diffusion-based Triplane Re-posing
Eddie Pokming Sheung, Qihao Liu, Wufei Ma, Prakhar Kaushik, Jianwen Xie, Alan Yuille

TL;DR
TriDiff-4D introduces a diffusion-based pipeline for fast, high-quality, and temporally coherent 4D avatar generation from text, leveraging explicit 3D structure and motion priors for improved realism and efficiency.
Contribution
The paper presents a novel diffusion-based 4D generation method that eliminates optimization at generation time, enabling rapid, controllable, and high-fidelity 4D avatar synthesis from text prompts.
Findings
Significantly reduces generation time from hours to seconds.
Achieves superior temporal consistency and motion accuracy.
Produces high-fidelity 3D avatars with detailed geometry.
Abstract
With the increasing demand for 3D animation, generating high-fidelity, controllable 4D avatars from textual descriptions remains a significant challenge. Despite notable efforts in 4D generative modeling, existing methods exhibit fundamental limitations that impede their broader applicability, including temporal and geometric inconsistencies, perceptual artifacts, motion irregularities, high computational costs, and limited control over dynamics. To address these challenges, we propose TriDiff-4D, a novel 4D generative pipeline that employs diffusion-based triplane re-posing to produce high-quality, temporally coherent 4D avatars. Our model adopts an auto-regressive strategy to generate 4D sequences of arbitrary length, synthesizing each 3D frame with a single diffusion process. By explicitly learning 3D structure and motion priors from large-scale 3D and motion datasets, TriDiff-4D…
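The auto-regressive strategy described in the abstract can be illustrated with a minimal sketch: each new 3D frame is produced by one diffusion process conditioned on the previous frame and a target pose, so the sequence can be extended to any length. Everything below is an assumption made for illustration. The names `Triplane`, `Pose`, `toy_repose`, and `generate_sequence` are hypothetical, the triplane is flattened to a list of floats, and the "denoiser" is a toy interpolation rather than the paper's trained model.

```python
import random
from typing import Callable, List, Sequence

Triplane = List[float]  # hypothetical stand-in for three feature planes, flattened
Pose = Sequence[float]  # hypothetical stand-in for one frame of motion parameters


def toy_repose(prev: Triplane, pose: Pose, steps: int = 4, seed: int = 0) -> Triplane:
    """Toy placeholder for one conditional diffusion process (NOT the paper's model).

    Starts from Gaussian noise and iteratively moves toward a target that
    depends on the previous triplane and the (non-empty) pose.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in prev]   # initial noise sample
    shift = sum(pose) / len(pose)             # toy conditioning signal
    target = [p + shift for p in prev]        # toy "clean" re-posed triplane
    for t in range(steps, 0, -1):
        # Close 1/t of the remaining gap; the final step (t == 1) lands on target.
        x = [xi + (ti - xi) / t for xi, ti in zip(x, target)]
    return x


def generate_sequence(
    first: Triplane,
    poses: Sequence[Pose],
    repose: Callable[..., Triplane] = toy_repose,
) -> List[Triplane]:
    """Auto-regressive rollout: one diffusion process per generated frame."""
    frames = [list(first)]
    for i, pose in enumerate(poses):
        # Each frame is conditioned only on the previous frame and its pose.
        frames.append(repose(frames[-1], pose, seed=i))
    return frames


if __name__ == "__main__":
    seq = generate_sequence([0.0, 1.0, 2.0], [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]])
    print(len(seq), "frames")
```

Because the loop depends only on the previous frame and the next pose, the sequence length is set by the number of poses supplied, which is the property the abstract refers to as generating sequences of arbitrary length.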
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
