STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
Zenghao Chai, Chen Tang, Yongkang Wong, Mohan Kankanhalli

TL;DR
STAR introduces a skeleton-aware, end-to-end method for generating animated 4D avatars from text, effectively addressing pose and motion mismatches with in-network retargeting and multi-view supervision.
Contribution
It proposes a novel skeleton-aware framework that corrects motion mismatches and integrates skeleton-conditioned priors for high-quality, text-driven 4D avatar synthesis.
Findings
Produces high-quality, vivid 4D avatars aligned with text descriptions.
Effectively corrects motion and pose mismatches using in-network retargeting.
Achieves coherent multi-view and frame-consistent supervision.
Abstract
The creation of 4D avatars (i.e., animated 3D avatars) from a text description typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently animates them with target motions. However, such an optimization-by-animation paradigm has several drawbacks: (1) for pose-agnostic optimization, the images rendered in canonical pose for naive Score Distillation Sampling (SDS) exhibit a domain gap and cannot preserve view consistency using only T2I priors, and (2) for post hoc animation, simply applying the source motions to target 3D avatars yields translation artifacts and misalignment. To address these issues, we propose Skeleton-aware Text-based 4D Avatar generation with in-network motion Retargeting (STAR). STAR considers the geometry and skeleton differences between the template mesh and target avatar, and corrects the mismatched source…
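Drawback (2) can be illustrated with a toy example. The sketch below is not STAR's in-network retargeting; it only shows, under assumed values, why copying a source motion onto a skeleton with different bone lengths causes misalignment. The function `foot_height`, the bone lengths, and the root-scaling heuristic are all hypothetical and chosen for illustration.

```python
import math

def foot_height(hip_height, bone_lengths, joint_angles):
    """Planar forward kinematics for a leg chain hanging from the hip.

    Angles are in radians, measured from straight down, and accumulate
    along the chain. Returns the vertical position of the foot.
    """
    y = hip_height
    total_angle = 0.0
    for length, angle in zip(bone_lengths, joint_angles):
        total_angle += angle
        y -= length * math.cos(total_angle)
    return y

# Hypothetical skeletons: a template leg and a shorter-legged target avatar.
template_leg = [0.45, 0.45]  # thigh, shin (metres, assumed)
avatar_leg = [0.30, 0.30]

# One frame of source motion authored for the template (assumed values).
root_height = 0.90           # hip height of the standing template
angles = [0.0, 0.0]          # straight leg

# On the template, the foot rests on the ground.
template_foot = foot_height(root_height, template_leg, angles)

# Copying the same root translation and rotations to the avatar
# leaves its foot floating above the ground.
avatar_foot = foot_height(root_height, avatar_leg, angles)

# A simple heuristic (an assumption, not the paper's method):
# scale the root translation by the leg-length ratio.
scale = sum(avatar_leg) / sum(template_leg)
retargeted_foot = foot_height(root_height * scale, avatar_leg, angles)

print(f"template foot height:   {template_foot:.2f}")
print(f"naive copy foot height: {avatar_foot:.2f}")
print(f"rescaled foot height:   {retargeted_foot:.2f}")
```

The naive copy places the avatar's foot well above the ground plane, which is the kind of translation artifact the abstract describes; rescaling the root is only the simplest possible correction and does not account for geometry differences.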
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Multimodal Machine Learning Applications
Methods: ALIGN · High-Order Consensuses · Diffusion
