STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting

Zenghao Chai; Chen Tang; Yongkang Wong; Mohan Kankanhalli

arXiv:2406.04629 · cs.CV · June 10, 2024

TL;DR

STAR introduces a skeleton-aware, end-to-end method for generating animated 4D avatars from text, effectively addressing pose and motion mismatches with in-network retargeting and multi-view supervision.

Contribution

It proposes a novel skeleton-aware framework that corrects motion mismatches and integrates skeleton-conditioned priors for high-quality, text-driven 4D avatar synthesis.

Findings

1. Produces high-quality, vivid 4D avatars aligned with text descriptions.
2. Effectively corrects motion and pose mismatches using in-network retargeting.
3. Achieves coherent multi-view and frame-consistent supervision.

Abstract

The creation of 4D avatars (i.e., animated 3D avatars) from text descriptions typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently applies animation with target motions. However, such an optimization-by-animation paradigm has several drawbacks: (1) for pose-agnostic optimization, the images rendered in the canonical pose for naive Score Distillation Sampling (SDS) exhibit a domain gap and cannot preserve view consistency using only T2I priors; and (2) for post hoc animation, simply applying the source motions to target 3D avatars yields translation artifacts and misalignment. To address these issues, we propose Skeleton-aware Text-based 4D Avatar generation with in-network motion Retargeting (STAR). STAR considers the geometry and skeleton differences between the template mesh and target avatar, and corrects the mismatched source…
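The translation artifact described in drawback (2) can be illustrated with a toy example. The sketch below is not STAR's in-network retargeting, which the truncated abstract does not detail. It only contrasts a naive copy of the source root trajectory with a common proportional heuristic that scales the trajectory by the leg-length ratio. The function names, the straight-leg foot model, and the choice of y as the up axis are all assumptions made for illustration.

```python
import numpy as np


def naive_copy(source_root):
    """Apply the source root trajectory to the target unchanged."""
    return np.asarray(source_root, dtype=float).copy()


def scaled_retarget(source_root, source_leg_len, target_leg_len):
    """Scale the root trajectory by the target/source leg-length ratio.

    A generic heuristic (hypothetical helper), not the method from the paper.
    """
    if source_leg_len <= 0 or target_leg_len <= 0:
        raise ValueError("leg lengths must be positive")
    scale = target_leg_len / source_leg_len
    return np.asarray(source_root, dtype=float) * scale


def foot_height(root, leg_len):
    """Foot height above the ground, assuming a straight leg hanging
    directly below the root along the y (up) axis."""
    return root[:, 1] - leg_len


if __name__ == "__main__":
    # Source character: 0.9 m legs, walking 1 m along x with the root at 0.9 m.
    source_root = np.array([[0.0, 0.9, 0.0],
                            [0.5, 0.9, 0.0],
                            [1.0, 0.9, 0.0]])
    target_leg = 0.45  # a target avatar with legs half as long

    naive = naive_copy(source_root)
    scaled = scaled_retarget(source_root, 0.9, target_leg)

    print("naive foot heights: ", foot_height(naive, target_leg))
    print("scaled foot heights:", foot_height(scaled, target_leg))
```

With the naive copy, the shorter avatar's feet float above the ground and it covers the full source stride, which is one form of the misalignment the abstract mentions. The scaled version keeps the feet on the ground under this simplified leg model.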

Code & Models

Repositories

czh-98/STAR (PyTorch, official)

Taxonomy

Topics: Human Motion and Animation · Video Analysis and Summarization · Multimodal Machine Learning Applications

Methods: ALIGN · High-Order Consensuses · Diffusion