MagicAvatar: Multimodal Avatar Generation and Animation
Jianfeng Zhang, Hanshu Yan, Zhongcong Xu, Jiashi Feng, Jun Hao Liew

TL;DR
MagicAvatar introduces a two-stage framework for multimodal avatar video generation and animation, disentangling motion control from video synthesis to enhance flexibility and enable avatar animation from images or multimodal inputs.
Contribution
It proposes a novel two-stage approach for multimodal avatar generation, explicitly separating motion control from video synthesis, and supports avatar animation from images.
Findings
Effective multimodal-to-motion translation demonstrated.
High-quality avatar video generation achieved.
Flexible avatar animation from images shown.
Abstract
This report presents MagicAvatar, a framework for multimodal video generation and animation of human avatars. Unlike most existing methods that generate avatar-centric videos directly from multimodal inputs (e.g., text prompts), MagicAvatar explicitly disentangles avatar video generation into two stages: (1) multimodal-to-motion and (2) motion-to-video generation. The first stage translates the multimodal inputs into motion/control signals (e.g., human pose, depth, DensePose), while the second stage generates avatar-centric video guided by these motion signals. Additionally, MagicAvatar supports avatar animation by simply providing a few images of the target person. This capability enables the animation of the provided human identity according to the specific motion derived from the first stage. We demonstrate the flexibility of MagicAvatar through various applications, including…
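The two-stage split described in the abstract can be pictured as two interchangeable functions joined by an intermediate motion signal. The following is a minimal sketch of that structure only, not the authors' implementation: `MotionSequence`, `two_stage_generate`, `dummy_to_motion`, and `dummy_to_video` are hypothetical names, and the two stages are placeholder stubs standing in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class MotionSequence:
    """Hypothetical container for the intermediate motion/control signal."""
    kind: str                  # e.g. "pose", "depth", "densepose"
    frames: List[List[float]]  # one placeholder feature vector per frame


def two_stage_generate(
    prompt: str,
    to_motion: Callable[[str], MotionSequence],
    to_video: Callable[[MotionSequence, Optional[Sequence[str]]], List[str]],
    identity_images: Optional[Sequence[str]] = None,
) -> List[str]:
    """Run stage 1 (multimodal-to-motion), then stage 2 (motion-to-video)."""
    motion = to_motion(prompt)                # stage 1: input -> motion signal
    return to_video(motion, identity_images)  # stage 2: motion -> frames


def dummy_to_motion(prompt: str) -> MotionSequence:
    # Placeholder stage 1: emits one fake pose vector per word in the prompt.
    n = max(1, len(prompt.split()))
    return MotionSequence(kind="pose", frames=[[float(i)] for i in range(n)])


def dummy_to_video(
    motion: MotionSequence, identity_images: Optional[Sequence[str]]
) -> List[str]:
    # Placeholder stage 2: describes each frame as text instead of rendering.
    if identity_images:
        who = f"identity from {len(identity_images)} images"
    else:
        who = "generic avatar"
    return [
        f"frame {i}: {who}, {motion.kind}={vec}"
        for i, vec in enumerate(motion.frames)
    ]


video = two_stage_generate("a person dancing", dummy_to_motion, dummy_to_video)
animated = two_stage_generate(
    "a person dancing", dummy_to_motion, dummy_to_video,
    identity_images=["a.png", "b.png"],
)
```

Because the stages communicate only through the motion signal, either stub could be swapped for a different model without touching the other, which is the flexibility the abstract attributes to the disentangled design. Passing `identity_images` mirrors the animation use case, where the same motion drives a specific person's appearance.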
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Multimodal Machine Learning Applications
