MegActor: Harness the Power of Raw Video for Vivid Portrait Animation
Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan

TL;DR
MegActor is a novel diffusion-based portrait animation method that effectively uses raw videos by addressing identity leakage and background interference through synthetic data, segmentation, and style transfer.
Contribution
Introduces MegActor, a pioneering conditional diffusion model that leverages raw videos for portrait animation by mitigating identity leakage and background issues.
Findings
Achieves results comparable to commercial models using only public datasets.
Effectively mitigates identity leakage with synthetic data generation.
Maintains background stability through CLIP-based encoding.
Abstract
Although raw driving videos contain richer information on facial expressions than intermediate representations such as landmarks, they are seldom studied in the field of portrait animation. This is due to two challenges inherent in portrait animation driven by raw videos: 1) significant identity leakage; 2) irrelevant background and facial details, such as wrinkles, that degrade performance. To harness the power of raw videos for vivid portrait animation, we propose a pioneering conditional diffusion model named MegActor. First, we introduce a synthetic data generation framework that creates videos with consistent motion and expressions but inconsistent IDs, mitigating the issue of ID leakage. Second, we segment the foreground and background of the reference image and employ CLIP to encode the background details. This encoded information is then integrated into…
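The abstract states that the reference image is split into foreground and background before CLIP encodes the background. The snippet below is a minimal sketch of only that split step, assuming a binary foreground mask is already available from some portrait segmentation model. The function name `split_reference`, the mask convention (True marks the person), and the zero fill value are illustrative assumptions, not MegActor's released code; the CLIP encoding and the downstream integration are not shown.

```python
import numpy as np


def split_reference(image: np.ndarray, fg_mask: np.ndarray, fill: float = 0.0):
    """Split an H x W x C reference image into foreground and background layers.

    Hypothetical helper: fg_mask is an H x W array (bool or 0/1) produced by any
    segmentation model, where True/1 marks the foreground (the portrait subject).
    Pixels outside each layer are replaced with `fill`.
    """
    if image.shape[:2] != fg_mask.shape:
        raise ValueError("mask must match the image height and width")
    # Add a trailing axis so the H x W mask broadcasts across the color channels.
    mask = fg_mask.astype(bool)[..., None]
    # Keep subject pixels in the foreground layer, everything else in the background layer.
    foreground = np.where(mask, image, fill).astype(image.dtype)
    background = np.where(mask, fill, image).astype(image.dtype)
    return foreground, background


if __name__ == "__main__":
    img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
    mask = np.array([[True, False], [False, True]])
    fg, bg = split_reference(img, mask)
    print(fg[..., 0])
    print(bg[..., 0])
```

In a pipeline like the one the abstract describes, the `background` layer would then be fed to a CLIP image encoder; how MegActor conditions the diffusion model on that embedding is cut off in the abstract above, so it is left out here.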
Taxonomy
Topics: Human Motion and Animation
Methods: Contrastive Language-Image Pre-training · Diffusion
