MegActor: Harness the Power of Raw Video for Vivid Portrait Animation

Shurong Yang; Huadong Li; Juhao Wu; Minhao Jing; Linze Li; Renhe Ji; Jiajun Liang; Haoqiang Fan

arXiv:2405.20851 · cs.CV · June 19, 2024

TL;DR

MegActor is a novel diffusion-based portrait animation method that effectively uses raw videos by addressing identity leakage and background interference through synthetic data, segmentation, and style transfer.

Contribution

Introduces MegActor, a pioneering conditional diffusion model that leverages raw videos for portrait animation by mitigating identity leakage and background interference.

Findings

01 Achieves results comparable to commercial models using only public datasets.

02 Effectively mitigates identity leakage with synthetic data generation.

03 Maintains background stability through CLIP-based encoding.

Abstract

Although raw driving videos contain richer information about facial expressions than intermediate representations such as landmarks, they are seldom studied in portrait animation. This is due to two challenges inherent in portrait animation driven by raw videos: 1) significant identity leakage; 2) irrelevant background and facial details, such as wrinkles, that degrade performance. To harness the power of raw videos for vivid portrait animation, we propose a pioneering conditional diffusion model named MegActor. First, we introduce a synthetic data generation framework for creating videos with consistent motion and expressions but inconsistent IDs to mitigate the issue of ID leakage. Second, we segment the foreground and background of the reference image and employ CLIP to encode the background details. This encoded information is then integrated into…
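The foreground/background separation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_foreground_background` is hypothetical, the binary mask is assumed to come from any off-the-shelf portrait segmentation model, and the CLIP encoding of the background layer is left out.

```python
import numpy as np


def split_foreground_background(image: np.ndarray, mask: np.ndarray):
    """Split an H x W x C image into foreground and background layers.

    `mask` is an H x W array that is nonzero where a pixel belongs to the
    portrait (foreground). The mask source is an assumption of this sketch.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    # Add a trailing axis so the mask broadcasts across the channel dimension.
    fg_mask = mask.astype(bool)[..., None]
    foreground = np.where(fg_mask, image, 0)  # background pixels zeroed out
    background = np.where(fg_mask, 0, image)  # foreground pixels zeroed out
    return foreground, background


# Toy 2 x 2 RGB image with a diagonal foreground mask.
image = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
mask = np.array([[1, 0], [0, 1]])
foreground, background = split_foreground_background(image, mask)
```

In a pipeline like the one the abstract describes, the `background` layer would then be passed to an image encoder such as CLIP; that step depends on the specific model and weights and is omitted here.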

Taxonomy

Topics: Human Motion and Animation

Methods: Contrastive Language-Image Pre-training · Diffusion