Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion
Xingpei Ma, Jiaran Cai, Yuansheng Guan, Shenneng Huang, Qiang Zhang, Shunsi Zhang

TL;DR
Playmate introduces a two-stage framework for more controllable and expressive portrait animation, enabling fine-grained emotion and pose control while maintaining high video quality and lip-sync accuracy.
Contribution
Playmate proposes a novel two-stage training method that combines a 3D-implicit representation with emotion control, enhancing controllability and realism in talking face generation.
Findings
Outperforms state-of-the-art methods in video quality
Maintains strong lip synchronization
Offers improved emotion and pose control
Abstract
Recent diffusion-based talking face generation models have demonstrated impressive potential in synthesizing videos that accurately match a speech audio clip with a given reference identity. However, existing approaches still encounter significant challenges due to uncontrollable factors, such as inaccurate lip-sync, inappropriate head posture and the lack of fine-grained control over facial expressions. In order to introduce more face-guided conditions beyond speech audio clips, a novel two-stage training framework Playmate is proposed to generate more lifelike facial expressions and talking faces. In the first stage, we introduce a decoupled implicit 3D representation along with a meticulously designed motion-decoupled module to facilitate more accurate attribute disentanglement and generate expressive talking videos directly from audio cues. Then, in the second stage, we introduce an…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Human Motion and Animation · 3D Shape Modeling and Analysis
Methods: Contrastive Language-Image Pre-training
