Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion

Xingpei Ma; Jiaran Cai; Yuansheng Guan; Shenneng Huang; Qiang Zhang; Shunsi Zhang

arXiv:2502.07203 · cs.CV · October 16, 2025

Open Access

TL;DR

Playmate introduces a two-stage framework for more controllable and expressive portrait animation, enabling fine-grained emotion and pose control while maintaining high video quality and lip-sync accuracy.

Contribution

The paper proposes a novel two-stage training method that combines a 3D-implicit representation with emotion control, enhancing controllability and realism in talking face generation.

Findings

01. Outperforms state-of-the-art methods in video quality

02. Maintains strong lip synchronization

03. Offers improved emotion and pose control

Abstract

Recent diffusion-based talking face generation models have demonstrated impressive potential in synthesizing videos that accurately match a speech audio clip with a given reference identity. However, existing approaches still encounter significant challenges due to uncontrollable factors, such as inaccurate lip-sync, inappropriate head posture, and the lack of fine-grained control over facial expressions. To introduce more face-guided conditions beyond speech audio clips, a novel two-stage training framework, Playmate, is proposed to generate more lifelike facial expressions and talking faces. In the first stage, we introduce a decoupled implicit 3D representation along with a meticulously designed motion-decoupled module to facilitate more accurate attribute disentanglement and generate expressive talking videos directly from audio cues. Then, in the second stage, we introduce an…

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Human Motion and Animation · 3D Shape Modeling and Analysis

Methods: Contrastive Language-Image Pre-training