MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding

Chang Liu; Ye Pan; Chenyang Ding; Susanto Rahardja; Xiaokang Yang

arXiv:2507.06071 · cs.CV · August 15, 2025

TL;DR

MEDTalk is a novel framework for dynamic, fine-grained emotional 3D facial animation that disentangles content and emotion, integrating multimodal inputs for realistic and controllable talking head generation.

Contribution

It introduces a disentangled embedding approach for independent control of lip movements and facial expressions, incorporating multimodal inputs for personalized, realistic emotional talking head synthesis.
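The disentangling idea can be illustrated with a minimal cross-reconstruction sketch. This is not the authors' implementation: the linear encoders and decoder, the dimensions, and every name below (`encode_content`, `encode_emotion`, `decode`, `cross_reconstruction_loss`) are hypothetical stand-ins for learned networks. The sketch only assumes that motion sequences are available in pairs where content and emotion can be swapped and compared against a ground-truth target.

```python
import numpy as np

rng = np.random.default_rng(0)
MOTION_DIM, CONTENT_DIM, EMOTION_DIM = 16, 8, 4  # hypothetical sizes

# Hypothetical linear maps standing in for learned encoder/decoder networks.
W_content = rng.normal(size=(MOTION_DIM, CONTENT_DIM))
W_emotion = rng.normal(size=(MOTION_DIM, EMOTION_DIM))
W_decode = rng.normal(size=(CONTENT_DIM + EMOTION_DIM, MOTION_DIM))


def encode_content(motion):
    """Map a (frames, MOTION_DIM) sequence to content embeddings."""
    return motion @ W_content


def encode_emotion(motion):
    """Map a (frames, MOTION_DIM) sequence to emotion embeddings."""
    return motion @ W_emotion


def decode(content, emotion):
    """Rebuild motion from concatenated content and emotion embeddings."""
    return np.concatenate([content, emotion], axis=-1) @ W_decode


def cross_reconstruction_loss(x_c1e1, x_c2e2, x_c1e2, x_c2e1):
    """Swap emotion embeddings between two sequences and compare each
    decoded result with the ground-truth sequence for that combination."""
    pred_c1e2 = decode(encode_content(x_c1e1), encode_emotion(x_c2e2))
    pred_c2e1 = decode(encode_content(x_c2e2), encode_emotion(x_c1e1))
    loss = np.mean((pred_c1e2 - x_c1e2) ** 2) + np.mean((pred_c2e1 - x_c2e1) ** 2)
    return float(loss)


# Usage with random placeholder sequences of 5 frames each.
x_a, x_b, x_ab, x_ba = (rng.normal(size=(5, MOTION_DIM)) for _ in range(4))
loss = cross_reconstruction_loss(x_a, x_b, x_ab, x_ba)
```

Minimizing a loss of this shape pushes the content embedding to carry only what survives an emotion swap, which is what allows lip movements and expressions to be controlled independently.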

Findings

01. Achieves synchronized lip movements with vivid emotional expressions.

02. Enables control over facial expressions using text and reference images.

03. Supports integration into industrial production pipelines.

Abstract

Audio-driven emotional 3D facial animation aims to generate synchronized lip movements and vivid facial expressions. However, most existing approaches focus on static and predefined emotion labels, limiting their diversity and naturalness. To address these challenges, we propose MEDTalk, a novel framework for fine-grained and dynamic emotional talking head generation. Our approach first disentangles content and emotion embedding spaces from motion sequences using a carefully designed cross-reconstruction process, enabling independent control over lip movements and facial expressions. Beyond conventional audio-driven lip synchronization, we integrate audio and speech text, predicting frame-wise intensity variations and dynamically adjusting static emotion features to generate realistic emotional expressions. Furthermore, to enhance control and personalization, we incorporate multimodal…
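The step of predicting frame-wise intensity and adjusting a static emotion feature can be pictured with a small sketch. The abstract does not say how the adjustment is done, so the multiplicative scaling below is an assumption made purely for illustration, and `modulate_emotion` is a hypothetical name rather than part of MEDTalk.

```python
import numpy as np


def modulate_emotion(static_emotion, frame_intensity):
    """Turn one static emotion feature into a per-frame sequence.

    Assumption: each frame's feature is the static feature scaled by that
    frame's intensity, clipped to the range [0, 1].
    """
    static_emotion = np.asarray(static_emotion, dtype=float)      # shape (D,)
    frame_intensity = np.asarray(frame_intensity, dtype=float)    # shape (T,)
    frame_intensity = np.clip(frame_intensity, 0.0, 1.0)
    # Broadcast (T, 1) * (1, D) to get one feature vector per frame.
    return frame_intensity[:, None] * static_emotion[None, :]


static = np.array([1.0, -2.0, 0.5])          # placeholder emotion feature
intensity = np.array([0.0, 0.5, 1.0, 1.5])   # placeholder per-frame intensities
dynamic = modulate_emotion(static, intensity)  # shape (4, 3)
```

With these placeholder values the first frame is neutral (all zeros), the second carries half of the static feature, and the last two are identical because the intensity is clipped at 1.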
