EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
Qianyun He, Xinya Ji, Yicheng Gong, Yuanxun Lu, Zhengyu Diao, Linjia Huang, Yao Yao, Siyu Zhu, Zhan Ma, Songcen Xu, Xiaofei Wu, Zixiao Zhang, Xun Cao, Hao Zhu

TL;DR
EmoTalk3D introduces a novel framework for synthesizing high-fidelity, emotion-controllable 3D talking heads with enhanced lip synchronization and multi-view consistency, leveraging a new dataset and a geometry-appearance mapping approach.
Contribution
The paper presents the EmoTalk3D dataset and a 'Speech-to-Geometry-to-Appearance' framework for realistic, emotion-aware 3D talking head synthesis from audio.
Findings
Improved lip synchronization and rendering quality.
Effective emotion control in 3D talking head generation.
High-fidelity dynamic facial detail capture.
Abstract
We present a novel approach for synthesizing 3D talking heads with controllable emotion, featuring enhanced lip synchronization and rendering quality. Despite significant progress in the field, prior methods still suffer from multi-view inconsistency and a lack of emotional expressiveness. To address these issues, we collect the EmoTalk3D dataset with calibrated multi-view videos, emotional annotations, and per-frame 3D geometry. By training on the EmoTalk3D dataset, we propose a 'Speech-to-Geometry-to-Appearance' mapping framework that first predicts a faithful 3D geometry sequence from the audio features; the appearance of a 3D talking head, represented by 4D Gaussians, is then synthesized from the predicted geometry. The appearance is further disentangled into canonical and dynamic Gaussians, learned from multi-view videos, and fused to render free-view talking head animation.…
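The two-stage data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Gaussian`, `speech_to_geometry`, `geometry_to_appearance`, `synthesize`, and `emotion_scale` are hypothetical, and the learned networks are replaced by simple arithmetic placeholders. It only shows the shape of the pipeline, assuming one set of 3D points is predicted per audio frame and that static (canonical) Gaussian attributes are fused with a geometry-dependent (dynamic) change.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Gaussian:
    center: Point   # 3D position of the Gaussian
    color: Point    # RGB values in [0, 1]
    opacity: float


def speech_to_geometry(audio_frames: List[List[float]],
                       emotion_scale: float,
                       template: List[Point]) -> List[List[Point]]:
    """Stage 1 placeholder: one set of 3D points per audio frame.

    Assumption: a real system would use a learned network. Here the mean
    audio feature, scaled by a hypothetical emotion factor, shifts the
    template points along the y axis.
    """
    sequence = []
    for feats in audio_frames:
        drive = emotion_scale * sum(feats) / len(feats)
        sequence.append([(x, y + drive, z) for (x, y, z) in template])
    return sequence


def geometry_to_appearance(points: List[Point],
                           canonical: List[Gaussian]) -> List[Gaussian]:
    """Stage 2 placeholder: fuse canonical Gaussians with a dynamic term.

    Assumption: the dynamic term is a toy color shift proportional to the
    vertical displacement from the canonical center.
    """
    fused = []
    for p, g in zip(points, canonical):
        dy = p[1] - g.center[1]
        color = tuple(min(1.0, max(0.0, c + 0.1 * dy)) for c in g.color)
        fused.append(Gaussian(center=p, color=color, opacity=g.opacity))
    return fused


def synthesize(audio_frames: List[List[float]],
               emotion_scale: float,
               canonical: List[Gaussian]) -> List[List[Gaussian]]:
    """Speech -> geometry -> appearance, one list of Gaussians per frame."""
    template = [g.center for g in canonical]
    geometry = speech_to_geometry(audio_frames, emotion_scale, template)
    return [geometry_to_appearance(pts, canonical) for pts in geometry]


# Toy usage: two canonical Gaussians, two audio frames of two features each.
canonical = [
    Gaussian((0.0, 0.0, 0.0), (0.5, 0.5, 0.5), 1.0),
    Gaussian((1.0, 0.0, 0.0), (0.2, 0.2, 0.2), 0.8),
]
frames = synthesize([[0.0, 0.0], [1.0, 3.0]], emotion_scale=0.5,
                    canonical=canonical)
print(len(frames), frames[1][0].center)
```

Keeping geometry as an explicit intermediate, rather than mapping audio straight to pixels, means every rendered viewpoint is driven by the same 3D points. That is the property the abstract relies on for multi-view consistency.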
Taxonomy
Topics: Face recognition and analysis · Hand Gesture Recognition Systems · Human Pose and Action Recognition
