Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape
Wei Zhao, Yijun Wang, Tianyu He, Lianying Yin, Jianxin Lin, Xin Jin

TL;DR
VividTalker is a framework for speech-driven 3D facial animation that produces natural head poses and detailed facial features. It disentangles mouth movement from head pose and models each separately with a Transformer-based approach.
Contribution
It introduces a new method for disentangling facial attributes and synthesizing detailed 3D facial shapes, addressing data scarcity and controllability issues in speech-driven animation.
Findings
Outperforms existing methods in realism and expressiveness
Successfully models natural head poses and facial details
Creates a new detailed 3D facial shape dataset
Abstract
The creation of lifelike speech-driven 3D facial animation requires natural and precise synchronization between audio input and facial expressions. However, existing works still fail to render shapes with flexible head poses and natural facial details (e.g., wrinkles). This limitation is mainly due to two aspects: 1) Collecting a training set with detailed 3D facial shapes is highly expensive. This scarcity of detailed shape annotations hinders the training of models with expressive facial animation. 2) Compared to mouth movement, the head pose is much less correlated to speech content. Consequently, modeling mouth movement and head pose concurrently results in a lack of controllability over facial movement. To address these challenges, we introduce VividTalker, a new framework designed to facilitate speech-driven 3D facial animation characterized by flexible head pose and natural facial…
Taxonomy
Topics: Face recognition and analysis · Facial Nerve Paralysis Treatment and Research · Facial Rejuvenation and Surgery Techniques
Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Dense Connections · Position-Wise Feed-Forward Layer
