Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape

Wei Zhao; Yijun Wang; Tianyu He; Lianying Yin; Jianxin Lin; Xin Jin

arXiv:2310.20240 · cs.CV · November 1, 2023 · 1 citation

Open Access

TL;DR

VividTalker is a novel framework that generates realistic speech-driven 3D facial animations with natural head poses and detailed facial features by disentangling and separately modeling mouth movements and head poses using a Transformer-based approach.
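As a rough illustration of why keeping mouth movement and head pose in separate tracks helps controllability, the sketch below composes an animation from two independent per-frame tracks. It is a toy under stated assumptions, not the VividTalker architecture: the `Frame` type, the `compose` and `split` helpers, and the parameter layouts (two mouth values, yaw/pitch/roll) are all hypothetical, and no Transformer is involved.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

MouthParams = Tuple[float, ...]
HeadPose = Tuple[float, float, float]  # assumed layout: (yaw, pitch, roll) in radians


@dataclass(frozen=True)
class Frame:
    """Hypothetical per-frame state; the paper's real parameterization is not given here."""
    mouth: MouthParams   # speech-correlated mouth parameters (assumed)
    head_pose: HeadPose  # weakly speech-correlated head pose (assumed)


def compose(mouth_track: Sequence[MouthParams],
            pose_track: Sequence[HeadPose]) -> List[Frame]:
    """Combine two independently produced tracks into one animation."""
    if len(mouth_track) != len(pose_track):
        raise ValueError("tracks must have the same number of frames")
    return [Frame(mouth=m, head_pose=p) for m, p in zip(mouth_track, pose_track)]


def split(frames: Sequence[Frame]) -> Tuple[List[MouthParams], List[HeadPose]]:
    """Recover the two tracks from a composed animation."""
    return [f.mouth for f in frames], [f.head_pose for f in frames]


# Made-up values standing in for the outputs of two separate models.
mouth_track = [(0.1, 0.0), (0.4, 0.2), (0.2, 0.1)]
pose_still = [(0.0, 0.0, 0.0)] * 3
pose_turning = [(0.05, -0.02, 0.0), (0.10, -0.01, 0.0), (0.15, 0.0, 0.0)]

anim_still = compose(mouth_track, pose_still)
anim_turning = compose(mouth_track, pose_turning)
```

Because the tracks are only joined at composition time, swapping `pose_still` for `pose_turning` changes the head motion while the mouth track stays identical. A single model that predicts both jointly offers no such handle.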

Contribution

It introduces a new method for disentangling facial attributes and synthesizing detailed 3D facial shapes, addressing data scarcity and controllability issues in speech-driven animation.

Findings

01 Outperforms existing methods in realism and expressiveness

02 Successfully models natural head poses and facial details

03 Creates a new detailed 3D facial shape dataset

Abstract

The creation of lifelike speech-driven 3D facial animation requires natural and precise synchronization between audio input and facial expressions. However, existing works still fail to render shapes with flexible head poses and natural facial details (e.g., wrinkles). This limitation stems mainly from two factors: 1) Collecting a training set with detailed 3D facial shapes is highly expensive, and the resulting scarcity of detailed shape annotations hinders the training of models capable of expressive facial animation. 2) Compared to mouth movement, head pose is much less correlated with speech content. Consequently, modeling mouth movement and head pose jointly results in a lack of control over facial movement. To address these challenges, we introduce VividTalker, a new framework designed to facilitate speech-driven 3D facial animation characterized by flexible head pose and natural facial…

Taxonomy

Topics: Face recognition and analysis · Facial Nerve Paralysis Treatment and Research · Facial Rejuvenation and Surgery Techniques

Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Dense Connections · Position-Wise Feed-Forward Layer