Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

TL;DR
This paper introduces a probabilistic approach to animating 3D facial geometry from speech, addressing the limitations of deterministic models by providing new datasets, metrics, and methods for diverse and faithful facial motion synthesis.
Contribution
It presents the first large-scale benchmark datasets and metrics for probabilistic 3D facial motion synthesis from speech, along with a novel model that achieves diverse and accurate results.
Findings
The proposed probabilistic model outperforms existing methods on the newly introduced benchmarks.
Generated facial motions are diverse and match unseen speaker styles.
Synthetic meshes improve downstream audio-visual model performance.
Abstract
We consider the task of animating 3D facial geometry from speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signal to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D facial motions that accompany speech in the real world. Importantly, the relationship between speech and facial motion is one-to-many, containing both inter-speaker and intra-speaker variations and necessitating a probabilistic approach. In this paper, we identify and address key challenges that have so far limited the development of probabilistic models: lack of datasets and metrics that are suitable for training and evaluating them, as well as the difficulty of designing a model that generates…
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Facial Nerve Paralysis Treatment and Research
