Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation
Yingruo Fan, Zhaojiang Lin, Jun Saito, Wenping Wang, Taku Komura

TL;DR
This paper introduces a joint audio-text model that leverages high-level contextual text embeddings to improve expressive speech-driven 3D facial animation, capturing diverse facial motions and expressions more realistically.
Contribution
Integrates contextual embeddings from a pre-trained language model with audio features, so that expressive facial animation can be synthesized from sentence-level context rather than from phoneme-level features alone.
Findings
Outperforms existing state-of-the-art methods in realism and synchronization.
Effectively captures diverse upper face expressions.
Demonstrates superior results in quantitative, qualitative, and perceptual evaluations.
Abstract
Speech-driven 3D facial animation with accurate lip synchronization has been widely studied. However, synthesizing realistic motions for the entire face during speech has rarely been explored. In this work, we present a joint audio-text model to capture the contextual information for expressive speech-driven 3D facial animation. The existing datasets are collected to cover as many different phonemes as possible instead of sentences, thus limiting the capability of the audio-based model to learn more diverse contexts. To address this, we propose to leverage the contextual text embeddings extracted from the powerful pre-trained language model that has learned rich contextual representations from large-scale text data. Our hypothesis is that the text features can disambiguate the variations in upper face expressions, which are not strongly correlated with the audio. In contrast to prior…
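The idea of combining per-frame audio features with contextual text embeddings can be illustrated with a small sketch. This is not the authors' architecture: the excerpt above does not say how the two modalities are fused, so the even token-to-frame alignment, the concatenation step, and the names align_text_to_frames and fuse are all assumptions chosen for illustration.

```python
from typing import List

Vector = List[float]


def align_text_to_frames(text_emb: List[Vector], num_frames: int) -> List[Vector]:
    """Repeat token embeddings so there is one text vector per audio frame.

    Assumption: frames are split evenly across tokens. A real system would
    more likely use learned attention or forced alignment.
    """
    if not text_emb:
        raise ValueError("need at least one token embedding")
    num_tokens = len(text_emb)
    return [
        text_emb[min(i * num_tokens // num_frames, num_tokens - 1)]
        for i in range(num_frames)
    ]


def fuse(audio_feats: List[Vector], text_emb: List[Vector]) -> List[Vector]:
    """Concatenate each audio frame with its aligned text embedding."""
    aligned = align_text_to_frames(text_emb, len(audio_feats))
    return [a + t for a, t in zip(audio_feats, aligned)]


# Toy inputs: 4 audio frames of dimension 2, 2 text tokens of dimension 3.
audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
fused = fuse(audio, text)  # 4 frames, each of dimension 2 + 3 = 5
```

In a full pipeline the fused sequence would feed a decoder that predicts per-frame facial motion; that decoder is outside the scope of this sketch.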
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Facial Nerve Paralysis Treatment and Research
