TL;DR
ProbTalk3D introduces a novel non-deterministic approach for speech-driven 3D facial animation that incorporates emotional control, leveraging a two-stage VQ-VAE model and a rich emotional dataset to produce diverse, emotionally expressive animations.
Contribution
This work is the first to combine non-deterministic modeling with emotion control in 3D facial animation synthesis using VQ-VAE and a rich emotional dataset.
Findings
Outperforms state-of-the-art models in objective and subjective evaluations.
Effectively generates diverse and emotionally rich facial animations.
Demonstrates the importance of non-determinism and emotion control for realistic animation.
Abstract
Audio-driven 3D facial animation synthesis has been an active field of research with attention from both academia and industry. While there are promising results in this area, recent approaches largely focus on lip-sync and identity control, neglecting the role of emotions and emotion control in the generative process. That is mainly due to the lack of emotionally rich facial animation data and of algorithms that can synthesize speech animations with emotional expressions at the same time. In addition, the majority of models are deterministic, meaning that given the same audio input, they produce the same output motion. We argue that emotions and non-determinism are crucial to generating diverse and emotionally rich facial animations. In this paper, we propose ProbTalk3D, a non-deterministic neural network approach for emotion-controllable speech-driven 3D facial animation synthesis using a…
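The contrast between deterministic and non-deterministic generation can be illustrated with a small sketch. It is not the ProbTalk3D implementation: the excerpt above does not describe how the model samples, so the function names (`quantize`, `pick_codes`), the toy codebook, and the temperature-based softmax sampling are all assumptions chosen for illustration. The first function shows the nearest-neighbour lookup that defines the quantization step of a generic VQ-VAE; the second shows how the same per-frame code scores can yield one fixed output (argmax) or varied outputs (sampling).

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent row in z to its nearest codebook vector (generic VQ step)."""
    # Squared Euclidean distance between every latent and every code: shape (T, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

def pick_codes(logits, rng=None, temperature=1.0):
    """Choose one code index per frame from hypothetical per-frame code scores.

    rng is None -> deterministic: always the highest-scoring code.
    rng given   -> non-deterministic: sample from softmax(logits / temperature).
    """
    if rng is None:
        return logits.argmax(axis=1)
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(p), p=p) for p in probs])

# Toy data (assumed, not from the paper): 3 codes in a 2-D latent space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z = np.array([[0.9, 1.2], [-0.1, 0.1]])
idx, zq = quantize(z, codebook)

logits = np.array([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]])
fixed = pick_codes(logits)                                # same result every call
sampled = pick_codes(logits, rng=np.random.default_rng())  # may differ per call
```

Calling `pick_codes(logits)` repeatedly always returns the same indices, which mirrors the deterministic behaviour the abstract criticizes; passing a random generator lets repeated calls on identical input produce different code sequences, and therefore different decoded motion.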
Taxonomy
Methods: Softmax · Attention Is All You Need · VQ-VAE · Focus
