ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE

Sichun Wu; Kazi Injamamul Haque; Zerrin Yumak

arXiv:2409.07966 · cs.CV · February 18, 2025

TL;DR

ProbTalk3D introduces a novel non-deterministic approach for speech-driven 3D facial animation that incorporates emotional control, leveraging a two-stage VQ-VAE model and a rich emotional dataset to produce diverse, emotionally expressive animations.
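
To make the "non-deterministic VQ-VAE" idea concrete, here is a minimal sketch of the vector-quantization step at the heart of any VQ-VAE, next to one simple way a discrete code could be chosen stochastically. It is an illustration under stated assumptions, not the authors' implementation: the toy codebook, the helper names `quantize` and `sample_code`, and the temperature-scaled softmax over distances are all hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent, codebook):
    """Deterministic VQ step: return the index of the nearest codebook entry."""
    distances = [squared_distance(latent, code) for code in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)


def sample_code(latent, codebook, temperature=1.0, rng=random):
    """Stochastic variant (an assumption for illustration only):
    sample an index with probability proportional to exp(-distance / temperature)."""
    distances = [squared_distance(latent, code) for code in codebook]
    d_min = min(distances)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / temperature) for d in distances]
    return rng.choices(range(len(codebook)), weights=weights, k=1)[0]


# Toy 2-D codebook with three entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latent = [0.9, 0.1]

# The same input always maps to the same code here.
nearest = quantize(latent, codebook)

# Repeated calls with the same input can return different codes.
rng = random.Random(0)
samples = [sample_code(latent, codebook, temperature=0.5, rng=rng) for _ in range(200)]
```

In this sketch `quantize` always returns the same index for a given latent, which corresponds to the deterministic behaviour the abstract criticises, while `sample_code` can return different indices on repeated calls, so the same input can decode to different outputs. Lower temperatures concentrate the samples on the nearest code.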

Contribution

This work is the first to combine non-deterministic modeling with emotion control in 3D facial animation synthesis using VQ-VAE and a rich emotional dataset.

Findings

1. Outperforms state-of-the-art models in objective and subjective evaluations.
2. Effectively generates diverse and emotionally-rich facial animations.
3. Demonstrates the importance of non-determinism and emotion control for realistic animation.

Abstract

Audio-driven 3D facial animation synthesis has been an active field of research with attention from both academia and industry. While there are promising results in this area, recent approaches largely focus on lip-sync and identity control, neglecting the role of emotions and emotion control in the generative process. That is mainly due to the lack of emotionally rich facial animation data and of algorithms that can synthesize speech animations with emotional expressions at the same time. In addition, the majority of models are deterministic, meaning that given the same audio input, they produce the same output motion. We argue that emotions and non-determinism are crucial to generate diverse and emotionally-rich facial animations. In this paper, we propose ProbTalk3D, a non-deterministic neural network approach for emotion controllable speech-driven 3D facial animation synthesis using a…

Code & Models

Repositories

uuembodiedsocialai/probtalk3d
jax · Official

Taxonomy

Methods: Softmax · Attention Is All You Need · VQ-VAE · Focus