DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation

Jisoo Kim; Jungbin Cho; Joonho Park; Soonmin Hwang; Da Eun Kim; Geon Kim; Youngjae Yu

arXiv:2408.06010 · cs.CV · March 25, 2025

TL;DR

DEEPTalk is a speech-driven 3D facial animation method that generates emotionally rich, diverse, and realistic facial expressions by leveraging probabilistic emotion embeddings and hierarchical motion priors, in contrast to the monotonous facial motion produced by previous approaches.

Contribution

The paper introduces DEEPTalk, which combines probabilistic contrastive learning for emotion embedding with a hierarchical VQ-VAE for dynamic facial motion, enabling emotionally expressive and diverse 3D face animations from speech.
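
The vector-quantization step at the heart of a VQ-VAE motion prior can be sketched as below. This is a minimal illustration under stated assumptions, not DEEPTalk's implementation: the two-level hierarchy (a coarse codebook applied to temporally averaged frames plus a fine codebook at the original frame rate), the helper names `quantize` and `downsample`, and the toy codebooks are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """VQ step: replace each latent frame with its nearest codebook entry."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in frames:
        best = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def downsample(frames: Sequence[Vector], factor: int) -> List[List[float]]:
    """Hypothetical coarse level: average non-overlapping windows of `factor` frames."""
    out: List[List[float]] = []
    for start in range(0, len(frames), factor):
        window = frames[start:start + factor]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


# Toy 2-D latent sequence and codebooks (illustrative values only).
frames = [[0.0, 0.1], [0.9, 1.0], [1.0, 0.9], [0.1, 0.0]]
fine_codebook = [[0.0, 0.0], [1.0, 1.0]]
coarse_codebook = [[0.5, 0.5], [2.0, 2.0]]

fine_ids, _ = quantize(frames, fine_codebook)                     # per-frame detail
coarse_ids, _ = quantize(downsample(frames, 2), coarse_codebook)  # slower-varying structure
```

Here the fine level switches codes frame by frame while the coarse level stays on one code across the whole clip, which is the kind of split between fast detail and slower structure a hierarchical prior is meant to capture. A trainable version would also need codebook and commitment losses and a straight-through gradient, which this sketch omits.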

Findings

01

Produces diverse, emotionally rich facial animations

02

Maintains accurate lip-sync in generated faces

03

Outperforms existing methods in realism and expressiveness

Abstract

Speech-driven 3D facial animation has attracted considerable attention thanks to its broad range of applications. Despite recent advances in achieving realistic lip motion, current methods fail to capture the nuanced emotional undertones conveyed through speech and produce monotonous facial motion. These limitations result in blunt and repetitive facial animations, reducing user engagement and hindering their applicability. To address these challenges, we introduce DEEPTalk, a novel approach that generates diverse and emotionally rich 3D facial expressions directly from speech inputs. To achieve this, we first train DEE (Dynamic Emotion Embedding), which employs probabilistic contrastive learning to forge a joint emotion embedding space for both speech and facial motion. This probabilistic framework captures the uncertainty in interpreting emotions from speech and facial motion, enabling…
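
The idea of a probabilistic joint embedding can be sketched as follows, assuming a PCME-style formulation: each encoder outputs a Gaussian (mean and log-variance) instead of a single point, and a speech/motion pair is scored by averaging sigmoid(-a·d + b) over distances d between sampled embeddings. The abstract does not specify DEE's exact loss, so the function names, the constants `a` and `b`, the sample count, and the toy inputs are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]  # (mean, log-variance)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_embeddings(dist: Gaussian, k: int, rng: random.Random) -> List[List[float]]:
    """Draw k samples via the reparameterization z = mu + sigma * eps."""
    mean, log_var = dist
    std = [math.exp(0.5 * lv) for lv in log_var]
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)] for _ in range(k)]


def match_probability(speech: Gaussian, motion: Gaussian,
                      a: float = 5.0, b: float = 2.5, k: int = 8, seed: int = 0) -> float:
    """Average sigmoid(-a * distance + b) over all k x k sampled pairs (hypothetical a, b)."""
    rng = random.Random(seed)
    zs = sample_embeddings(speech, k, rng)
    zm = sample_embeddings(motion, k, rng)
    total = sum(sigmoid(-a * math.dist(u, v) + b) for u in zs for v in zm)
    return total / (k * k)


def pair_loss(p: float, is_match: bool, eps: float = 1e-7) -> float:
    """Binary cross-entropy: pull matching pairs together, push others apart."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if is_match else -math.log(1.0 - p)


# Toy 2-D embeddings with small, equal variances (illustrative values only).
speech = ([0.0, 0.0], [-4.0, -4.0])
same_motion = ([0.0, 0.0], [-4.0, -4.0])
other_motion = ([3.0, 3.0], [-4.0, -4.0])
p_same = match_probability(speech, same_motion)
p_other = match_probability(speech, other_motion)
```

In a contrastive setup, `pair_loss` would be summed over matched and unmatched speech/motion pairs in a batch. Because each input is a distribution, a larger predicted variance spreads its samples out and flattens its match scores, which is one way such a framework can represent uncertainty in reading emotion from an input.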

Videos

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation (Underline)

Taxonomy

Topics: Face recognition and analysis · Human Motion and Animation · Face and Expression Recognition

Methods: Softmax · Attention Is All You Need · Contrastive Learning