DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
Jisoo Kim, Jungbin Cho, Joonho Park, Soonmin Hwang, Da Eun Kim, Geon Kim, Youngjae Yu

TL;DR
DEEPTalk is a speech-driven 3D facial animation method that generates emotionally rich, diverse, and realistic facial expressions by leveraging probabilistic emotion embeddings and hierarchical motion priors. It improves on previous approaches, whose output tends to be monotonous.
Contribution
The paper introduces DEEPTalk, which combines probabilistic contrastive learning for emotion embedding with a hierarchical VQ-VAE that models dynamic facial motion, enabling emotionally expressive and diverse 3D face animations from speech.
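As a rough illustration of the vector-quantization step at the core of any VQ-VAE, the sketch below maps a continuous latent vector to its nearest codebook entry. It is a generic, minimal example and not the authors' implementation: the function name `quantize`, the toy 2-D codebook, and the Euclidean distance are all assumptions made for illustration. A hierarchical variant would apply the same lookup at more than one temporal or spatial scale.

```python
import math


def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`.

    Nearest is measured by Euclidean distance (an assumption of this sketch).
    """
    best_index = min(
        range(len(codebook)),
        key=lambda i: math.dist(vector, codebook[i]),
    )
    return best_index, codebook[best_index]


# Toy 2-D codebook with three entries; the values are illustrative only.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]

# A hypothetical encoder output is snapped to its nearest discrete code.
index, code = quantize((0.9, 1.2), codebook)
```

In a trained model the codebook is learned and the discrete indices, rather than the raw latent vectors, are what a downstream motion generator predicts.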
Findings
Produces diverse, emotionally rich facial animations
Maintains accurate lip-sync in generated faces
Outperforms existing methods in realism and expressiveness
Abstract
Speech-driven 3D facial animation has garnered lots of attention thanks to its broad range of applications. Despite recent advancements in achieving realistic lip motion, current methods fail to capture the nuanced emotional undertones conveyed through speech and produce monotonous facial motion. These limitations result in blunt and repetitive facial animations, reducing user engagement and hindering their applicability. To address these challenges, we introduce DEEPTalk, a novel approach that generates diverse and emotionally rich 3D facial expressions directly from speech inputs. To achieve this, we first train DEE (Dynamic Emotion Embedding), which employs probabilistic contrastive learning to forge a joint emotion embedding space for both speech and facial motion. This probabilistic framework captures the uncertainty in interpreting emotions from speech and facial motion, enabling…
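The probabilistic embedding idea described in the abstract can be sketched as follows. This is a minimal example of one common formulation, not the DEE architecture itself: each input is represented as a diagonal Gaussian (mean and log-variance) instead of a single point, and the probability that a speech clip and a facial motion clip match is estimated by sampling from both distributions. All names (`sample_embedding`, `match_probability`, `pair_loss`) and the `scale` and `shift` parameters are hypothetical.

```python
import math
import random


def sample_embedding(mean, log_var, rng):
    """Draw one sample from a diagonal Gaussian via the reparameterization trick."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def match_probability(audio, motion, n_samples=200, scale=1.0, shift=0.0, seed=0):
    """Monte Carlo estimate of the probability that two inputs match.

    `audio` and `motion` are (mean, log_var) pairs. Each sampled pair
    contributes sigmoid(-scale * distance + shift).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z_audio = sample_embedding(audio[0], audio[1], rng)
        z_motion = sample_embedding(motion[0], motion[1], rng)
        distance = math.dist(z_audio, z_motion)
        total += 1.0 / (1.0 + math.exp(scale * distance - shift))
    return total / n_samples


def pair_loss(probability, is_match):
    """Negative log-likelihood of the match label for one pair."""
    return -math.log(probability if is_match else 1.0 - probability)


# Toy 2-D embeddings; the values are illustrative only.
speech = ([0.0, 0.0], [-4.0, -4.0])
motion_near = ([0.1, 0.0], [-4.0, -4.0])
motion_far = ([3.0, 3.0], [-4.0, -4.0])

p_near = match_probability(speech, motion_near)
p_far = match_probability(speech, motion_far)
```

A larger predicted variance spreads the samples out and pulls the match probability away from confident values, which is one way such a model can express uncertainty about ambiguous emotional cues.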
Taxonomy
Topics: Face recognition and analysis · Human Motion and Animation · Face and Expression Recognition
Methods: Softmax · Attention Is All You Need · Contrastive Learning
