PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles
Tianshun Han, Benjia Zhou, Ajian Liu, Yanyan Liang, Du Zhang, Zhen Lei, Jun Wan

TL;DR
PESTalk is a speech-driven 3D facial animation method that personalizes emotional styles by combining time- and frequency-domain audio features with the speaker's voiceprint. The authors report that it outperforms existing techniques in realism and personalization.
Contribution
Introduces two modules for speech-driven facial animation: a Dual-Stream Emotion Extractor (DSEE) for fine-grained emotion extraction and an Emotional Style Modeling Module (ESMM) for personalized style modeling.
Findings
Outperforms state-of-the-art methods in realism
Effectively captures personalized emotional styles
Leverages a new 3D-EmoStyle dataset
Abstract
PESTalk is a novel method for generating 3D facial animations with personalized emotional styles directly from speech. It overcomes key limitations of existing approaches by introducing a Dual-Stream Emotion Extractor (DSEE) that captures both time- and frequency-domain audio features for fine-grained emotion analysis, and an Emotional Style Modeling Module (ESMM) that models individual expression patterns based on voiceprint characteristics. To address data scarcity, the method leverages a newly constructed 3D-EmoStyle dataset. Evaluations demonstrate that PESTalk outperforms state-of-the-art methods in producing realistic and personalized facial animations.
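To make the "time- and frequency-domain" idea concrete, here is a minimal sketch of a two-stream audio feature extractor. It is not the authors' DSEE, whose architecture is not described on this page. It assumes hand-crafted descriptors (RMS energy and zero-crossing rate for the time stream, DFT magnitudes for the frequency stream) and simple concatenation in place of any learned fusion. All function names, the frame length, and the hop size are hypothetical choices made for illustration.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D sample list into overlapping frames (drops the ragged tail)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def time_stream(frame):
    """Time-domain descriptors (assumed): RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / (len(frame) - 1)]


def frequency_stream(frame):
    """Frequency-domain descriptors (assumed): magnitudes of the
    non-negative-frequency bins of a naive DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def dual_stream_features(samples, frame_len=64, hop=32):
    """Concatenate both streams per frame; a stand-in for learned fusion."""
    return [time_stream(f) + frequency_stream(f)
            for f in frame_signal(samples, frame_len, hop)]


# Usage: a 500 Hz tone sampled at 8 kHz, so the tone falls on DFT bin 4
# (bin width = 8000 / 64 = 125 Hz).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(256)]
features = dual_stream_features(tone)
```

Each frame yields a vector with two time-domain values followed by 33 spectral magnitudes. A real system would likely replace both streams with learned encoders; the point here is only that the two views of the same frame carry complementary information.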
Taxonomy
Topics: Face recognition and analysis · Emotion and Mood Recognition · Generative Adversarial Networks and Image Synthesis
