PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles

Tianshun Han; Benjia Zhou; Ajian Liu; Yanyan Liang; Du Zhang; Zhen Lei; Jun Wan

arXiv:2512.05121·cs.GR·December 8, 2025

TL;DR

PESTalk is a speech-driven 3D facial animation method that personalizes emotional style by combining time- and frequency-domain audio features with voiceprint characteristics. The authors report that it outperforms existing methods in realism and personalization.

Contribution

Introduces a Dual-Stream Emotion Extractor (DSEE) for fine-grained emotion extraction and an Emotional Style Modeling Module (ESMM) for personalized style modeling in speech-driven facial animation.

Findings

01 Outperforms state-of-the-art methods in realism

02 Effectively captures personalized emotional styles

03 Leverages a new 3D-EmoStyle dataset

Abstract

PESTalk is a novel method for generating 3D facial animations with personalized emotional styles directly from speech. It overcomes key limitations of existing approaches by introducing a Dual-Stream Emotion Extractor (DSEE) that captures both time- and frequency-domain audio features for fine-grained emotion analysis, and an Emotional Style Modeling Module (ESMM) that models individual expression patterns based on voiceprint characteristics. To address data scarcity, the method leverages a newly constructed 3D-EmoStyle dataset. Evaluations demonstrate that PESTalk outperforms state-of-the-art methods in producing realistic and personalized facial animations.
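
The abstract does not give the DSEE's internals, so the following is only a rough illustration of what "dual-stream" audio features can look like, not the authors' implementation. It assumes hand-crafted features (RMS energy and zero-crossing rate for the time stream, naive DFT magnitudes for the frequency stream) and fusion by simple concatenation. All function names, the frame sizes, and the test tone are hypothetical.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D sample list into overlapping frames (ragged tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def time_stream(frame):
    """Time-domain features: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [rms, crossings / (len(frame) - 1)]


def freq_stream(frame):
    """Frequency-domain features: normalized magnitudes of a naive DFT
    over the non-negative frequency bins (0 .. n/2)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) / n
            for k in range(n // 2 + 1)]


def dual_stream_features(samples, frame_len=64, hop=32):
    """Per-frame feature vectors: time stream followed by frequency stream.
    Concatenation is an assumed fusion choice, not taken from the paper."""
    return [time_stream(f) + freq_stream(f)
            for f in frame_signal(samples, frame_len, hop)]


# Hypothetical input: a 0.1 s, 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
features = dual_stream_features(tone)
```

In a learned model, each stream would typically be a trainable encoder rather than fixed statistics; the sketch only shows why the two views are complementary, since the time stream tracks energy and sign changes while the frequency stream exposes where spectral energy sits.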

Taxonomy

Topics: Face recognition and analysis · Emotion and Mood Recognition · Generative Adversarial Networks and Image Synthesis