EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation

Yihong Lin; Liang Peng; Zhaoxin Fan; Xianjia Wu; Jianqiao Hu; Xiandong Li; Wenxiong Kang; Songju Lei

arXiv:2408.11518 · cs.CV · May 19, 2025

Open Access

TL;DR

EmoFace is a novel two-stream network that disentangles emotion and content in speech-driven 3D facial animation, using Mesh Attention and a self-growing training scheme to improve realism and emotional expression.

Contribution

The paper introduces EmoFace, which the authors present as the first method to combine emotion-content disentanglement with Mesh Attention and a self-growing training scheme for speech-driven 3D facial animation.

Findings

01  Achieves state-of-the-art performance on the 3D-RAVDESS and VOCASET datasets.

02  Effectively captures emotional expressions and lip synchronization.

03  Outperforms existing methods in quantitative and qualitative evaluations.

Abstract

The creation of increasingly vivid 3D talking faces has become a hot topic in recent years. Currently, most speech-driven works focus on lip synchronisation but fail to capture the correlations between emotions and facial motions effectively. To address this problem, we propose a two-stream network called EmoFace, which consists of an emotion branch and a content branch. EmoFace employs a novel Mesh Attention mechanism to analyse and fuse the emotion features and content features. In particular, a newly designed spatio-temporal graph-based convolution, SpiralConv3D, is used in Mesh Attention to learn potential temporal and spatial feature dependencies between mesh vertices. In addition, to the best of our knowledge, this is the first work to introduce a self-growing training scheme with intermediate supervision to dynamically adjust the ratio of ground truth adopted in the 3D face…
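
The self-growing training scheme mentioned in the abstract adjusts how much ground truth the model sees during training. The abstract is truncated here and does not give the schedule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a linear decay from all ground truth to none, and per-frame random mixing of ground-truth and predicted values. The names `ground_truth_ratio` and `mix_inputs`, and the parameters `start` and `end`, are hypothetical.

```python
import random


def ground_truth_ratio(epoch, total_epochs, start=1.0, end=0.0):
    """Fraction of ground-truth frames to use at a given epoch.

    ASSUMPTION: a linear decay from `start` to `end`. The source does not
    state the actual schedule used by EmoFace.
    """
    if total_epochs <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range epochs stay well defined.
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * progress


def mix_inputs(ground_truth, predicted, ratio, rng):
    """Per frame, keep the ground-truth value with probability `ratio`,
    otherwise substitute the model's own prediction.

    ASSUMPTION: independent per-frame sampling, chosen for illustration.
    """
    if len(ground_truth) != len(predicted):
        raise ValueError("sequences must have the same length")
    return [
        gt if rng.random() < ratio else pred
        for gt, pred in zip(ground_truth, predicted)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    gt_frames = ["gt0", "gt1", "gt2", "gt3"]
    pred_frames = ["p0", "p1", "p2", "p3"]
    for epoch in range(5):
        ratio = ground_truth_ratio(epoch, total_epochs=5)
        print(epoch, ratio, mix_inputs(gt_frames, pred_frames, ratio, rng))
```

Under these assumptions the model is fed only ground truth at the first epoch and only its own predictions at the last, which is one common way to narrow the gap between training and inference conditions.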

Taxonomy

Topics: Face recognition and analysis · Facial Nerve Paralysis Treatment and Research · Speech and Audio Processing

Methods: Softmax · Attention Is All You Need · Focus