ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting

Chuhang Ma; Shuai Tan; Ye Pan; Jiaolong Yang; Xin Tong

arXiv:2601.01847 · cs.CV · January 22, 2026

TL;DR

ESGaussianFace introduces a novel 3D Gaussian splatting framework for efficient, high-quality, emotionally expressive, and stylized audio-driven facial animation with 3D consistency, outperforming existing methods.

Contribution

The paper presents a new framework combining 3D Gaussian splatting, emotion-guided spatial attention, and multi-stage training for realistic emotional and stylized facial animation from audio.

Findings

01 Outperforms state-of-the-art methods in lip accuracy and expression variation

02 Achieves high efficiency and 3D consistency in facial animation

03 Effectively integrates emotion and style features for realistic results

Abstract

Most current audio-driven facial animation research focuses on generating videos with neutral emotions. While some studies have addressed the generation of facial videos driven by emotional audio, efficiently generating high-quality talking head videos that integrate both emotional expressions and style features remains a significant challenge. In this paper, we propose ESGaussianFace, an innovative framework for emotional and stylized audio-driven facial animation. Our approach leverages 3D Gaussian Splatting to reconstruct 3D scenes and render videos, ensuring efficient generation of 3D-consistent results. We propose an emotion-audio-guided spatial attention method that effectively integrates emotion features with audio content features. Through emotion-guided attention, the model can reconstruct facial details across different emotional states more accurately. To…
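
The abstract does not spell out how the emotion-audio-guided spatial attention is built, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes each 3D Gaussian center acts as a query that attends over two condition tokens (one audio, one emotion) and that the fused feature is decoded into a per-Gaussian position offset. Every name, shape, and weight matrix here (`init_params`, `emotion_audio_spatial_attention`, `W_q`, `W_o`, and so on) is hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, d_audio, d_emotion, d_model):
    """Random projection matrices (hypothetical; a real model would learn these)."""
    shapes = {
        "W_a": (d_audio, d_model),    # audio feature -> shared width
        "W_e": (d_emotion, d_model),  # emotion feature -> shared width
        "W_q": (3, d_model),          # Gaussian center (x, y, z) -> query
        "W_k": (d_model, d_model),    # condition token -> key
        "W_v": (d_model, d_model),    # condition token -> value
        "W_o": (d_model, 3),          # fused feature -> position offset
    }
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}


def emotion_audio_spatial_attention(positions, audio_feat, emotion_feat, params):
    """Assumed scheme: each Gaussian center attends over an audio and an emotion token.

    positions:    (N, 3) Gaussian centers
    audio_feat:   (d_audio,) audio content feature for one frame
    emotion_feat: (d_emotion,) emotion feature for one frame
    Returns deformed positions (N, 3) and attention weights (N, 2).
    """
    # Two condition tokens projected to a shared width: row 0 audio, row 1 emotion.
    tokens = np.stack([audio_feat @ params["W_a"], emotion_feat @ params["W_e"]])
    queries = positions @ params["W_q"]                   # (N, d_model)
    keys = tokens @ params["W_k"]                         # (2, d_model)
    values = tokens @ params["W_v"]                       # (2, d_model)
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])   # scaled dot product, (N, 2)
    weights = softmax(scores, axis=-1)                    # per-Gaussian mix of audio vs emotion
    fused = weights @ values                              # (N, d_model)
    offsets = fused @ params["W_o"]                       # (N, 3)
    return positions + offsets, weights
```

A call such as `emotion_audio_spatial_attention(pos, audio, emotion, init_params(np.random.default_rng(0), 8, 4, 16))` returns the shifted centers together with a weight pair per Gaussian, which shows how strongly each point in space leans on the audio token versus the emotion token under these assumptions.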

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing