EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis
Junuk Cha, Seongro Yoon, Valeriya Strizhkova, Francois Bremond, Seungryul Baek

TL;DR
EmoTalkingGaussian is a 3D Gaussian splatting-based talking head synthesis model that generates emotionally expressive, lip-synchronized talking heads conditioned on continuous emotion values (valence and arousal). A self-supervised training method improves lip synchronization for in-the-wild audio.
Contribution
It introduces a lip-aligned emotional face generator and a self-supervised training approach for emotion-conditioned talking head synthesis with improved realism and synchronization.
Findings
Outperforms state-of-the-art methods on image quality metrics.
Expresses the target emotion more accurately.
Improves lip synchronization for in-the-wild audio.
Abstract
3D Gaussian splatting-based talking head synthesis has recently gained attention for its ability to render high-fidelity images at real-time inference speed. However, because such a model is typically trained on only a short video that lacks diversity in facial emotions, the resulting talking heads struggle to represent a wide range of emotions. To address this issue, we propose a lip-aligned emotional face generator and leverage it to train our EmoTalkingGaussian model. The model can manipulate facial emotions conditioned on continuous emotion values (i.e., valence and arousal) while retaining synchronization of lip movements with the input audio. Additionally, to achieve accurate lip synchronization for in-the-wild audio, we introduce a self-supervised learning method that leverages a text-to-speech network and a visual-audio synchronization network. We evaluate our EmoTalkingGaussian on…
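The two ideas in the abstract, conditioning on a continuous (valence, arousal) pair and scoring lip-audio agreement with a synchronization signal, can be illustrated with a small sketch. This is not the authors' implementation: the `Emotion` class, the [-1, 1] value range, the `interpolate` helper, and the cosine-based `sync_loss` are all hypothetical stand-ins chosen for illustration, and the embeddings are plain lists rather than outputs of a real network.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Emotion:
    """Continuous emotion condition (hypothetical container, not from the paper)."""
    valence: float  # assumed range [-1, 1]: negative to positive affect
    arousal: float  # assumed range [-1, 1]: calm to excited

    def clamped(self) -> "Emotion":
        # Clip both values into the assumed [-1, 1] range.
        clip = lambda x: max(-1.0, min(1.0, x))
        return Emotion(clip(self.valence), clip(self.arousal))


def interpolate(a: Emotion, b: Emotion, t: float) -> Emotion:
    """Linear blend between two emotions; t=0 gives a, t=1 gives b."""
    return Emotion(
        a.valence + t * (b.valence - a.valence),
        a.arousal + t * (b.arousal - a.arousal),
    )


def sync_loss(audio_emb: list[float], lip_emb: list[float]) -> float:
    """1 - cosine similarity: a generic stand-in for a sync-network score."""
    dot = sum(x * y for x, y in zip(audio_emb, lip_emb))
    norm_a = math.sqrt(sum(x * x for x in audio_emb))
    norm_l = math.sqrt(sum(y * y for y in lip_emb))
    if norm_a == 0.0 or norm_l == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_l)


# Usage: sweep smoothly between two emotion conditions.
sad_calm = Emotion(valence=-1.0, arousal=0.0)
happy_excited = Emotion(valence=1.0, arousal=1.0)
midpoint = interpolate(sad_calm, happy_excited, 0.5)
```

Because the condition is a continuous pair rather than a discrete label, intermediate expressions can be requested by blending two conditions, which is the practical appeal of valence-arousal control. The loss is low when the (hypothetical) audio and lip embeddings point in the same direction and grows as they diverge.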
Taxonomy
Topics: Social Robot Interaction and HRI · Robotics and Automated Systems · Face recognition and analysis
Methods: Softmax · Attention Is All You Need
