Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation
Nanhan Shen, Zhilei Liu

TL;DR
This paper introduces UA-3DTalk, a novel 3D emotional talking face synthesis method that effectively aligns audio and emotion, controls micro-expressions, and adaptively fuses multi-view data by modeling uncertainty.
Contribution
It proposes a comprehensive framework with modules for emotion prior distillation, multi-modal emotion fusion, and uncertainty-based view adaptation, advancing the realism and controllability of 3D emotional talking face synthesis.
Findings
Outperforms state-of-the-art methods in emotion alignment and lip synchronization.
Achieves a 5.2% improvement in E-FID over competing methods.
Improves rendering quality, with an LPIPS score of 0.015.
Abstract
Emotional talking face synthesis is pivotal in multimedia and signal processing, yet existing 3D methods suffer from two critical challenges. The first is poor audio-vision emotion alignment, which manifests as difficulty extracting emotion from audio and inadequate control over emotional micro-expressions. The second is a one-size-fits-all multi-view fusion strategy that overlooks uncertainty and differences in feature quality, undermining rendering quality. We propose UA-3DTalk, Uncertainty-Aware 3D Emotional Talking Face Synthesis with emotion prior distillation, which has three core modules: the Prior Extraction module disentangles audio into content-synchronized features for alignment and person-specific complementary features for individualization; the Emotion Distillation module introduces a multi-modal attention-weighted fusion mechanism and 4D Gaussian encoding with multi-resolution codebooks, enabling…
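The contrast the abstract draws between one-size-fits-all fusion and uncertainty-aware fusion can be illustrated with a generic inverse-variance weighting sketch. This is not the authors' implementation: the helper names `fuse_views` and `average_views`, the idea of one scalar variance per view, and the toy feature vectors are all assumptions made for illustration only.

```python
from typing import List, Sequence


def fuse_views(features: Sequence[Sequence[float]],
               variances: Sequence[float],
               eps: float = 1e-8) -> List[float]:
    """Fuse per-view feature vectors with inverse-variance weights.

    Hypothetical helper, not taken from the paper: each view contributes
    in proportion to 1 / variance, so uncertain views are down-weighted.
    """
    if not features or len(features) != len(variances):
        raise ValueError("need at least one view and one variance per view")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all views must share the same feature dimension")

    # Unnormalized confidence of each view (eps guards against division by zero).
    raw = [1.0 / (v + eps) for v in variances]
    total = sum(raw)
    weights = [w / total for w in raw]

    # Weighted sum over views, computed per feature dimension.
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def average_views(features: Sequence[Sequence[float]]) -> List[float]:
    """Uniform baseline: every view gets the same weight."""
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(len(features[0]))]


if __name__ == "__main__":
    views = [[1.0, 2.0], [3.0, 6.0]]   # toy features from two views
    variances = [0.1, 0.9]             # assumed: the second view is less reliable
    print("uniform:", average_views(views))
    print("uncertainty-weighted:", fuse_views(views, variances))
```

In this toy setup the uniform average treats both views equally, while the weighted version pulls the result toward the lower-variance view. How UA-3DTalk actually estimates and uses uncertainty is not specified in the text above.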
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Emotion and Mood Recognition
