Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation

Nanhan Shen; Zhilei Liu

arXiv:2601.19112·cs.AI·January 28, 2026


Open Access

TL;DR

This paper introduces UA-3DTalk, a novel 3D emotional talking face synthesis method that effectively aligns audio and emotion, controls micro-expressions, and adaptively fuses multi-view data by modeling uncertainty.

Contribution

It proposes a comprehensive framework with modules for emotion prior distillation, multi-modal emotion fusion, and uncertainty-based view adaptation, advancing the realism and controllability of 3D emotional talking face synthesis.

Findings

01. Outperforms state-of-the-art methods in emotion alignment and lip synchronization.

02. Achieves a 5.2% improvement in E-FID over competitors.

03. Enhances rendering quality with a 0.015 LPIPS score.

Abstract

Emotional Talking Face synthesis is pivotal in multimedia and signal processing, yet existing 3D methods suffer from two critical challenges: poor audio-vision emotion alignment, manifested as difficult audio emotion extraction and inadequate control over emotional micro-expressions; and a one-size-fits-all multi-view fusion strategy that overlooks uncertainty and feature quality differences, undermining rendering quality. We propose UA-3DTalk, Uncertainty-Aware 3D Emotional Talking Face Synthesis with emotion prior distillation, which has three core modules: the Prior Extraction module disentangles audio into content-synchronized features for alignment and person-specific complementary features for individualization; the Emotion Distillation module introduces a multi-modal attention-weighted fusion mechanism and 4D Gaussian encoding with multi-resolution code-books, enabling…
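The abstract contrasts a one-size-fits-all multi-view fusion with one that accounts for per-view uncertainty. The excerpt does not say how UA-3DTalk models or uses that uncertainty, so the following is only a generic illustration of the idea, using standard inverse-variance weighting rather than the paper's method. The function names `view_weights` and `fuse_views`, and the assumption that each view comes with a predicted log-variance, are hypothetical.

```python
import math
from typing import List, Sequence


def view_weights(log_variances: Sequence[float]) -> List[float]:
    """Turn per-view log-variances into normalized fusion weights.

    Hypothetical helper: a softmax over the negative log-variance, which is
    the same as normalizing 1/variance, so less certain views count less.
    """
    if not log_variances:
        raise ValueError("need at least one view")
    scores = [-lv for lv in log_variances]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(features: Sequence[Sequence[float]],
               log_variances: Sequence[float]) -> List[float]:
    """Fuse per-view feature vectors as an uncertainty-weighted average.

    Assumption for illustration only: every view yields a feature vector of
    the same length plus one scalar log-variance for the whole view.
    """
    if len(features) != len(log_variances):
        raise ValueError("need exactly one log-variance per view")
    weights = view_weights(log_variances)
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all views must share the same feature length")
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


if __name__ == "__main__":
    # Two views; the second has three times the variance of the first.
    fused = fuse_views([[0.0, 2.0], [4.0, 6.0]], [0.0, math.log(3.0)])
    print(fused)
```

With equal log-variances this reduces to a plain average, which corresponds to the uniform fusion the abstract criticizes; unequal values shift the result toward the more confident view.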


Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Emotion and Mood Recognition