RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis
Wenqing Wang, Yun Fu

TL;DR
RealTalk is a new framework that synthesizes realistic emotional talking-head videos with high emotion accuracy, controllability, and identity preservation, advancing socially intelligent AI systems.
Contribution
It introduces a novel combination of VAE, landmark deformation, and tri-plane attention NeRF for emotion-aware talking-head synthesis.
Findings
Outperforms existing methods in emotion accuracy
Enhances emotion controllability in generated videos
Maintains high identity preservation
Abstract
Emotion is a critical component of artificial social intelligence. However, while current methods excel in lip synchronization and image quality, they often fail to generate accurate and controllable emotional expressions while preserving the subject's identity. To address this challenge, we introduce RealTalk, a novel framework for synthesizing emotional talking heads with high emotion accuracy, enhanced emotion controllability, and robust identity preservation. RealTalk employs a variational autoencoder (VAE) to generate 3D facial landmarks from driving audio. These landmarks are concatenated with emotion-label embeddings and passed through a ResNet-based landmark deformation model (LDM) to produce emotional landmarks. The emotional landmarks and facial blendshape coefficients jointly condition a novel tri-plane attention Neural Radiance Field (NeRF) to synthesize highly realistic emotional talking heads. Extensive…
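The landmark deformation step described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the landmark count, number of emotion labels, embedding and hidden widths, the single residual block, and the function name `deform_landmarks` are all assumptions chosen for illustration, and the weights are random rather than trained. It only shows the data flow of concatenating per-frame 3D landmarks with an emotion-label embedding and predicting emotional landmarks as a residual offset.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
NUM_LANDMARKS = 68   # assumed number of 3D facial landmarks
NUM_EMOTIONS = 8     # assumed number of emotion labels
EMB_DIM = 16         # assumed emotion-embedding width
HIDDEN = 64          # assumed hidden width

# Hypothetical, randomly initialised parameters standing in for trained weights.
emotion_table = rng.normal(size=(NUM_EMOTIONS, EMB_DIM))
in_dim = NUM_LANDMARKS * 3 + EMB_DIM
w1 = rng.normal(scale=0.01, size=(in_dim, HIDDEN))
w2 = rng.normal(scale=0.01, size=(HIDDEN, NUM_LANDMARKS * 3))


def deform_landmarks(neutral, emotion_id):
    """Map landmarks of shape (T, NUM_LANDMARKS, 3) to emotional landmarks."""
    frames = neutral.shape[0]
    flat = neutral.reshape(frames, -1)
    # Repeat the emotion-label embedding for every frame.
    emb = np.broadcast_to(emotion_table[emotion_id], (frames, EMB_DIM))
    # Concatenate landmarks with the emotion embedding along the feature axis.
    x = np.concatenate([flat, emb], axis=1)
    # One residual block (assumed): predict offsets and add them to the input.
    hidden = np.maximum(x @ w1, 0.0)
    offsets = hidden @ w2
    return (flat + offsets).reshape(neutral.shape)


# Stand-in for the audio-driven landmarks the VAE would produce.
neutral = rng.normal(size=(5, NUM_LANDMARKS, 3))
emotional = deform_landmarks(neutral, emotion_id=3)
print(emotional.shape)  # (5, 68, 3)
```

In this sketch, changing `emotion_id` changes only the embedding part of the input, so the same audio-driven landmarks yield different deformed landmarks per emotion label, which is the kind of controllability the abstract describes.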
