Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior
Jaehoon Ko, Kyusun Cho, Joungbin Lee, Heeji Yoon, Sangmin Lee, Sangjun Ahn, Seungryong Kim

TL;DR
Talk3D introduces a novel framework for high-fidelity, audio-driven talking head synthesis that accurately reconstructs facial geometry using a personalized 3D generative prior, outperforming existing methods especially in extreme poses.
Contribution
The paper proposes a new audio-guided attention U-Net architecture that leverages a pre-trained 3D generative prior to improve facial geometry reconstruction in talking head synthesis.
Findings
Outperforms state-of-the-art methods in realism and accuracy.
Excels in generating facial geometries under extreme head poses.
Effectively disentangles audio-related and audio-unrelated facial variations.
Abstract
Recent methods for audio-driven talking head synthesis often optimize neural radiance fields (NeRF) on a monocular talking portrait video, leveraging its capability to render high-fidelity and 3D-consistent novel-view frames. However, they often struggle to reconstruct complete face geometry due to the absence of comprehensive 3D information in the input monocular videos. In this paper, we introduce a novel audio-driven talking head synthesis framework, called Talk3D, that can faithfully reconstruct plausible facial geometries by effectively adopting a pre-trained 3D-aware generative prior. Given the personalized 3D generative model, we present a novel audio-guided attention U-Net architecture that predicts the dynamic face variations in the NeRF space driven by audio. Our model is further modulated by audio-unrelated conditioning tokens which effectively disentangle…
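To make the idea of audio-guided attention more concrete, the following is a minimal sketch of generic scaled dot-product cross-attention, in which face feature tokens act as queries over audio feature tokens. It is not the paper's implementation: the function name `audio_cross_attention`, the token shapes, the residual connection, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def audio_cross_attention(face_tokens, audio_tokens):
    """Hypothetical helper: each face token attends over all audio tokens.

    Assumption: keys and values are the raw audio tokens (no learned
    projections), and face and audio tokens share one feature dimension.
    """
    dim = len(audio_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in face_tokens:
        # Scaled dot-product similarity between this face token and each audio token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in audio_tokens]
        weights = softmax(scores)
        # Attention-weighted average of the audio tokens.
        attended = [
            sum(w * value[d] for w, value in zip(weights, audio_tokens))
            for d in range(dim)
        ]
        # Residual connection: the audio signal modulates the face feature.
        outputs.append([q + a for q, a in zip(query, attended)])
    return outputs


face = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = audio_cross_attention(face, audio)
```

In a real model the queries, keys, and values would pass through learned linear layers, and the attention would sit inside the U-Net blocks; the sketch only shows how audio features can reweight and modulate spatial features.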
Taxonomy
Topics: Human Motion and Animation · Image Retrieval and Classification Techniques · Video Analysis and Summarization
Methods: Convolution · Max Pooling · Concatenated Skip Connection · U-Net
