Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior

Jaehoon Ko, Kyusun Cho, Joungbin Lee, Heeji Yoon, Sangmin Lee, Sangjun Ahn, Seungryong Kim

arXiv:2403.20153 · cs.CV · April 1, 2024 · 1 citation

TL;DR

Talk3D introduces a novel framework for high-fidelity, audio-driven talking head synthesis that accurately reconstructs facial geometry using a personalized 3D generative prior, outperforming existing methods especially in extreme poses.

Contribution

The paper proposes a new audio-guided attention U-Net architecture that leverages a pre-trained 3D generative prior to improve facial geometry reconstruction in talking head synthesis.

Findings

01. Outperforms state-of-the-art methods in realism and accuracy.
02. Excels at generating facial geometry under extreme head poses.
03. Effectively disentangles audio-related and audio-unrelated facial variations.

Abstract

Recent methods for audio-driven talking head synthesis often optimize neural radiance fields (NeRF) on a monocular talking portrait video, leveraging NeRF's capability to render high-fidelity and 3D-consistent novel-view frames. However, they often struggle to reconstruct complete face geometry due to the absence of comprehensive 3D information in the input monocular videos. In this paper, we introduce a novel audio-driven talking head synthesis framework, called Talk3D, that can faithfully reconstruct plausible facial geometry by effectively adopting the pre-trained 3D-aware generative prior. Given the personalized 3D generative model, we present a novel audio-guided attention U-Net architecture that predicts the dynamic face variations in the NeRF space driven by audio. Furthermore, our model is modulated by audio-unrelated conditioning tokens which effectively disentangle…
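The abstract describes an audio-guided attention module that predicts dynamic face variations in NeRF space and is additionally conditioned on audio-unrelated tokens. A minimal sketch of that idea, assuming plain scaled dot-product cross-attention in which feature tokens from the personalized generator act as queries while the audio features plus the audio-unrelated conditioning tokens act as keys and values, might look like this. The function names, the omission of learned query/key/value projections, and the residual update are illustrative assumptions, not the Talk3D implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    # Scaled dot-product attention; learned projection matrices are omitted
    # (an assumption made to keep the sketch small).
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def audio_guided_update(spatial_feats: List[Vector], audio_tokens: List[Vector],
                        cond_tokens: List[Vector]) -> List[Vector]:
    # Hypothetical helper: the context is the audio tokens followed by the
    # audio-unrelated conditioning tokens, so each spatial feature can attend
    # to both sources of variation.
    context = audio_tokens + cond_tokens
    if not context:
        raise ValueError("need at least one audio or conditioning token")
    delta = cross_attention(spatial_feats, context, context)
    # Residual: static prior features plus the predicted dynamic variation.
    return [[f + dv for f, dv in zip(feat, dl)]
            for feat, dl in zip(spatial_feats, delta)]


if __name__ == "__main__":
    feats = [[0.0, 0.0], [1.0, 1.0]]
    print(audio_guided_update(feats, [[1.0, 0.0]], [[0.0, 1.0]]))
```

Putting the conditioning tokens in the same key/value set as the audio lets the softmax weights decide, per location, how much of the predicted change comes from audio versus non-audio factors; separating those contributions is the kind of disentanglement the abstract refers to, though the paper's actual mechanism may differ.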

Code & Models

Repositories

KU-CVLAB/Talk3D
PyTorch · Official

Taxonomy

Topics: Human Motion and Animation · Image Retrieval and Classification Techniques · Video Analysis and Summarization

Methods: Convolution · Max Pooling · Concatenated Skip Connection · U-Net
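The method tags list the standard U-Net building blocks: convolution, max pooling, and concatenated skip connections. As a generic illustration of how these pieces fit together, and not of Talk3D's actual network, the toy one-level 1D U-Net below pools a feature map, processes the coarse version, upsamples it, and concatenates it channel-wise with the full-resolution skip features. The single shared depthwise 3-tap kernel, nearest-neighbor upsampling, and all function names are simplifying assumptions.

```python
from typing import List, Sequence

Signal = List[List[float]]  # layout: channels x length


def conv1d_same(x: Signal, kernel: Sequence[float]) -> Signal:
    # Depthwise 1D cross-correlation with zero padding ("same" length).
    # Assumes an odd kernel size; real U-Nets mix channels with learned weights.
    k = len(kernel)
    pad = k // 2
    out = []
    for ch in x:
        padded = [0.0] * pad + ch + [0.0] * pad
        out.append([sum(kernel[j] * padded[i + j] for j in range(k))
                    for i in range(len(ch))])
    return out


def max_pool1d(x: Signal, k: int = 2) -> Signal:
    # Non-overlapping max pooling with window and stride k.
    return [[max(ch[i:i + k]) for i in range(0, len(ch) - k + 1, k)] for ch in x]


def upsample_nearest(x: Signal, factor: int = 2) -> Signal:
    # Repeat each sample `factor` times.
    return [[v for v in ch for _ in range(factor)] for ch in x]


def tiny_unet(x: Signal, kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> Signal:
    skip = conv1d_same(x, kernel)          # encoder features at full resolution
    down = max_pool1d(skip)                # contracting path
    bottleneck = conv1d_same(down, kernel)
    up = upsample_nearest(bottleneck)      # expanding path
    # Assumes an even input length so the upsampled map matches the skip map.
    assert len(up[0]) == len(skip[0]), "input length must be even"
    # Concatenated skip connection: stack channels, doubling the channel count.
    return skip + up


if __name__ == "__main__":
    print(tiny_unet([[1.0, 3.0, 2.0, 4.0]], kernel=[0.0, 1.0, 0.0]))
```

Concatenation, rather than addition, keeps the fine-resolution encoder features intact next to the upsampled coarse features, at the cost of doubling the channel count fed to the next layer.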