NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior

Gihoon Kim; Kwanggyoon Seo; Sihun Cha; Junyong Noh

arXiv:2405.05749 · cs.CV · May 13, 2024

TL;DR

This paper introduces NeRFFaceSpeech, a method for high-quality, 3D-consistent, audio-driven talking head synthesis from a single image. It leverages a generative prior to address the weak 3D consistency of earlier one-shot approaches, which synthesize facial animation mainly from front-facing perspectives.

Contribution

It presents an approach that combines NeRF with generative models to achieve 3D-consistent facial animation from a single image. It also introduces a novel spatial synchronization method and LipaintNet, which supplies inner-mouth detail.

Findings

01

Outperforms previous methods in 3D consistency and quality.

02

Introduces a quantitative robustness measure against pose variations.

03

Enables high-quality, one-shot audio-driven 3D head synthesis.

Abstract

Audio-driven talking head generation is advancing from 2D to 3D content. Notably, Neural Radiance Fields (NeRF) are in the spotlight as a means to synthesize high-quality 3D talking head outputs. Unfortunately, this NeRF-based approach typically requires a large amount of paired audio-visual data for each identity, which limits the scalability of the method. Although there have been attempts to generate audio-driven 3D talking head animations from a single image, the results are often unsatisfactory because the image carries insufficient information about obscured regions. In this paper, we mainly focus on the overlooked aspect of 3D consistency in the one-shot, audio-driven domain, where facial animations are synthesized primarily from front-facing perspectives. We propose a novel method, NeRFFaceSpeech, which produces high-quality 3D-aware talking heads. Using prior…
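For readers unfamiliar with why NeRF-based heads are "3D-consistent", the sketch below shows the generic volume-rendering step of the original NeRF formulation, not NeRFFaceSpeech's own pipeline (whose details are not given in this summary). Every pixel is produced by compositing density and color samples along a camera ray through one shared radiance field, so views from different poses are rendered from the same underlying 3D content. The function name `composite_ray` and its inputs are hypothetical, chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(
    sigmas: Sequence[float], colors: Sequence[RGB], deltas: Sequence[float]
) -> Tuple[RGB, List[float]]:
    """Hypothetical helper: standard NeRF quadrature along one ray.

    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with T_i = exp(-sum_{j<i} sigma_j * delta_j) (accumulated transmittance).
    """
    if not (len(sigmas) == len(colors) == len(deltas)):
        raise ValueError("sigmas, colors and deltas must have the same length")
    transmittance = 1.0  # T_1 = 1: nothing occludes the first sample
    weights: List[float] = []
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha  # contribution of this sample to the pixel
        weights.append(weight)
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # light surviving past this segment
    return (rgb[0], rgb[1], rgb[2]), weights


# A ray through empty space, then a dense (green) sample, then more space:
# the dense sample absorbs almost all of the weight.
pixel, w = composite_ray(
    [0.0, 1000.0, 0.0],
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    [0.1, 0.1, 0.1],
)
```

The weights sum to at most 1 (exactly 1 minus the final transmittance), which is why occluded regions behind a dense surface contribute almost nothing. This is also why a single front-facing image leaves obscured regions underdetermined, the problem the abstract describes.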

Taxonomy

Topics: Face recognition and analysis

Methods: Focus