S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis

Dongze Li; Kang Zhao; Wei Wang; Yifeng Ma; Bo Peng; Yingya Zhang; Jing Dong

arXiv:2408.09347 · cs.CV · August 20, 2024

Open Access

TL;DR

This paper introduces S^3D-NeRF, a novel single-shot neural radiance field approach that directly uses speech audio for high-fidelity talking head synthesis, improving lip synchronization and visual quality.

Contribution

The paper proposes a hierarchical appearance encoder, a cross-modal deformation field, and a lip-sync discriminator to enable direct speech-driven face synthesis with enhanced realism and synchronization.

Findings

01. Outperforms previous methods in video fidelity.

02. Achieves superior audio-lip synchronization.

03. Maintains temporal consistency in lip movements.

Abstract

Talking head synthesis is a practical technique with wide applications. Current Neural Radiance Field (NeRF) based approaches have shown their superiority in driving one-shot talking heads with videos or with signals regressed from audio. However, most of them fail to use the audio directly as the driving signal, and so cannot benefit from the flexibility and availability of speech. Since mapping audio signals to face deformation is non-trivial, we design a Single-Shot Speech-Driven Neural Radiance Field (S^3D-NeRF) method in this paper to tackle the following three difficulties: learning a representative appearance feature for each identity, modeling the motion of different face regions with audio, and keeping the temporal consistency of the lip area. To this end, we introduce a Hierarchical Facial Appearance Encoder to learn multi-scale representations for capturing the appearance of different…

Taxonomy

Topics: Robotics and Automated Systems