NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis
Xiaoxing Liu, Zhilei Liu, Chongke Bi

TL;DR
NeRF-3DTalker introduces a novel approach combining 3D priors and audio disentanglement within neural radiance fields to synthesize realistic, multi-view talking head videos with improved lip-sync accuracy.
Contribution
It proposes a 3D-prior-aided audio disentanglement module and a local-global standardized space to improve view consistency and lip synchronization in talking head synthesis.
Findings
Outperforms state-of-the-art methods in realism and lip-sync quality
Achieves superior multi-view consistency in synthesized videos
Demonstrates significant improvements in image quality and synchronization
Abstract
Talking head synthesis aims to generate a lip-synchronized talking head video from audio. Recently, the ability of NeRF to enhance the realism and texture details of synthesized talking heads has attracted researchers' attention. However, most current audio-driven NeRF methods focus exclusively on rendering frontal faces and cannot generate clear talking heads from novel views. Another prevalent challenge in current 3D talking head synthesis is aligning the acoustic and visual spaces, which often results in suboptimal lip synchronization in the generated talking heads. To address these issues, we propose Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis (NeRF-3DTalker). Specifically, the proposed method employs 3D prior information to synthesize clear talking heads from free viewpoints. Additionally,…
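For readers unfamiliar with how a NeRF turns predictions into pixels, the following is a minimal sketch of the standard NeRF volume-rendering quadrature along a single camera ray. It is not NeRF-3DTalker's implementation. In audio-driven NeRF talking-head methods, the per-sample densities and colors are typically predicted by a network conditioned on audio features; here they are passed in directly as assumed inputs, and the function name `render_ray` is hypothetical.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Composite per-sample densities and colors along one ray (standard NeRF quadrature).

    Hypothetical helper: in an audio-driven NeRF, `sigmas` and `colors` would come
    from a network conditioned on audio features; here they are given directly.
    """
    sigmas = np.asarray(sigmas, dtype=float)   # shape (N,)
    colors = np.asarray(colors, dtype=float)   # shape (N, 3)
    t_vals = np.asarray(t_vals, dtype=float)   # shape (N,), sorted sample depths

    # Distance between adjacent samples; the last interval is treated as very large.
    deltas = np.append(np.diff(t_vals), 1e10)
    # Opacity contributed by each segment of the ray.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): chance the ray reaches sample i.
    trans = np.cumprod(np.append(1.0, 1.0 - alphas[:-1]))
    weights = trans * alphas
    # Pixel color is the weight-averaged sample color.
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Usage: a single dense sample in the middle dominates the rendered color.
rgb, weights = render_ray([0.0, 1e3, 0.0],
                          [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                          [0.0, 1.0, 2.0])
```

Because the weights come from transmittance times opacity, samples behind an opaque surface receive almost no weight, which is what lets a radiance field render occlusion consistently from any viewpoint.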
Taxonomy
Topics: Robotics and Automated Systems
Methods: Softmax · Attention Is All You Need
