NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation
Niu Guanchen

TL;DR
This paper introduces NLDF, a neural light dynamic fields model that significantly accelerates 3D talking head generation while maintaining high visual quality. It represents light fields with light segments and uses a deep network to learn an entire light beam's information at once.
Contribution
The paper proposes a new light field representation and training strategy that speeds up 3D talking head generation by about 30 times compared to NeRF-based methods.
Findings
Renders about 30x faster than NeRF-based methods.
Maintains comparable visual quality in 3D talking head generation.
Effectively models facial light dynamics in 3D videos.
Abstract
Talking head generation based on the neural radiance fields (NeRF) model has shown promising visual effects. However, the slow rendering speed of NeRF seriously limits its application, because synthesizing a single pixel requires a burdensome calculation over hundreds of sampled points. In this work, a novel Neural Light Dynamic Fields (NLDF) model is proposed, aiming to generate high-quality 3D talking faces with a significant speedup. NLDF represents light fields based on light segments, and a deep network is used to learn the entire light beam's information at once. During training, knowledge distillation is applied: the NeRF-based synthesized result is used to guide the correct coloration of light segments in NLDF. Furthermore, a novel active pool training strategy is proposed to focus on high-frequency movements, particularly of the speaker's mouth and eyebrows. The proposed method…
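To make the cost argument in the abstract concrete, here is a minimal NumPy sketch, not the paper's implementation. It shows the standard NeRF volume-rendering quadrature, which blends many per-sample densities and colors into one pixel, and a plain mean-squared-error distillation loss in which that NeRF pixel acts as the teacher target. The function names, the sample count of 128, the random sample values, and the constant student prediction are all assumptions chosen for illustration; the abstract does not specify the NLDF architecture or its loss.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Standard NeRF quadrature: blend N per-sample colors into one pixel color."""
    # Opacity contributed by each sample along the ray.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

def distillation_loss(student_rgb, teacher_rgb):
    """Mean squared error between student and teacher pixel colors (assumed loss)."""
    return float(np.mean((student_rgb - teacher_rgb) ** 2))

# Teacher side: one pixel needs n_samples network queries (hypothetical values).
rng = np.random.default_rng(0)
n_samples = 128
sigmas = rng.uniform(0.0, 2.0, n_samples)        # densities along the ray
colors = rng.uniform(0.0, 1.0, (n_samples, 3))   # RGB at each sample
deltas = np.full(n_samples, 1.0 / n_samples)     # spacing between samples
teacher_rgb = composite_ray(sigmas, colors, deltas)

# Student side: a placeholder for a single network evaluation per ray.
student_rgb = np.array([0.5, 0.5, 0.5])
loss = distillation_loss(student_rgb, teacher_rgb)
```

In this sketch the teacher pays for `n_samples` evaluations per pixel while the student is modeled as one evaluation per ray. That gap is the kind of saving the abstract attributes to learning a whole light beam at once.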
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Human Motion and Animation
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus · Knowledge Distillation
