Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis
Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, Jiwen Lu

TL;DR
This paper introduces Dynamic Facial Radiance Fields (DFRF), a few-shot talking head synthesis method that conditions the facial radiance field on 2D appearance images and models face deformation with an audio-driven warping module, which substantially reduces the training data needed per identity.
Contribution
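The paper proposes DFRF, which generalizes to unseen identities from little data. It does so by conditioning the radiance field on reference appearance images rather than encoding one person into the network weights, and by using a differentiable face warping module driven by audio signals.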
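The two ideas named above, conditioning on reference images and audio-driven warping, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the pinhole camera model, the linear `toy_offset` function standing in for a learned warping network, and all function names and parameters are assumptions made for illustration. A 3D query point is projected into a reference image, the projected pixel is shifted by an audio-dependent offset, and a feature is read from the reference feature map at the shifted location.

```python
import numpy as np

def project_points(points_cam, focal, cx, cy):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixel coords (u, v)."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    return np.stack([focal * x / z + cx, focal * y / z + cy], axis=1)

def bilinear_sample(feat, uv):
    """Sample an (H, W, C) feature map at continuous (N, 2) pixel coords."""
    h, w, _ = feat.shape
    # Clamp to the image so out-of-view points read the border value.
    u = np.clip(uv[:, 0], 0, w - 1)
    v = np.clip(uv[:, 1], 0, h - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, w - 1)
    v1 = np.minimum(v0 + 1, h - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bottom * dv

def toy_offset(points_cam, audio_feat, weight):
    """Hypothetical stand-in for a learned warping network:
    a linear map from [point, audio feature] to a pixel offset (du, dv)."""
    n = points_cam.shape[0]
    inputs = np.concatenate([points_cam, np.tile(audio_feat, (n, 1))], axis=1)
    return inputs @ weight

def conditioned_features(points_cam, ref_feat, audio_feat, weight, focal, cx, cy):
    """Project query points into the reference image, shift them by the
    audio-dependent offset, and sample reference features there."""
    uv = project_points(points_cam, focal, cx, cy)
    uv_warped = uv + toy_offset(points_cam, audio_feat, weight)
    return bilinear_sample(ref_feat, uv_warped)
```

In a full system the sampled features would be fed, together with the query point, view direction, and audio feature, into a network that predicts color and density. Because both the sampling and the offset are differentiable, the warping can be trained end to end with the rest of the model.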
Findings
DFRF synthesizes natural, high-quality talking head videos with only tens of seconds of data.
Adapting to a new identity requires only 40,000 training iterations.
DFRF outperforms existing methods in few-shot talking head synthesis.
Abstract
Talking head synthesis is an emerging technology with wide applications in film dubbing, virtual avatars and online education. Recent NeRF-based methods generate more natural talking videos, as they better capture the 3D structural information of faces. However, a specific model needs to be trained for each identity with a large dataset. In this paper, we propose Dynamic Facial Radiance Fields (DFRF) for few-shot talking head synthesis, which can rapidly generalize to an unseen identity with little training data. Unlike existing NeRF-based methods, which directly encode the 3D geometry and appearance of a specific person into the network, our DFRF conditions the face radiance field on 2D appearance images to learn the face prior. Thus the facial radiance field can be flexibly adjusted to the new identity with few reference images. Additionally, for better modeling of the facial…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: Contrastive Language-Image Pre-training
