Depth-Aware Generative Adversarial Network for Talking Head Video Generation
Fa-Ting Hong, Longhao Zhang, Li Shen, and Dan Xu

TL;DR
This paper introduces DaGAN, a depth-aware GAN that uses dense 3D face geometry, learned in a self-supervised manner, to improve talking head video generation and synthesize more realistic and accurate faces.
Contribution
The paper presents a novel self-supervised method to learn dense 3D face geometry without expensive annotations and integrates it into a GAN for enhanced talking head video synthesis.
Findings
Generated videos are highly realistic.
Achieves significant improvements on unseen faces.
Effectively captures critical head movements.
Abstract
Talking head video generation aims to produce a synthetic human face video that contains the identity and pose information respectively from a given source image and a driving video. Existing works for this task heavily rely on 2D representations (e.g., appearance and motion) learned from the input images. However, dense 3D facial geometry (e.g., pixel-wise depth) is extremely important for this task as it is particularly beneficial for us to essentially generate accurate 3D face structures and distinguish noisy information from the possibly cluttered background. Nevertheless, dense 3D geometry annotations are prohibitively costly for videos and are typically not available for this video generation task. In this paper, we first introduce a self-supervised geometry learning method to automatically recover the dense 3D geometry (i.e., depth) from the face videos without the requirement of any…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Vision and Imaging
