Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose
Ran Yi, Zipeng Ye, Juyong Zhang, Hujun Bao, Yong-Jin Liu

TL;DR
This paper introduces a neural network model that synthesizes realistic talking face videos with personalized head movements, expressions, and lip sync by leveraging audio, a short target video, and 3D face reconstruction.
Contribution
It presents a novel approach combining 3D face animation and a memory-augmented GAN to generate personalized head pose and expressions from minimal target video data.
Findings
Produces high-quality talking face videos with natural head movements
Requires only about 300 frames of target video for personalization
Outperforms state-of-the-art methods in realism and head movement naturalness
Abstract
Real-world talking faces are often accompanied by natural head movement. However, most existing talking face video generation methods only consider facial animation with a fixed head pose. In this paper, we address this problem by proposing a deep neural network model that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a synthesized high-quality talking face video with personalized head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V). The most challenging issue in our work is that natural poses often cause in-plane and out-of-plane head rotations, which can make synthesized talking face videos look far from realistic. To address this challenge, we reconstruct 3D face animation and re-render it into synthesized frames. To fine-tune these frames into realistic ones with smooth…
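The abstract does not specify the camera model or pose parameterization used for re-rendering. As an illustration only, the sketch below assumes Euler-angle head pose (pitch, yaw, roll) and a weak-perspective camera; the functions `rotation_matrix`, `matmul`, and `project` are hypothetical helpers, not the authors' code. It shows why roll is an in-plane rotation (points rotate within the image plane), whereas yaw and pitch are out-of-plane (depth turns into image displacement and surfaces foreshorten), which is the case that purely 2D warping struggles with and that a 3D re-rendering step can handle.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(pitch, yaw, roll):
    """Assumed convention: R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in radians."""
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]  # out-of-plane (nodding)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]  # out-of-plane (turning)
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]  # in-plane (tilting)
    return matmul(rz, matmul(ry, rx))


def project(vertices, pose, scale=1.0, translation=(0.0, 0.0)):
    """Weak-perspective projection: rotate each 3D vertex, drop depth, then scale and shift."""
    r = rotation_matrix(*pose)
    tx, ty = translation
    projected = []
    for v in vertices:
        x = sum(r[0][k] * v[k] for k in range(3))
        y = sum(r[1][k] * v[k] for k in range(3))
        projected.append((scale * x + tx, scale * y + ty))
    return projected
```

For example, a 90-degree roll moves the point (1, 0, 0) to approximately (0, 1) in the image, a pure in-plane rotation, while a 90-degree yaw maps the frontal point (0, 0, 1) to approximately (1, 0): geometry that was pointing at the camera now appears at the side of the face.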
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
Methods: Convolution
