Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

Ran Yi; Zipeng Ye; Juyong Zhang; Hujun Bao; Yong-Jin Liu

arXiv:2002.10137 · cs.CV · March 6, 2020 · 77 cites

TL;DR

This paper introduces a neural network model that synthesizes realistic talking face videos with personalized head movements, expressions, and lip sync by leveraging audio, a short target video, and 3D face reconstruction.

Contribution

It presents a novel approach combining 3D face animation and a memory-augmented GAN to generate personalized head pose and expressions from minimal target video data.

Findings

01 Produces high-quality talking face videos with natural head movements

02 Requires only about 300 frames of target video for personalization

03 Outperforms state-of-the-art methods in realism and head movement naturalness

Abstract

Real-world talking faces are often accompanied by natural head movement. However, most existing talking face video generation methods only consider facial animation with a fixed head pose. In this paper, we address this problem by proposing a deep neural network model that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a synthesized high-quality talking face video with personalized head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V). The most challenging issue in our work is that natural poses often cause in-plane and out-of-plane head rotations, which makes synthesized talking face videos far from realistic. To address this challenge, we reconstruct 3D face animation and re-render it into synthesized frames. To fine-tune these frames into realistic ones with smooth…
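
To make the distinction between in-plane and out-of-plane head rotation concrete, here is a minimal Python sketch. It is not the paper's method, only an illustration of the geometry. It assumes a camera looking along the z axis with orthographic projection. The function names (`rotation_matrix`, `projected_eye_distance`) and the two toy eye landmarks are hypothetical. A roll about the camera axis (in-plane) preserves the projected distance between landmarks, while a yaw (out-of-plane) foreshortens it, which suggests why a 3D representation of the face is useful when the pose changes.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """Rotation from Euler angles in radians, composed as Rz(roll) Ry(yaw) Rx(pitch).

    Assumed convention: z is the camera axis, so roll is the in-plane rotation
    and yaw/pitch are out-of-plane rotations.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def rotate(r, p):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return [sum(r[i][k] * p[k] for k in range(3)) for i in range(3)]


def projected_eye_distance(yaw=0.0, pitch=0.0, roll=0.0):
    """Distance between two toy eye landmarks after rotation and orthographic projection."""
    left_eye, right_eye = [-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]  # hypothetical landmarks
    r = rotation_matrix(yaw, pitch, roll)
    lx, ly, _ = rotate(r, left_eye)   # dropping z is the orthographic projection
    rx_, ry_, _ = rotate(r, right_eye)
    return math.hypot(rx_ - lx, ry_ - ly)


# In-plane rotation (roll): the projected distance stays at about 2.0.
in_plane = projected_eye_distance(roll=math.radians(30))
# Out-of-plane rotation (yaw): the projected distance shrinks to about 1.0.
out_of_plane = projected_eye_distance(yaw=math.radians(60))
```

With these toy landmarks, `in_plane` stays at roughly the frontal distance of 2.0, whereas `out_of_plane` drops to about half of it because the yaw moves the eyes out of the image plane.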

Code & Models

Repositories

yiranran/Audio-driven-TalkingFace-HeadPose
PyTorch · Official

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis

Methods: Convolution