Facial Keypoint Sequence Generation from Audio

Prateek Manocha; Prithwijit Guha

arXiv:2011.01114 · cs.CV · November 3, 2020

TL;DR

This paper introduces a novel dataset and model for generating facial keypoint movements from audio, enabling realistic talking face animations with head movements and identity preservation, even for unseen individuals and arbitrary audio lengths.

Contribution

It presents the first audio-keypoint dataset and a model that generates synchronized facial keypoints from audio, accommodating head movements and unseen identities.

Findings

01. The dataset contains over 150,000 videos at 224p and 25fps.

02. The Audio2Keypoint model effectively generates plausible facial keypoint sequences.

03. The approach generalizes to unseen people and arbitrary audio lengths.
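
The 25fps frame rate reported for the dataset determines how many keypoint frames correspond to an audio clip of a given length. The following is a minimal sketch of that arithmetic only, not the authors' code. The 16 kHz sample rate and the function names `samples_per_frame` and `num_keypoint_frames` are assumptions made for illustration; the source does not state them.

```python
FPS = 25  # video frame rate reported for the dataset


def samples_per_frame(sample_rate: int, fps: int = FPS) -> float:
    """Audio samples spanned by one video (keypoint) frame."""
    if sample_rate <= 0 or fps <= 0:
        raise ValueError("sample_rate and fps must be positive")
    return sample_rate / fps


def num_keypoint_frames(num_samples: int, sample_rate: int, fps: int = FPS) -> int:
    """Number of whole keypoint frames covered by an audio clip."""
    if sample_rate <= 0 or fps <= 0:
        raise ValueError("sample_rate and fps must be positive")
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    # Integer arithmetic avoids floating-point rounding at frame boundaries.
    return (num_samples * fps) // sample_rate


if __name__ == "__main__":
    sr = 16_000  # ASSUMED sample rate, not taken from the paper
    print(samples_per_frame(sr))            # 640.0 samples per frame
    print(num_keypoint_frames(3 * sr, sr))  # 75 frames for a 3-second clip
```

Under these assumptions, a clip of any length maps to a frame count through the same formula, which is one simple way to think about the "arbitrary audio lengths" claim; how the model itself handles variable-length input is not described in this summary.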

Abstract

Whenever we speak, our voice is accompanied by facial movements and expressions. Several recent works have synthesized highly photo-realistic videos of talking faces, but they either require a source video to drive the target face or generate videos with a fixed head pose only. This lack of facial movement arises because most of these works focus on lip movement in sync with the audio and assume that the remaining facial keypoints are fixed. To address this, a unique audio-keypoint dataset of over 150,000 videos at 224p and 25fps is introduced, relating facial keypoint movement to the given audio. This dataset is then used to train Audio2Keypoint, a novel approach for synthesizing facial keypoint movement to go with the audio. Given a single image of the target person and an audio sequence (in any language), Audio2Keypoint generates a plausible…

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis