Facial Keypoint Sequence Generation from Audio
Prateek Manocha, Prithwijit Guha

TL;DR
This paper introduces a novel dataset and model for generating facial keypoint movements from audio, enabling realistic talking face animations with head movements and identity preservation, even for unseen individuals and arbitrary audio lengths.
Contribution
The paper presents an audio-keypoint dataset, which it describes as unique, and a model, Audio2Keypoint, that generates facial keypoints synchronized with the audio, including head movements, for identities not seen during training.
Findings
The dataset contains over 150,000 videos at 224p and 25fps.
Given a single image of the target person and an audio sequence, the Audio2Keypoint model generates a plausible facial keypoint sequence.
The approach generalizes to unseen people and arbitrary audio lengths.
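The 25fps frame rate and the claim of arbitrary audio lengths imply a simple relation between audio duration and the number of keypoint frames to generate. The sketch below illustrates only that arithmetic; it is not the authors' code. The function names and the 16 kHz sample rate used in the example are assumptions, not details taken from the paper.

```python
FPS = 25  # frame rate stated for the dataset


def num_keypoint_frames(num_samples: int, sample_rate: int, fps: int = FPS) -> int:
    """Return how many keypoint frames are needed to cover an audio clip.

    Rounds up so that a trailing partial frame of audio still gets a frame.
    Hypothetical helper, not part of the paper.
    """
    if num_samples < 0 or sample_rate <= 0 or fps <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and fps must be > 0")
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-num_samples * fps // sample_rate)


def samples_per_frame(sample_rate: int, fps: int = FPS) -> float:
    """Return how many audio samples correspond to one video frame."""
    return sample_rate / fps


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sample rate, for illustration only
    three_seconds = 3 * sample_rate
    print(num_keypoint_frames(three_seconds, sample_rate))
    print(samples_per_frame(sample_rate))
```

Under these assumptions, a clip of any length maps to a frame count proportional to its duration, which is what a model that handles arbitrary audio lengths has to produce.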
Abstract
Whenever we speak, our voice is accompanied by facial movements and expressions. Several recent works have shown the synthesis of highly photo-realistic videos of talking faces, but they either require a source video to drive the target face or only generate videos with a fixed head pose. This lack of facial movement arises because most of these works focus on lip movement in sync with the audio while assuming that the remaining facial keypoints are fixed. To address this, a unique audio-keypoint dataset of over 150,000 videos at 224p and 25fps is introduced that relates facial keypoint movement to the corresponding audio. This dataset is then used to train the model, Audio2Keypoint, a novel approach for synthesizing facial keypoint movement to go with the audio. Given a single image of the target person and an audio sequence (in any language), Audio2Keypoint generates a plausible…
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
