End-to-End Speech-Driven Facial Animation with Temporal GANs

Konstantinos Vougioukas; Stavros Petridis; Maja Pantic

arXiv:1805.09313 · eess.AS · July 20, 2018 · 41 cites

TL;DR

This paper introduces a novel end-to-end system using temporal GANs to generate realistic, subject-independent talking head videos directly from raw audio and a still image, capturing lip sync and natural expressions.

Contribution

First method to produce subject-independent, realistic talking head videos directly from raw audio without handcrafted features, using a temporal GAN with dual discriminators.

Findings

01

Generated videos show accurate lip synchronization.

02

Temporal GANs produce more natural facial movements.

03

System outperforms static GAN approaches in user studies.

Abstract

Speech-driven facial animation is the process which uses speech signals to automatically synthesize a talking character. The majority of work in this domain creates a mapping from audio features to visual features. This often requires post-processing using computer graphics techniques to produce realistic albeit subject dependent results. We present a system for generating videos of a talking head, using a still image of a person and an audio clip containing speech, that doesn't rely on any handcrafted intermediate features. To the best of our knowledge, this is the first method capable of generating subject independent realistic videos directly from raw audio. Our method can generate videos which have (a) lip movements that are in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements. We achieve this by using a temporal GAN with 2 discriminators,…

Code & Models

Repositories

PrashanthaTP/wav2mov
pytorch

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis

Methods: Convolution