Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks

Seyed Ali Jalalifar; Hosein Hasani; Hamid Aghajan

arXiv:1803.07461·cs.CV·March 21, 2018·24 cites

Open Access

TL;DR

This paper generates realistic, lip-synced facial video from audio input by combining a recurrent neural network with a conditional GAN.

Contribution

The main contribution is pairing an RNN that predicts mouth landmarks from audio features with a conditional GAN that renders a face conditioned on those landmarks.

Findings

01. Achieved accurate lip sync with realistic facial images

02. Generated natural face sequences synchronized with input audio

03. Demonstrated high-quality facial reenactment results

Abstract

We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. Using a recurrent neural network, we predict mouth landmarks from audio features. We then exploit the power of conditional generative adversarial networks to produce highly realistic faces conditioned on a set of landmarks. Together, these two networks are capable of producing a sequence of natural faces in sync with an input audio track.
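The data flow described in the abstract can be illustrated with a minimal, untrained sketch. It is not the authors' implementation: the class `TinyRNN`, the helper `rasterize_landmarks`, the 13-dimensional audio feature frames, the 20 mouth landmarks, and the 16x16 conditioning map are all hypothetical choices made for illustration. The conditional GAN generator itself is omitted; the sketch stops at the conditioning input such a generator might consume.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (untrained; for shape illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyRNN:
    """Elman-style recurrent layer: one audio feature frame in, one landmark vector out.

    Hypothetical stand-in for the paper's recurrent network.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(n_hidden, n_in, rng)
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(n_out, n_hidden, rng)
        self.n_hidden = n_hidden

    def forward(self, frames):
        hidden = [0.0] * self.n_hidden
        outputs = []
        for frame in frames:
            # The hidden state carries context from earlier audio frames.
            pre = [a + b for a, b in zip(matvec(self.w_in, frame),
                                         matvec(self.w_rec, hidden))]
            hidden = [math.tanh(p) for p in pre]
            outputs.append(matvec(self.w_out, hidden))
        return outputs


def rasterize_landmarks(landmarks, size=16):
    """Draw a flat [x0, y0, x1, y1, ...] vector (nominally in [-1, 1]) onto a binary grid.

    Assumption: the generator is conditioned on an image-like landmark map.
    Out-of-range coordinates are clamped to the grid border.
    """
    grid = [[0] * size for _ in range(size)]
    for i in range(0, len(landmarks), 2):
        x, y = landmarks[i], landmarks[i + 1]
        col = min(size - 1, max(0, int((x + 1) / 2 * size)))
        row = min(size - 1, max(0, int((y + 1) / 2 * size)))
        grid[row][col] = 1
    return grid


# Synthetic input: 5 audio frames with 13 features each (hypothetical sizes).
rng = random.Random(42)
audio_frames = [[rng.uniform(-1, 1) for _ in range(13)] for _ in range(5)]

# Stage 1: audio features -> mouth landmarks (20 points = 40 coordinates).
rnn = TinyRNN(n_in=13, n_hidden=8, n_out=40)
landmark_seq = rnn.forward(audio_frames)

# Stage 2 input: one conditioning map per frame for a conditional generator.
cond_maps = [rasterize_landmarks(lm) for lm in landmark_seq]
```

In a trained system, each entry of `cond_maps` would be passed to the conditional generator to render one video frame, so the frame rate of the output follows the frame rate of the audio features.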

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing