Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks
Seyed Ali Jalalifar, Hosein Hasani, Hamid Aghajan

TL;DR
This paper combines a recurrent neural network with a conditional GAN to generate realistic, lip-synced face image sequences from audio input.
Contribution
A two-stage pipeline in which a recurrent neural network predicts mouth landmarks from audio features and a conditional GAN renders a realistic face conditioned on those landmarks.
Findings
Achieved accurate lip sync with realistic facial images
Generated natural face sequences synchronized with input audio
Demonstrated high-quality facial reenactment results
Abstract
We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. Using a recurrent neural network, we obtained mouth landmarks from audio features. We exploited the power of conditional generative adversarial networks to produce a highly realistic face conditioned on a set of landmarks. Together, these two networks are capable of producing a sequence of natural faces in sync with an input audio track.
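The two-stage data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weights are random rather than trained, the feature, hidden, and landmark sizes are arbitrary, and `generator_stub` is a hypothetical stand-in that only rasterizes landmarks onto a small grid where a trained conditional generator would render a face.

```python
import math
import random

def rnn_landmarks(audio_frames, w_in, w_rec, w_out):
    """Elman-style RNN: emits one landmark vector per audio feature frame."""
    hidden = [0.0] * len(w_rec)
    outputs = []
    for frame in audio_frames:
        # h_t = tanh(W_in x_t + W_rec h_{t-1})
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[i], frame))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(w_rec))
        ]
        # landmarks_t = W_out h_t, laid out as (x0, y0, x1, y1, ...)
        outputs.append([sum(w * h for w, h in zip(row, hidden)) for row in w_out])
    return outputs

def generator_stub(landmarks, size=8):
    """Hypothetical placeholder for the conditional generator.

    Marks each (x, y) landmark on a size x size grid. A real generator
    would be a trained network producing a face image from this condition.
    """
    image = [[0.0] * size for _ in range(size)]
    for x, y in zip(landmarks[0::2], landmarks[1::2]):
        # Map coordinates from roughly [-1, 1] to grid indices, clamped to the grid.
        col = min(size - 1, max(0, int((x + 1) / 2 * size)))
        row = min(size - 1, max(0, int((y + 1) / 2 * size)))
        image[row][col] = 1.0
    return image

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# All sizes below are illustrative assumptions, not values from the paper.
N_FEATURES, N_HIDDEN, N_POINTS, N_FRAMES = 13, 16, 4, 5
rng = random.Random(0)
w_in = random_matrix(N_HIDDEN, N_FEATURES, rng)
w_rec = random_matrix(N_HIDDEN, N_HIDDEN, rng)
w_out = random_matrix(2 * N_POINTS, N_HIDDEN, rng)

# Stand-in for per-frame audio features extracted from a speech track.
audio = [[rng.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(N_FRAMES)]

landmark_seq = rnn_landmarks(audio, w_in, w_rec, w_out)   # stage 1: audio -> landmarks
frames = [generator_stub(lm) for lm in landmark_seq]      # stage 2: landmarks -> image
```

The point of the sketch is the interface between the stages: the recurrent stage carries state across audio frames so that each landmark vector depends on preceding audio, while the image stage is applied per frame and depends only on that frame's landmarks.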
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
