Talking Head from Speech Audio using a Pre-trained Image Generator
Mohammed M. Alghamdi, He Wang, Andrew J. Bulpitt, David C. Hogg

TL;DR
This paper introduces a new approach for generating realistic talking-head videos from speech audio and a single image by leveraging a pre-trained StyleGAN and modeling latent space trajectories.
Contribution
It combines a two-stage training process with a pre-trained StyleGAN to produce high-quality talking-head videos from speech and a single identity image.
Findings
Outperforms recent state-of-the-art methods on standard datasets
Achieves high visual quality as measured by PSNR, SSIM, and FID, and accurate lip synchronization as measured by LMD
Validates its components through ablation experiments
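Two of the reported metrics are simple enough to sketch directly. The example below uses common textbook definitions, which is an assumption: PSNR computed over flattened 8-bit pixel values (higher is better), and LMD as the mean Euclidean distance between corresponding facial landmarks across frames (lower is better). The landmark detector, the subset of landmarks (e.g. mouth only), and any normalization the paper uses are not stated here. SSIM and FID are left out because FID requires a pre-trained Inception network.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def psnr(reference: Sequence[float], generated: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images given as flat pixel lists (higher is better)."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def lmd(real_frames: List[List[Point]], fake_frames: List[List[Point]]) -> float:
    """Landmark distance: mean Euclidean distance between corresponding landmarks,
    averaged over all landmarks and frames (lower is better)."""
    total, count = 0.0, 0
    for real, fake in zip(real_frames, fake_frames):
        for (xr, yr), (xf, yf) in zip(real, fake):
            total += math.hypot(xr - xf, yr - yf)
            count += 1
    if count == 0:
        raise ValueError("no landmarks to compare")
    return total / count


# Toy usage on made-up values.
print(psnr([0, 0, 0, 0], [0, 0, 0, 10]))  # about 34.15 dB
print(lmd([[(0.0, 0.0), (3.0, 4.0)]], [[(0.0, 0.0), (0.0, 0.0)]]))  # 2.5
```

In practice these scores are averaged over every frame of every test video; the toy inputs here only show the arithmetic.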
Abstract
We propose a novel method for generating high-resolution videos of talking-heads from speech audio and a single 'identity' image. Our method is based on a convolutional neural network model that incorporates a pre-trained StyleGAN generator. We model each frame as a point in the latent space of StyleGAN so that a video corresponds to a trajectory through the latent space. The network is trained in two stages. The first stage is to model trajectories in the latent space conditioned on speech utterances. To do this, we use an existing encoder to invert the generator, mapping from each video frame into the latent space. We train a recurrent neural network to map from speech utterances to displacements in the latent space of the image generator. These displacements are relative to the back-projection into the latent space of an identity image chosen from the individuals depicted in the…
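The first training stage can be illustrated with a minimal sketch of the idea that each frame's latent code is the identity image's latent code plus a displacement predicted from speech. This assumes a plain, bias-free Elman RNN with small toy dimensions and random untrained weights; the names `SpeechToLatentRNN` and `latent_trajectory` are hypothetical, the paper's actual recurrent architecture and audio features are not given here, and the StyleGAN generator that would render each latent code into a frame is omitted. Real StyleGAN latents are also far larger than the flat 4-dimensional vector used below.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(u: Vector, v: Vector) -> Vector:
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class SpeechToLatentRNN:
    """Hypothetical bias-free Elman RNN: per-frame audio features -> latent displacements."""

    def __init__(self, feat_dim: int, hidden_dim: int, latent_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_xh = random_matrix(hidden_dim, feat_dim, rng)    # input -> hidden
        self.w_hh = random_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden (recurrence)
        self.w_hd = random_matrix(latent_dim, hidden_dim, rng)  # hidden -> displacement

    def displacements(self, audio_feats: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_dim  # initial hidden state
        out = []
        for x in audio_feats:
            pre = add(matvec(self.w_xh, x), matvec(self.w_hh, h))
            h = [math.tanh(v) for v in pre]
            out.append(matvec(self.w_hd, h))  # displacement d_t for this frame
        return out


def latent_trajectory(w_identity: Vector, displacements: List[Vector]) -> List[Vector]:
    """Frame t's latent code is the identity latent plus the predicted displacement d_t."""
    return [add(w_identity, d) for d in displacements]


# Toy usage: 5 frames of made-up 3-D audio features and a 4-D stand-in identity latent.
model = SpeechToLatentRNN(feat_dim=3, hidden_dim=5, latent_dim=4)
audio = [[0.0, 0.0, 0.0]] * 2 + [[1.0, -0.5, 0.2]] * 3
w_id = [0.5, -1.0, 0.0, 2.0]  # would come from inverting the identity image with the encoder
trajectory = latent_trajectory(w_id, model.displacements(audio))
```

Each latent code in `trajectory` would then be passed through the generator to render one video frame. Because this toy RNN has no bias terms and starts from a zero hidden state, all-zero input features give a zero displacement, so the first two frames reproduce the identity latent exactly; that follows from the toy design, not from anything stated about the paper's model.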
Taxonomy
Methods: StyleGAN · Dense Connections · Convolution · R1 Regularization · Feedforward Network · Adaptive Instance Normalization
