ObamaNet: Photo-realistic lip-sync from text

Rithesh Kumar; Jose Sotelo; Kundan Kumar; Alexandre de Brebisson; Yoshua Bengio

arXiv:1801.01442 · cs.CV · January 8, 2018 · 85 cites

TL;DR

ObamaNet is a fully trainable neural architecture that converts arbitrary text into speech audio and a synchronized, photo-realistic lip-sync video, without relying on traditional computer graphics methods.

Contribution

It introduces a neural pipeline that chains three learned modules (text-to-speech, audio-to-mouth-keypoint generation, and keypoint-conditioned video-frame synthesis) to create photo-realistic lip-sync video from text.
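The three-stage data flow can be sketched as plain function composition. This is a minimal illustration, not the authors' code: the class `LipSyncPipeline`, its field names, and the toy stand-in stages are hypothetical. In ObamaNet each stage is a trained network (a Char2Wav-based speech model, a time-delayed LSTM for keypoints, and a Pix2Pix-based frame generator); the sketch also assumes, as a simplification, one generated frame per keypoint vector.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type aliases used only for this sketch.
Audio = List[float]          # waveform samples
KeypointVec = List[float]    # mouth keypoints for one video frame
Image = List[List[float]]    # one generated video frame


@dataclass
class LipSyncPipeline:
    """Hypothetical text -> audio -> mouth keypoints -> frames pipeline."""

    text_to_speech: Callable[[str], Audio]
    audio_to_keypoints: Callable[[Audio], List[KeypointVec]]
    keypoints_to_frame: Callable[[KeypointVec], Image]

    def __call__(self, text: str) -> Tuple[Audio, List[Image]]:
        audio = self.text_to_speech(text)           # stage 1: speech synthesis
        keypoints = self.audio_to_keypoints(audio)  # stage 2: one keypoint vector per frame
        frames = [self.keypoints_to_frame(k) for k in keypoints]  # stage 3: frame synthesis
        return audio, frames


if __name__ == "__main__":
    # Toy stand-in stages: 10 audio samples per character, one frame per 10 samples.
    pipeline = LipSyncPipeline(
        text_to_speech=lambda t: [0.0] * (10 * len(t)),
        audio_to_keypoints=lambda a: [[float(i)] for i in range(len(a) // 10)],
        keypoints_to_frame=lambda k: [[k[0]]],
    )
    audio, frames = pipeline("hello")
    print(len(audio), len(frames))  # 50 5
```

Because each stage consumes only the previous stage's output, any one of them can be retrained or replaced without touching the others, which is one practical benefit of keeping the modules separate.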

Findings

1. First architecture to generate both speech audio and synchronized photo-realistic lip-sync video from arbitrary text
2. Built entirely from fully trainable neural modules
3. Does not rely on traditional computer graphics techniques

Abstract

We present ObamaNet, the first architecture that generates both audio and synchronized photo-realistic lip-sync videos from any new text. Contrary to other published lip-sync approaches, ours is only composed of fully trainable neural modules and does not rely on any traditional computer graphics methods. More precisely, we use three main modules: a text-to-speech network based on Char2Wav, a time-delayed LSTM to generate mouth-keypoints synced to the audio, and a network based on Pix2Pix to generate the video frames conditioned on the keypoints.
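The "time-delayed" part of the keypoint LSTM can be illustrated by how training pairs are aligned. In this sketch the network reads audio frame t at step t but is trained to emit the keypoints of frame t - delay, so each prediction has already seen `delay` frames of future audio. The fixed delay, the zero padding, and the masked mean-squared-error loss are assumptions for illustration, not details taken from the paper, and the helpers `make_time_delayed_pairs` and `masked_mse` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def make_time_delayed_pairs(
    audio_feats: Sequence[Vector],
    keypoints: Sequence[Vector],
    delay: int,
) -> Tuple[List[Vector], List[Vector], List[bool]]:
    """Align per-frame audio features with keypoint targets delayed by `delay` steps.

    Assumption for illustration: zero padding plus a boolean loss mask.
    """
    if len(audio_feats) != len(keypoints) or not audio_feats:
        raise ValueError("need non-empty, equal-length audio and keypoint sequences")
    if delay < 0:
        raise ValueError("delay must be non-negative")
    audio_dim, kp_dim = len(audio_feats[0]), len(keypoints[0])
    # Pad audio at the end so the last `delay` keypoint frames are still predicted.
    inputs = [list(f) for f in audio_feats] + [[0.0] * audio_dim for _ in range(delay)]
    # The first `delay` outputs have no target yet: fill with zeros and mask them out.
    targets = [[0.0] * kp_dim for _ in range(delay)] + [list(k) for k in keypoints]
    mask = [False] * delay + [True] * len(keypoints)
    return inputs, targets, mask


def masked_mse(preds: Sequence[Vector], targets: Sequence[Vector], mask: Sequence[bool]) -> float:
    """Mean squared error per keypoint coordinate, over unmasked time steps only."""
    total, count = 0.0, 0
    for p, t, keep in zip(preds, targets, mask):
        if keep:
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += len(t)
    return total / count if count else 0.0


if __name__ == "__main__":
    audio = [[0.1], [0.2], [0.3]]
    kps = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    x, y, m = make_time_delayed_pairs(audio, kps, delay=1)
    print(x)  # [[0.1], [0.2], [0.3], [0.0]]
    print(y)  # [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(m)  # [False, True, True, True]
    print(masked_mse(y, y, m))  # 0.0
```

A larger delay gives the model more future audio context (the usual motivation being that mouth shapes can anticipate upcoming sounds) at the cost of extra latency; the delay value ObamaNet actually uses is not given in this summary.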

Code & Models

Repositories

ung200/thats-what-obama-said (TensorFlow)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Face recognition and analysis

Methods: Concatenated Skip Connection · PatchGAN · Batch Normalization · Convolution · Dropout · Pix2Pix · Sigmoid Activation · Tanh Activation