Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Hang Zhou; Yu Liu; Ziwei Liu; Ping Luo; Xiaogang Wang

arXiv:1807.07860 · cs.CV · April 24, 2019 · 37 cites

TL;DR

This paper introduces a novel method for arbitrary-subject talking face generation by learning a disentangled audio-visual representation, enabling realistic synthesis and improved lip reading and retrieval tasks.

Contribution

It proposes a new associative-and-adversarial training framework to explicitly disentangle subject and speech information in audio-visual data.

Findings

1. Generates realistic talking face sequences for arbitrary subjects.
2. Produces clearer lip motion patterns than previous methods.
3. Enhances performance in lip reading and audio-video retrieval tasks.

Abstract

Talking face generation aims to synthesize a sequence of face images that corresponds to a clip of speech. This is a challenging task because face appearance variation and the semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct a specific face appearance model for specific subjects or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning a disentangled audio-visual representation. We find that the talking face sequence is actually a composition of both subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. This disentangled representation has an advantage where both audio and video can serve as inputs…
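
To make the adversarial half of the idea concrete, here is a minimal sketch of one common way to strip subject information out of a speech embedding. It is an illustration under assumptions, not the formulation confirmed by the abstract: an identity classifier is trained on the speech embedding with ordinary cross-entropy, while the encoder is trained to push that classifier's output toward the uniform distribution. The function names (`identity_classification_loss`, `confusion_loss`) and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identity_classification_loss(logits, true_id):
    """Cross-entropy minimized by the (hypothetical) identity classifier.

    `logits` are the classifier's scores over subjects, computed from the
    speech embedding; `true_id` is the index of the correct subject.
    """
    return -math.log(softmax(logits)[true_id])


def confusion_loss(logits):
    """Cross-entropy against the uniform distribution over subjects.

    Assumed adversarial objective for the encoder: it is smallest, log(n),
    when the classifier cannot tell the subjects apart.
    """
    probs = softmax(logits)
    return -sum(math.log(p) for p in probs) / len(probs)


# Toy comparison: an embedding that leaks identity vs. one that does not.
leaky_logits = [5.0, 0.0, 0.0, 0.0]   # classifier is confident about subject 0
clean_logits = [0.0, 0.0, 0.0, 0.0]   # classifier is at chance

print(confusion_loss(leaky_logits) > confusion_loss(clean_logits))
```

In a full training loop the two losses would be minimized alternately, the classifier on the first and the speech encoder on the second, so that the speech space ends up carrying as little subject information as possible.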

Code & Models

Repositories

Hangz-nju-cuhk/Talking-Face-Generation-DAVS
pytorch

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Speech and Audio Processing