Everybody's Talkin': Let Me Talk as You Want

Linsen Song; Wayne Wu; Chen Qian; Ran He; Chen Change Loy

arXiv:2001.05201 · cs.CV · January 16, 2020 · 35 cites

Open Access

TL;DR

This paper introduces a method for editing portrait videos: source audio is translated into expression parameters that drive realistic mouth movements, while the target's original geometry and pose are preserved and the output remains temporally coherent.

Contribution

It factorizes each target video frame into expression, geometry, and pose parameters via monocular 3D face reconstruction, and uses a recurrent network to map source audio to expression parameters, without assuming a person-specific rendering network.

Findings

1. Achieves high realism in talking portrait videos
2. Maintains original video context and pose
3. Robust to variations in source audio

Abstract

We present a method to edit target portrait footage by taking a sequence of audio as input to synthesize a photo-realistic video. This method is unique because it is highly dynamic: it does not assume a person-specific rendering network, yet it is capable of translating arbitrary source audio into arbitrary video output. Instead of learning a highly heterogeneous and nonlinear mapping from audio to the video directly, we first factorize each target video frame into orthogonal parameter spaces, i.e., expression, geometry, and pose, via monocular 3D face reconstruction. Next, a recurrent network is introduced to translate source audio into expression parameters that are primarily related to the audio content. The audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the movement of the mouth regions precisely mapped to the…
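
The factorization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `FrameParams`, `audio_to_expression`, and `retarget` are hypothetical names, the exponential-smoothing recurrence is only a toy stand-in for the paper's recurrent network, and the audio features are assumed to have the same dimension as the expression parameters. The sketch shows the one idea stated in the abstract: per-frame parameters are split into expression, geometry, and pose, and only the expression part is replaced by audio-derived values.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class FrameParams:
    """Per-frame face parameters (hypothetical container, not from the paper)."""
    expression: Vector
    geometry: Vector
    pose: Vector


def audio_to_expression(audio_feats: Sequence[Vector], decay: float = 0.5) -> List[Vector]:
    """Toy recurrent map standing in for the audio-to-expression network.

    Assumption: h_t = decay * h_{t-1} + (1 - decay) * x_t, with audio
    features already in the expression-parameter dimension.
    """
    if not audio_feats:
        return []
    state = [0.0] * len(audio_feats[0])
    outputs: List[Vector] = []
    for x in audio_feats:
        # The state carries information from earlier audio frames forward.
        state = [decay * h + (1.0 - decay) * xi for h, xi in zip(state, x)]
        outputs.append(tuple(state))
    return outputs


def retarget(target: Sequence[FrameParams], expressions: Sequence[Vector]) -> List[FrameParams]:
    """Swap in audio-derived expressions; keep geometry and pose untouched."""
    if len(target) != len(expressions):
        raise ValueError("need one expression vector per target frame")
    return [replace(frame, expression=expr) for frame, expr in zip(target, expressions)]


# Two target frames with made-up parameter values.
target_frames = [
    FrameParams(expression=(0.0, 0.0), geometry=(1.0,), pose=(0.1, 0.2, 0.3)),
    FrameParams(expression=(0.2, 0.1), geometry=(1.0,), pose=(0.1, 0.25, 0.3)),
]
audio_features = [(1.0, 0.0), (1.0, 2.0)]

predicted = audio_to_expression(audio_features)
edited_frames = retarget(target_frames, predicted)
```

In this sketch each edited frame keeps the target's geometry and pose, so only the expression parameters reflect the source audio. Rendering the edited parameters back into photo-realistic frames is a separate stage that is not modeled here.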

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Face recognition and analysis