Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

Hang Zhou; Yasheng Sun; Wayne Wu; Chen Change Loy; Xiaogang Wang; Ziwei Liu

arXiv:2104.11116 · cs.CV · April 23, 2021 · 24 cites

TL;DR

This paper introduces a framework for generating pose-controllable talking faces from a single reference image. It handles pose control, lip synchronization, and robustness to extreme views without relying on pre-estimated structural information such as landmarks or 3D parameters.

Contribution

The proposed method models audio-visual representations with an implicit pose code, enabling accurate pose control and lip synchronization directly from raw images, surpassing previous landmark-based approaches.
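The modular idea above can be pictured with a toy sketch: identity, speech content, and pose are kept as separate codes, so the pose can be replaced without touching the other two. This is only an illustration under stated assumptions, not the paper's implementation; the class `ModularCode`, the helper `swap_pose`, and all dimensions are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sizes for illustration; the pose code is deliberately the smallest.
ID_DIM, CONTENT_DIM, POSE_DIM = 8, 6, 3


@dataclass
class ModularCode:
    identity: List[float]        # would come from the single reference photo
    speech_content: List[float]  # would come from the driving audio
    pose: List[float]            # low-dimensional code from a pose-source frame

    def __post_init__(self) -> None:
        # Reject codes whose lengths do not match the assumed sizes.
        checks = [
            ("identity", self.identity, ID_DIM),
            ("speech_content", self.speech_content, CONTENT_DIM),
            ("pose", self.pose, POSE_DIM),
        ]
        for name, vec, dim in checks:
            if len(vec) != dim:
                raise ValueError(f"{name} must have length {dim}, got {len(vec)}")

    def to_generator_input(self) -> List[float]:
        # Concatenate the three modules into one vector for a (hypothetical) generator.
        return self.identity + self.speech_content + self.pose


def swap_pose(code: ModularCode, new_pose: List[float]) -> ModularCode:
    """Return a copy in which only the pose module is replaced."""
    return ModularCode(list(code.identity), list(code.speech_content), list(new_pose))


if __name__ == "__main__":
    base = ModularCode([0.1] * ID_DIM, [0.2] * CONTENT_DIM, [0.0] * POSE_DIM)
    turned = swap_pose(base, [1.0] * POSE_DIM)
    print(len(turned.to_generator_input()))
```

Because the pose lives in its own slot, driving a new head pose in this sketch amounts to calling `swap_pose` while the identity and speech-content codes stay fixed.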

Findings

01 Accurately lip-synced talking faces with controllable poses.

02 Robustness to extreme viewing angles.

03 Effective frontalization of talking faces.

Abstract

While accurate lip synchronization has been achieved for arbitrary-subject audio-driven talking face generation, the problem of how to efficiently drive the head pose remains. Previous methods rely on pre-estimated structural information such as landmarks and 3D parameters, aiming to generate personalized rhythmic movements. However, the inaccuracy of such estimated information under extreme conditions would lead to degradation problems. In this paper, we propose a clean yet effective framework to generate pose-controllable talking faces. We operate on raw face images, using only a single photo as an identity reference. The key is to modularize audio-visual representations by devising an implicit low-dimension pose code. Substantially, both speech content and head pose information lie in a joint non-identity embedding space. While speech content information can be defined by learning…

Code & Models

Repositories

Hangz-nju-cuhk/Talking-Face_PC-AVS
PyTorch · Official

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing