Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

Lele Chen; Ross K. Maddox; Zhiyao Duan; Chenliang Xu

arXiv:1905.03820 · cs.CV · May 13, 2019 · 27 citations

TL;DR

This paper introduces a cascade GAN framework for talking face video generation that improves robustness and visual quality by transferring audio to facial landmarks before video synthesis, and employs novel loss and discriminator designs.

Contribution

It proposes a hierarchical approach with a dynamic pixel-wise loss and a sequence-aware discriminator to enhance synchronization and image sharpness in talking face videos.

Findings

01. Outperforms state-of-the-art methods on quantitative metrics

02. Produces more realistic and better-synchronized talking face videos

03. Demonstrates robustness across various face shapes and audio conditions

Abstract

We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. We, humans, are sensitive to temporal discontinuities and subtle artifacts in video. To avoid those pixel jittering problems and to enforce the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with…
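
The abstract names a dynamically adjustable pixel-wise loss with an attention mechanism but is cut off before stating its form. As a rough illustration only, the sketch below assumes one plausible reading: a per-pixel L1 error scaled by an attention map plus a constant base weight, so that audiovisual-correlated regions (for example the mouth) count more while the rest of the frame is still penalized. The function name, the `base_weight` parameter, and the toy 2x2 frames are hypothetical and are not taken from the paper or from the ATVGnet repository.

```python
def attention_weighted_l1(pred, target, attention, base_weight=0.5):
    """Mean absolute error with each pixel scaled by (attention + base_weight).

    Assumption: this weighting is an illustrative guess at an
    attention-based pixel-wise loss, not the authors' exact definition.
    All three inputs are equally sized 2D lists of floats.
    """
    total = 0.0
    count = 0
    for pred_row, target_row, attn_row in zip(pred, target, attention):
        for p, t, a in zip(pred_row, target_row, attn_row):
            # Pixels with high attention contribute more to the loss.
            total += abs(p - t) * (a + base_weight)
            count += 1
    return total / count


# Toy 2x2 grayscale frames (hypothetical values).
pred = [[0.0, 1.0], [0.5, 0.5]]
target = [[0.0, 0.0], [0.5, 1.0]]
# Attention is 1 on the top-right pixel and 0 elsewhere.
attention = [[0.0, 1.0], [0.0, 0.0]]

loss = attention_weighted_l1(pred, target, attention)
print(loss)  # 0.4375
```

In this toy case the top-right pixel has error 1.0 and weight 1.5, while the bottom-right pixel has error 0.5 and weight 0.5, so the attended error dominates the total. With `base_weight=0` any pixel outside the attention map would be ignored entirely, which is why the sketch keeps a nonzero floor.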

Code & Models

Repositories

lelechen63/ATVGnet
PyTorch · Official

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing

Methods: Convolution