High-Fidelity and Freely Controllable Talking Head Video Generation

Yue Gao; Yuan Zhou; Jinglu Wang; Xiao Li; Xiang Ming; Yan Lu

arXiv:2304.10168 · cs.CV · November 3, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a high-fidelity, controllable talking head video generation method that addresses distortions, disentangles motion attributes, and reduces flickering artifacts, achieving state-of-the-art results.

Contribution

The proposed model combines self-supervised learned landmarks with 3D face model-based landmarks, and adds motion-aware multi-scale feature alignment and feature context adaptation to improve quality and controllability.

Findings

01. Produces high-fidelity videos with explicit control over head pose and expressions.

02. Reduces distortions and flickering artifacts in generated videos.

03. Achieves state-of-the-art performance on challenging datasets.

Abstract

Talking head generation aims to generate a video from a given source identity and a target motion. However, current methods face several challenges that limit the quality and controllability of the generated videos. First, the generated face often exhibits unexpected deformation and severe distortion. Second, movement-relevant information in the driving image, such as pose and expression, is not explicitly disentangled, which restricts the manipulation of different attributes during generation. Third, the generated videos tend to flicker because the extracted landmarks are inconsistent between adjacent frames. In this paper, we propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression. Our method leverages both self-supervised learned landmarks and 3D face model-based landmarks to model the motion. We also…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Vision and Imaging