High-Fidelity and Freely Controllable Talking Head Video Generation
Yue Gao, Yuan Zhou, Jinglu Wang, Xiao Li, Xiang Ming, Yan Lu

TL;DR
This paper introduces a high-fidelity, controllable talking head video generation method that addresses distortions, disentangles motion attributes, and reduces flickering artifacts, achieving state-of-the-art results.
Contribution
The proposed model combines self-supervised learned landmarks with 3D face model-based landmarks to model motion, and adds motion-aware multi-scale feature alignment and feature context adaptation to improve quality and controllability.
Findings
Produces high-fidelity videos with explicit control over head pose and expressions.
Reduces distortions and flickering artifacts in generated videos.
Achieves state-of-the-art performance on challenging datasets.
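To make the flickering finding concrete, the following is a minimal sketch of one way to quantify landmark inconsistency between adjacent frames. It is not the paper's metric or method: the function name `landmark_jitter`, the input layout (a list of frames, each a list of `(x, y)` tuples), and the toy data are all assumptions made for illustration.

```python
from math import hypot


def landmark_jitter(frames):
    """Mean per-landmark displacement between adjacent frames.

    Hypothetical helper, not taken from the paper.
    frames: list of frames, each a list of (x, y) landmark tuples,
            with landmarks in the same order in every frame.
    """
    if len(frames) < 2:
        return 0.0
    total, count = 0.0, 0
    # Compare each frame with the one that follows it.
    for prev, curr in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            total += hypot(x1 - x0, y1 - y0)
            count += 1
    return total / count if count else 0.0


# Toy data: two landmarks tracked over three frames.
steady = [[(0.0, 0.0), (1.0, 1.0)]] * 3
shaky = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(3.0, 4.0), (1.0, 1.0)],  # first landmark jumps by 5 units
    [(0.0, 0.0), (1.0, 1.0)],  # and jumps back
]

print(landmark_jitter(steady))  # 0.0
print(landmark_jitter(shaky))   # 2.5
```

Note that this score also rises with genuine head motion, so it is only meaningful as a flicker proxy on segments where the face is expected to be nearly still.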
Abstract
Talking head generation aims to generate a video from a given source identity and target motion. However, current methods face several challenges that limit the quality and controllability of the generated videos. First, the generated face often has unexpected deformation and severe distortions. Second, the driving image does not explicitly disentangle movement-relevant information, such as poses and expressions, which restricts the manipulation of different attributes during generation. Third, the generated videos tend to have flickering artifacts due to the inconsistency of the extracted landmarks between adjacent frames. In this paper, we propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression. Our method leverages both self-supervised learned landmarks and 3D face model-based landmarks to model the motion. We also…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Vision and Imaging
