Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation
Fa-Ting Hong, Dan Xu

TL;DR
This paper introduces MCNet, a memory-augmented network that learns a global facial representation space to improve fidelity and identity preservation in talking head video generation, especially under complex motion.
Contribution
The paper proposes a new implicit identity representation conditioned memory network with a unified facial meta-memory bank and a query mechanism for enhanced talking head synthesis.
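To make the idea of an identity-conditioned memory query concrete, here is a minimal NumPy sketch. It is not the authors' architecture, which this summary does not specify. It assumes the memory bank is a small set of slot vectors, that conditioning is a per-channel scale derived from an identity vector, and that the query is ordinary scaled dot-product attention. The names `compensate` and `w_scale`, and all shapes, are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compensate(features, memory, identity, w_scale):
    """Query an identity-conditioned memory bank with warped features.

    features: (P, C) flattened spatial features of the warped source image
    memory:   (S, C) shared meta-memory slots (learned in a real model)
    identity: (D,)   identity vector of the source face
    w_scale:  (D, C) hypothetical projection from identity to channel scales
    """
    # Assumption: conditioning is a per-channel scaling of the shared memory.
    scale = 1.0 + np.tanh(identity @ w_scale)          # (C,)
    cond_memory = memory * scale                        # (S, C)

    # Each spatial position attends over the conditioned memory slots.
    logits = features @ cond_memory.T / np.sqrt(features.shape[1])
    attn = softmax(logits, axis=-1)                     # (P, S)

    # Retrieved content is added back as a residual compensation term.
    retrieved = attn @ cond_memory                      # (P, C)
    return features + retrieved, attn

rng = np.random.default_rng(0)
P, S, C, D = 16, 8, 4, 3                                # illustrative sizes only
features = rng.normal(size=(P, C))
memory = rng.normal(size=(S, C))
identity = rng.normal(size=(D,))
w_scale = rng.normal(size=(D, C))

out, attn = compensate(features, memory, identity, w_scale)
```

In this sketch the same `memory` is shared across all identities and only the scaling changes per source face, which is one simple way to read "a unified meta-memory bank conditioned on identity". A trained model would learn `memory` and `w_scale` rather than sampling them at random.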
Findings
Outperforms previous state-of-the-art methods on VoxCeleb1 and CelebV datasets.
Effectively compensates for occluded regions and expression variations.
Learns a comprehensive facial memory for high-fidelity video generation.
Abstract
Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information derived from a target-driving video, while maintaining the person's identity in the source image. However, dramatic and complex motions in the driving video cause ambiguous generation, because the still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations, which produces severe artifacts and significantly degrades the generation quality. To tackle this problem, we propose to learn a global facial representation space, and design a novel implicit identity representation conditioned memory compensation network, coined as MCNet, for high-fidelity talking head generation. Specifically, we devise a network module to learn a unified spatial facial meta-memory bank from all training samples, which…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
