OPT: One-shot Pose-Controllable Talking Head Generation
Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

TL;DR
This paper introduces OPT, a novel one-shot talking head generation method that achieves high-quality, pose-controllable lip-sync videos while preserving source identity, overcoming previous identity mismatch issues.
Contribution
OPT disentangles audio content from speaker-specific information and combines it with free pose control, so the generated talking head follows the driving audio and pose while preserving the identity of the source face.
Findings
Outperforms previous state-of-the-art methods in quality and pose control.
Successfully preserves source identity during pose variations.
Generates natural and lip-synced talking heads with high fidelity.
Abstract
One-shot talking head generation produces lip-synced talking heads from arbitrary audio and a single source face. To improve naturalness and realism, recent methods aim for free pose control instead of only editing the mouth area. However, existing methods do not accurately preserve the identity of the source face when generating head motions. To solve this identity mismatch problem and achieve high-quality free pose control, we present the One-shot Pose-controllable Talking head generation network (OPT). Specifically, the Audio Feature Disentanglement Module separates content features from the audio, eliminating the influence of speaker-specific information contained in arbitrary driving audio. The mouth expression feature is then extracted from the content feature and the source face, during which a landmark loss is designed to enhance the accuracy of facial structure and identity…
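The abstract mentions a landmark loss without giving its form in the text available here. As a rough illustration only, one common way to penalize facial-structure errors is the mean Euclidean distance between predicted and reference 2D landmarks. The function name `landmark_loss`, the 2D point representation, and the choice of distance are all assumptions made for this sketch, not the authors' definition.

```python
import math
from typing import Sequence, Tuple

# A 2D facial landmark as (x, y). This representation is an assumption
# made for illustration; the paper's landmark format is not given here.
Point = Tuple[float, float]


def landmark_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding 2D landmarks.

    Hypothetical stand-in for a landmark loss; the exact loss used by
    OPT is not specified in the abstract.
    """
    if len(pred) != len(target):
        raise ValueError("landmark sets must have the same length")
    if not pred:
        raise ValueError("landmark sets must be non-empty")
    total = sum(
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, target)
    )
    return total / len(pred)


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (3.0, 4.0)]
    reference = [(0.0, 0.0), (0.0, 0.0)]
    print(landmark_loss(predicted, reference))
```

A loss of this kind is zero only when every predicted landmark coincides with its reference point, which is why it can act as a constraint on facial structure during training.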
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: OPT
