GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression
Ziqi Zhou, Weize Quan, Hailin Shi, Wei Li, Lili Wang, Dong-Ming Yan

TL;DR
GoHD is a framework for generating realistic, expressive, and controllable portrait videos from any reference identity and any motion. It integrates gaze control, prosody-aware head poses, and lip synchronization with limited data.
Contribution
GoHD introduces three components: a disentangled animation module with gaze control, a prosody-aware diffusion model for head poses, and a two-stage training strategy for lip synchronization.
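As a rough illustration of what a diffusion model over head poses operates on, the sketch below applies the standard DDPM forward (noising) step to a short pose sequence. It is a generic textbook formulation, not the authors' implementation: the linear noise schedule, the step count, and the three-value yaw/pitch/roll pose layout are assumptions chosen for the example. The conformer-based denoiser and its audio prosody conditioning are not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def noise_pose_sequence(poses, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) for a sequence of head poses.

    poses: list of frames, each a list of pose values. The layout
    (yaw, pitch, roll) is hypothetical and only used for illustration.
    Returns the noised sequence and the Gaussian noise that was added.
    """
    a = math.sqrt(alpha_bars[t])
    s = math.sqrt(1.0 - alpha_bars[t])
    noise = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in poses]
    noised = [[a * x + s * e for x, e in zip(frame, eps)]
              for frame, eps in zip(poses, noise)]
    return noised, noise

# Example: noise a two-frame pose sequence at the middle diffusion step.
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
clean = [[0.1, -0.05, 0.0], [0.12, -0.04, 0.01]]
noised, eps = noise_pose_sequence(clean, 500, alpha_bars, rng)
```

In a conditional setup, a denoising network would be trained to predict `eps` from `noised`, the step index, and the conditioning signal (here, audio-derived features); sampling then runs the learned reverse process from pure noise.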
Findings
Demonstrates superior generalization to unseen subjects.
Produces realistic, expressive talking head videos.
Effectively decouples lip motion from other facial motions.
Abstract
Audio-driven talking head generation necessitates seamless integration of audio and visual data amidst the challenges posed by diverse input portraits and intricate correlations between audio and facial motions. In response, we propose a robust framework GoHD designed to produce highly realistic, expressive, and controllable portrait videos from any reference identity with any motion. GoHD innovates with three key modules: Firstly, an animation module utilizing latent navigation is introduced to improve the generalization ability across unseen input styles. This module achieves high disentanglement of motion and identity, and it also incorporates gaze orientation to rectify unnatural eye movements that were previously overlooked. Secondly, a conformer-structured conditional diffusion model is designed to guarantee head poses that are aware of prosody. Thirdly, to estimate…
Taxonomy
Topics: Augmented Reality Applications · Face recognition and analysis · Gaze Tracking and Assistive Technology
Methods: Attentive Walk-Aggregating Graph Neural Network · Diffusion
