GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression

Ziqi Zhou; Weize Quan; Hailin Shi; Wei Li; Lili Wang; Dong-Ming Yan

arXiv:2412.09296·cs.CV·December 16, 2024

TL;DR

GoHD is a novel framework for generating highly realistic, expressive, and controllable portrait videos from any identity and motion, integrating gaze, prosody-aware head poses, and lip synchronization with limited data.

Contribution

GoHD introduces a disentangled animation module with gaze control, a prosody-aware diffusion model for head poses, and a two-stage training strategy for lip synchronization.

Findings

01. Demonstrates superior generalization to unseen subjects.

02. Produces realistic, expressive talking head videos.

03. Effectively decouples lip motion from other facial motions.

Abstract

Audio-driven talking head generation necessitates seamless integration of audio and visual data amidst the challenges posed by diverse input portraits and intricate correlations between audio and facial motions. In response, we propose a robust framework GoHD designed to produce highly realistic, expressive, and controllable portrait videos from any reference identity with any motion. GoHD innovates with three key modules: Firstly, an animation module utilizing latent navigation is introduced to improve the generalization ability across unseen input styles. This module achieves high disentanglement of motion and identity, and it also incorporates gaze orientation to rectify unnatural eye movements that were previously overlooked. Secondly, a conformer-structured conditional diffusion model is designed to guarantee head poses that are aware of prosody. Thirdly, to estimate…
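
The abstract mentions an animation module based on latent navigation but does not give its formulation. As a rough illustration only, the sketch below shows one generic form of the idea: an identity latent code is moved along a small set of orthonormal motion directions, and the per-direction magnitudes can be read back by projection. The function names, the dimensions, and the random directions are all hypothetical; in a trained model the directions and magnitudes would be learned, and GoHD's actual module may differ.

```python
import math
import random


def orthonormalize(vectors):
    """Gram-Schmidt: turn linearly independent vectors into an orthonormal set."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def navigate(z_identity, directions, magnitudes):
    """Shift the identity code along each motion direction by its magnitude."""
    z = list(z_identity)
    for d, a in zip(directions, magnitudes):
        z = [zi + a * di for zi, di in zip(z, d)]
    return z


def read_magnitudes(z_driven, z_identity, directions):
    """Recover magnitudes by projecting the displacement onto each direction."""
    delta = [a - b for a, b in zip(z_driven, z_identity)]
    return [sum(x * di for x, di in zip(delta, d)) for d in directions]


# Hypothetical sizes: an 8-dimensional latent space with 3 motion directions.
rng = random.Random(0)
dim, n_dirs = 8, 3
directions = orthonormalize(
    [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_dirs)]
)
z_identity = [rng.gauss(0, 1) for _ in range(dim)]

# Stand-ins for motion magnitudes (e.g. one per pose or expression factor).
magnitudes = [0.5, -1.0, 2.0]
z_driven = navigate(z_identity, directions, magnitudes)
recovered = read_magnitudes(z_driven, z_identity, directions)
```

Because the directions are orthonormal, changing one magnitude leaves the others untouched, which is one simple way to picture the separation of motion from identity that the abstract describes.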

Code & Models

Repositories

Jia1018/GoHD (official)

Taxonomy

Topics: Augmented Reality Applications · Face recognition and analysis · Gaze Tracking and Assistive Technology

Methods: Attentive Walk-Aggregating Graph Neural Network · Diffusion