Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Jiahao Cui, Hui Li, Yun Zhan, Hanlin Shang, Kaihui Cheng, Yuqi Ma, Shan Mu, Hang Zhou, Jingdong Wang, Siyu Zhu

TL;DR
Hallo3 introduces a transformer-based video generative model that produces highly realistic and dynamic portrait animations, effectively handling diverse perspectives, backgrounds, and speech-driven motion, surpassing prior U-Net-based methods.
Contribution
The paper presents the first application of a pretrained transformer-based video model for portrait animation, with a novel identity reference network and speech conditioning mechanisms.
Findings
Outperforms prior methods in realism and diversity of portrait videos
Successfully handles non-frontal perspectives and dynamic backgrounds
Demonstrates strong generalization on benchmark and wild datasets
Abstract
Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds. In this paper, we introduce the first application of a pretrained transformer-based video generative model that demonstrates strong generalization capabilities and generates highly dynamic, realistic videos for portrait animation, effectively addressing these challenges. The adoption of a new video backbone model makes previous U-Net-based methods for identity maintenance, audio conditioning, and video extrapolation inapplicable. To address this limitation, we design an identity reference network consisting of a causal 3D VAE combined with a stacked series of transformer layers, ensuring consistent facial identity across video sequences. Additionally, we…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques · Computer Graphics and Visualization Techniques
