Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

Jiahao Cui; Hui Li; Yao Yao; Hao Zhu; Hanlin Shang; Kaihui Cheng; Hang Zhou; Siyu Zhu; Jingdong Wang

arXiv:2410.07718 · cs.CV · October 15, 2024 · 3 cites

TL;DR

Hallo2 advances portrait image animation by enabling long-duration, high-resolution (4K) video synthesis driven by audio and textual controls, overcoming previous limitations in temporal coherence and resolution.

Contribution

The paper introduces Hallo2, which the authors present as the first method to produce 4K, hour-long, audio-driven portrait animations with semantic textual control, extending the capabilities of its predecessor, Hallo.

Findings

01. Achieves 4K resolution in long-duration videos
02. Maintains visual consistency over extended durations
03. Outperforms existing methods in quality and controllability

Abstract

Recent advances in latent diffusion-based generative models for portrait image animation, such as Hallo, have achieved impressive results in short-duration video synthesis. In this paper, we present updates to Hallo, introducing several design enhancements to extend its capabilities. First, we extend the method to produce long-duration videos. To address substantial challenges such as appearance drift and temporal artifacts, we investigate augmentation strategies within the image space of conditional motion frames. Specifically, we introduce a patch-drop technique augmented with Gaussian noise to enhance visual consistency and temporal coherence over long duration. Second, we achieve 4K resolution portrait video generation. To accomplish this, we implement vector quantization of latent codes and apply temporal alignment techniques to maintain coherence across the temporal dimension. By…
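
The patch-drop augmentation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `patch_drop_with_noise`, the patch size, the drop ratio, the noise level, the choice to zero out dropped patches, and the order of operations (noise first, then drop) are all assumptions made for illustration. The idea shown is that a conditioning motion frame is corrupted before it is fed back to the model, so the model cannot simply copy its appearance from one clip to the next.

```python
import numpy as np


def patch_drop_with_noise(frame, patch_size=16, drop_ratio=0.25,
                          noise_std=0.1, rng=None):
    """Corrupt a conditioning frame by adding noise and dropping patches.

    Hypothetical helper, not taken from the Hallo2 code base.
    frame: float array of shape (H, W, C); H and W must be multiples
    of patch_size. Returns (augmented_frame, drop_mask), where drop_mask
    has shape (H // patch_size, W // patch_size) and True marks a
    dropped patch.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = frame.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("frame size must be a multiple of patch_size")

    # Decide independently for each patch whether it is dropped.
    grid_h, grid_w = h // patch_size, w // patch_size
    drop_mask = rng.random((grid_h, grid_w)) < drop_ratio

    # Expand the patch-level mask to a pixel-level mask.
    pixel_mask = drop_mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    # Add Gaussian noise to the whole frame (assumed order: noise, then drop).
    noisy = frame + rng.normal(0.0, noise_std, size=frame.shape)

    # Zero out dropped patches (assumption: dropped means set to zero).
    augmented = np.where(pixel_mask[..., None], 0.0, noisy)
    return augmented, drop_mask


# Usage: corrupt a dummy 64x64 RGB frame.
dummy_frame = np.full((64, 64, 3), 0.5)
augmented, mask = patch_drop_with_noise(
    dummy_frame, rng=np.random.default_rng(0)
)
```

Dropping whole patches removes local appearance detail while leaving the coarse layout visible, which is one plausible reading of why the abstract pairs it with Gaussian noise to limit appearance drift over long sequences.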

Code & Models

Repositories

fudan-generative-vision/hallo2
PyTorch · Official

Models

🤗 fudan-generative-ai/hallo2
model · ♡ 133

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis