Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Mingwang Xu, Hui Li, Qingkun Su, Hanlin Shang, Liwei Zhang, Ce Liu, Jingdong Wang, Yao Yao, Siyu Zhu

arXiv:2406.08801 · cs.CV · June 18, 2024 · 5 cites

TL;DR

This paper introduces a hierarchical audio-driven visual synthesis method using diffusion models for realistic, synchronized, and personalized portrait animation, improving quality and motion diversity.

Contribution

It presents an end-to-end diffusion-based framework with hierarchical control for enhanced synchronization and personalization in portrait animation.

Findings

01

Improved lip synchronization accuracy

02

Enhanced image and video quality

03

Greater motion diversity and personalization

Abstract

The field of portrait image animation, driven by speech audio input, has experienced significant advancements in the generation of realistic and dynamic portraits. This research delves into the complexities of synchronizing facial movements and creating visually appealing, temporally consistent animations within the framework of diffusion-based methodologies. Moving away from traditional paradigms that rely on parametric models for intermediate facial representations, our innovative approach embraces the end-to-end diffusion paradigm and introduces a hierarchical audio-driven visual synthesis module to enhance the precision of alignment between audio inputs and visual outputs, encompassing lip, expression, and pose motion. Our proposed network architecture seamlessly integrates diffusion-based generative models, a UNet-based denoiser, temporal alignment techniques, and a reference…
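The abstract describes a hierarchical audio-driven visual synthesis module that aligns audio with lip, expression, and pose motion. The sketch below is one minimal reading of that idea, not the authors' implementation. It assumes single-head cross-attention from visual latent tokens to audio tokens, one attention pass per level confined by a spatial mask, and fixed scalar blending weights (`weights`) standing in for whatever learned weighting the paper actually uses. All function names, masks, and dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical motion levels, following the lip / expression / pose split in the abstract.
LEVELS = ("lip", "expression", "pose")


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latents, audio, w_q, w_k, w_v):
    # Visual latent tokens (queries) attend to audio tokens (keys and values).
    q = latents @ w_q                        # (N, d)
    k = audio @ w_k                          # (M, d)
    v = audio @ w_v                          # (M, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) scaled dot products
    return softmax(scores, axis=-1) @ v      # (N, D)


def hierarchical_audio_attention(latents, audio, masks, params, weights):
    # Hypothetical sketch: one audio cross-attention per level, each confined to
    # its spatial mask, scaled by a per-level weight, and added as a residual.
    update = np.zeros_like(latents)
    for level in LEVELS:
        attn = cross_attention(latents, audio, *params[level])
        update += weights[level] * masks[level][:, None] * attn
    return latents + update


# Toy example: a 4x4 latent grid (16 tokens of width 8) and 5 audio tokens.
rng = np.random.default_rng(0)
N, D, M, A, d = 16, 8, 5, 12, 4
latents = rng.standard_normal((N, D))
audio = rng.standard_normal((M, A))
params = {
    lvl: (rng.standard_normal((D, d)),   # query projection
          rng.standard_normal((A, d)),   # key projection
          rng.standard_normal((A, D)))   # value projection
    for lvl in LEVELS
}

# Hypothetical region masks on the 4x4 grid, flattened to one value per token.
lip = np.zeros((4, 4))
lip[3, 1:3] = 1.0    # mouth region
face = np.zeros((4, 4))
face[1:4, 1:3] = 1.0  # face region used for expression
masks = {"lip": lip.ravel(), "expression": face.ravel(), "pose": np.ones(N)}

weights = {"lip": 1.0, "expression": 0.5, "pose": 0.25}
out = hierarchical_audio_attention(latents, audio, masks, params, weights)
print(out.shape)  # (16, 8)
```

Confining each level to its own region is what makes the control hierarchical in this sketch: changing `weights["lip"]` alters only mouth tokens, while `weights["pose"]` reaches every token. Per-level weights of this kind are one plausible way to express the personalization the summary mentions, but how the paper parameterizes them should be checked against the original text.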

Code & Models

Repositories

fudan-generative-vision/hallo
pytorch

Models

🤗
fudan-generative-ai/hallo
model

Taxonomy

Topics: Music and Audio Processing · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion