HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters

Yi Chen; Sen Liang; Zixiang Zhou; Ziyao Huang; Yifeng Ma; Junshu Tang; Qin Lin; Yuan Zhou; Qinglin Lu

arXiv:2505.20156 · cs.CV · June 4, 2025

TL;DR

HunyuanVideo-Avatar is a multimodal diffusion transformer model that generates high-fidelity, emotion-controllable, multi-character dialogue videos with improved consistency and realism, addressing key challenges in audio-driven human animation.

Contribution

The paper introduces three innovations: a character image injection module, an emotion transfer module, and a face-aware audio adapter. Together they enable dynamic, emotion-aligned, multi-character video generation.

Findings

01 Surpasses state-of-the-art on benchmark datasets

02 Generates realistic, emotion-aligned multi-character videos

03 Effective in dynamic and immersive scenarios

Abstract

Recent years have witnessed significant progress in audio-driven human animation. However, critical challenges remain in (i) generating highly dynamic videos while preserving character consistency, (ii) achieving precise emotion alignment between characters and audio, and (iii) enabling multi-character audio-driven animation. To address these challenges, we propose HunyuanVideo-Avatar, a multimodal diffusion transformer (MM-DiT)-based model capable of simultaneously generating dynamic, emotion-controllable, and multi-character dialogue videos. Concretely, HunyuanVideo-Avatar introduces three key innovations: (i) A character image injection module is designed to replace the conventional addition-based character conditioning scheme, eliminating the inherent condition mismatch between training and inference. This ensures dynamic motion and strong character consistency; (ii) An Audio…

Code & Models

Repositories

tencent-hunyuan/hunyuanvideo-avatar
PyTorch · Official

Taxonomy

Topics: Human Motion and Animation

Methods: Diffusion · Adapter