OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation

Qijun Gan; Ruizi Yang; Jianke Zhu; Shaofei Xue; Steven Hoi

arXiv:2506.18866·cs.CV·June 24, 2025

TL;DR

OmniAvatar is an audio-driven full-body video generation model that combines a pixel-wise multi-hierarchical audio embedding with prompt control to produce natural, lip-synced human animation across diverse scenes.

Contribution

The paper introduces a pixel-wise multi-hierarchical audio embedding strategy that captures audio features in the latent space, and a LoRA-based training approach that incorporates those features while preserving the foundation model's prompt-driven control.
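To make the LoRA idea concrete, here is a minimal NumPy sketch of a generic low-rank adapter on a single linear layer. It is not taken from the paper: the class name `LoRALinear`, the `rank` and `alpha` values, and the initialization are illustrative assumptions, and the sketch says nothing about which layers OmniAvatar adapts or how audio features enter the model. It only shows why a low-rank update can leave the pretrained behavior untouched at the start of training.

```python
import numpy as np


class LoRALinear:
    """Generic LoRA sketch: frozen weight W plus a low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration only; not the OmniAvatar implementation.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # pretrained weight, kept frozen
        # A gets a small random init and B starts at zero, so the adapter
        # contributes nothing until B is trained.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # After training, the update can be folded into a single matrix.
        return self.weight + self.scale * self.B @ self.A
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, which is the usual argument for why LoRA-style training tends to preserve what the base model already does. Only `A` and `B` would receive gradients in an actual training loop.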

Findings

01. Outperforms existing models in facial and semi-body video generation

02. Achieves high lip-sync accuracy and natural movements

03. Enables versatile domain-specific video creation

Abstract

Significant progress has been made in audio-driven human animation, while most existing methods focus mainly on facial movements, limiting their ability to create full-body animations with natural synchronization and fluidity. They also struggle with precise prompt control for fine-grained generation. To tackle these challenges, we introduce OmniAvatar, an innovative audio-driven full-body video generation model that enhances human animation with improved lip-sync accuracy and natural movements. OmniAvatar introduces a pixel-wise multi-hierarchical audio embedding strategy to better capture audio features in the latent space, enhancing lip-syncing across diverse scenes. To preserve the capability for prompt-driven control of foundation models while effectively incorporating audio features, we employ a LoRA-based training approach. Extensive experiments show that OmniAvatar surpasses…

