A Unit Enhancement and Guidance Framework for Audio-Driven Avatar Video Generation
S.Z. Zhou, Y.B. Wang, J.F. Wu, T. Hu, J.N. Zhang

TL;DR
This paper introduces PAHA, a novel framework for audio-driven avatar video generation that enhances regional guidance and consistency, significantly improving quality and efficiency over existing multi-stage methods.
Contribution
The paper proposes a unit enhancement and guidance framework with two key methods, Parts-Aware Re-weighting (PAR) and Parts Consistency Enhancement (PCE), to improve visual quality and audio-motion consistency in audio-driven avatar videos.
Findings
PAHA outperforms existing methods in audio-motion alignment.
The proposed classifiers improve regional consistency.
Experimental results validate the effectiveness of the framework.
Abstract
Audio-driven human animation technology is widely used in human-computer interaction, and the emergence of diffusion models has further advanced its development. Currently, most methods rely on multi-stage generation and intermediate representations, which leads to long inference times and to problems with generation quality in specific foreground regions and with audio-motion consistency. These shortcomings are primarily due to the lack of localized, fine-grained supervised guidance. To address the above challenges, we propose Parts-aware Audio-driven Human Animation (PAHA), a unit enhancement and guidance framework for audio-driven upper-body animation. We introduce two key methods: Parts-Aware Re-weighting (PAR) and Parts Consistency Enhancement (PCE). PAR dynamically adjusts regional training loss weights based on pose confidence scores, effectively improving visual quality. PCE constructs and trains…
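The abstract describes PAR only at a high level, so the following is a minimal sketch of confidence-based regional loss re-weighting, not the authors' formulation. Everything in it is an assumption made for illustration: the function names `parts_aware_weights` and `reweighted_mse`, the `base_weight` and `boost` parameters, the linear weighting rule, and the choice that higher pose confidence raises a region's weight (the paper may well use a different rule or direction).

```python
import numpy as np

def parts_aware_weights(region_masks, confidences, base_weight=1.0, boost=1.0):
    """Build a per-pixel loss weight map from region masks and pose confidences.

    region_masks: dict mapping a region name to a boolean (H, W) mask.
    confidences:  dict mapping the same names to a confidence in [0, 1].
    Assumption: weight = base_weight + boost * confidence inside each region.
    """
    shape = next(iter(region_masks.values())).shape
    weights = np.full(shape, base_weight, dtype=np.float64)
    for name, mask in region_masks.items():
        conf = float(np.clip(confidences[name], 0.0, 1.0))
        region_weight = base_weight + boost * conf
        # Where regions overlap, keep the larger weight.
        weights[mask] = np.maximum(weights[mask], region_weight)
    return weights

def reweighted_mse(pred, target, weights):
    """Weighted mean squared error, normalized by the sum of the weights."""
    sq_err = (pred - target) ** 2
    return float((weights * sq_err).sum() / weights.sum())

# Toy example: a 4x4 frame with a "face" and a "hands" region (hypothetical).
masks = {
    "face": np.zeros((4, 4), dtype=bool),
    "hands": np.zeros((4, 4), dtype=bool),
}
masks["face"][:2, :2] = True
masks["hands"][2:, 2:] = True
weights = parts_aware_weights(masks, {"face": 0.9, "hands": 0.2})

pred = np.zeros((4, 4))
target = np.zeros((4, 4))
target[:2, :2] = 1.0  # error only inside the face region
loss = reweighted_mse(pred, target, weights)
```

In this toy setup an error confined to the high-confidence face region contributes more to the loss than it would under a uniform mean, which is the general effect regional re-weighting aims for.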
Taxonomy
Topics: Human Motion and Animation · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
