Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model
Bosheng Qin, Wentao Ye, Qifan Yu, Siliang Tang, Yueting Zhuang

TL;DR
Dancing Avatar introduces a method for generating high-quality, pose- and text-guided human motion videos using a pretrained text-to-image (T2I) diffusion model, with modules that keep the character consistent, the background continuous, and the frames temporally coherent.
Contribution
The paper presents a new framework that leverages a pretrained T2I diffusion model with alignment modules for consistent, high-quality human motion video synthesis guided by text and poses.
Findings
Superior video quality compared to state-of-the-art methods
Enhanced temporal coherence and background fidelity
Effective preservation of human character and clothing across poses
Abstract
The rising demand for creating lifelike avatars in the digital realm has led to an increased need for generating high-quality human videos guided by textual descriptions and poses. We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues. Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion. The crux of innovation lies in our adept utilization of the T2I diffusion model for producing video frames successively while preserving contextual relevance. We surmount the hurdles posed by maintaining human character and clothing consistency across varying poses, along with upholding the background's continuity amidst diverse human movements. To ensure consistent human appearances across the entire video, we devise an intra-frame alignment module. This module assimilates text-guided synthesized…
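The autoregressive generation mentioned in the abstract can be illustrated with a minimal sketch. It shows only the general control flow (one frame per pose, each step seeing the previous frame) and is not the authors' implementation. The names `generate_video`, `generate_frame`, and `toy_generate_frame` are hypothetical, the frame representation is a toy list of floats rather than an image, and the alignment modules described in the paper are not modeled.

```python
from typing import Callable, List, Optional, Sequence

# Toy stand-in for an image; a real frame would be a pixel array.
Frame = List[float]


def generate_video(
    poses: Sequence[str],
    prompt: str,
    generate_frame: Callable[[str, str, Optional[Frame]], Frame],
) -> List[Frame]:
    """Produce one frame per pose, passing each frame on to the next step."""
    frames: List[Frame] = []
    previous: Optional[Frame] = None  # the first frame has no predecessor
    for pose in poses:
        # Each step is conditioned on the shared text prompt, the current
        # pose, and the previously generated frame (the autoregressive part).
        frame = generate_frame(prompt, pose, previous)
        frames.append(frame)
        previous = frame
    return frames


def toy_generate_frame(
    prompt: str, pose: str, previous: Optional[Frame]
) -> Frame:
    """Hypothetical placeholder for a pose- and text-conditioned T2I call."""
    base = float(len(prompt) + len(pose))
    if previous is None:
        return [base]
    # Blend with the previous frame to mimic conditioning on earlier output.
    return [0.5 * base + 0.5 * previous[0]]


frames = generate_video(["a", "bb"], "hi", toy_generate_frame)
```

In a real pipeline, `generate_frame` would wrap a pose-conditioned diffusion sampler; passing it in as a callable keeps the loop independent of any particular model.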
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Cinema and Media Studies
Methods: Inpainting · Diffusion
