Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model

Bosheng Qin; Wentao Ye; Qifan Yu; Siliang Tang; Yueting Zhuang

arXiv:2308.07749 · cs.CV · August 16, 2023 · 5 cites


TL;DR

Dancing Avatar introduces a novel method for generating high-quality, pose and text-guided human motion videos using a pretrained T2I diffusion model, with modules ensuring character consistency, background continuity, and temporal coherence.

Contribution

The paper presents a new framework that leverages a pretrained T2I diffusion model with alignment modules for consistent, high-quality human motion video synthesis guided by text and poses.

Findings

01 Superior video quality compared to state-of-the-art methods

02 Enhanced temporal coherence and background fidelity

03 Effective preservation of human character and clothing across poses

Abstract

The rising demand for creating lifelike avatars in the digital realm has led to an increased need for generating high-quality human videos guided by textual descriptions and poses. We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues. Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion. The crux of innovation lies in our adept utilization of the T2I diffusion model for producing video frames successively while preserving contextual relevance. We surmount the hurdles posed by maintaining human character and clothing consistency across varying poses, along with upholding the background's continuity amidst diverse human movements. To ensure consistent human appearances across the entire video, we devise an intra-frame alignment module. This module assimilates text-guided synthesized…
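The autoregressive generation described in the abstract can be illustrated with a minimal sketch. It shows only the control flow: one frame per pose, with each generated frame passed along as context for the next. The names `generate_video`, `generate_frame`, and `toy_generate_frame` are hypothetical and do not come from the paper; the toy function stands in for a call to a pretrained T2I diffusion model, and the paper's alignment modules are not modeled here.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Pose = TypeVar("Pose")
Frame = TypeVar("Frame")


def generate_video(
    poses: Sequence[Pose],
    prompt: str,
    generate_frame: Callable[[Pose, str, Optional[Frame]], Frame],
) -> List[Frame]:
    """Generate one frame per pose, feeding each frame into the next step.

    `generate_frame` is a hypothetical callable (an assumption for this
    sketch) that takes the current pose, the text prompt, and the previously
    generated frame, and returns a new frame.
    """
    frames: List[Frame] = []
    previous: Optional[Frame] = None  # the first frame has no predecessor
    for pose in poses:
        frame = generate_frame(pose, prompt, previous)
        frames.append(frame)
        previous = frame  # autoregressive step: the output becomes context
    return frames


def toy_generate_frame(pose, prompt, previous):
    """Stand-in for a diffusion call; records how many frames preceded it."""
    history = 0 if previous is None else previous["history"] + 1
    return {"pose": pose, "prompt": prompt, "history": history}


if __name__ == "__main__":
    video = generate_video(["pose_0", "pose_1", "pose_2"], "a dancer", toy_generate_frame)
    for frame in video:
        print(frame)
```

Passing the generator in as a callable keeps the loop independent of any particular diffusion library, so the stub can be swapped for a real pose-conditioned model without changing the loop.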


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Cinema and Media Studies

Methods: Inpainting · Diffusion