DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

Qiaosong Qi; Le Zhuo; Aixi Zhang; Yue Liao; Fei Fang; Si Liu; Shuicheng Yan

arXiv:2308.02915 · cs.GR · August 8, 2023

TL;DR

DiffDance is a cascaded diffusion model that generates realistic, long-form dance sequences aligned with music. It avoids the compounding errors of autoregressive methods by using a two-stage diffusion approach (music-to-dance generation followed by sequence super-resolution) trained with contrastive and geometric losses.

Contribution

The paper introduces a cascaded diffusion framework for dance generation, combining music-to-dance and super-resolution models with contrastive and geometric losses for improved realism and alignment.

Findings

01. Produces high-quality, long-form dance sequences

02. Achieves results comparable to state-of-the-art autoregressive methods

03. Demonstrates effective music-motion alignment on the AIST++ dataset

Abstract

When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further align its embedding space to motion via contrastive loss. During training our cascaded diffusion…
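The abstract mentions aligning the music embedding space to motion via a contrastive loss but does not give its form. As a rough illustration only, the sketch below uses a symmetric InfoNCE-style loss over a batch of paired embeddings, which is one common way to write such an objective. The function names, the temperature value, and the embedding shapes are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(music_emb, motion_emb, temperature=0.07):
    """Hypothetical contrastive loss: row i of each array is a matched pair.

    Assumption: the paper's exact loss may differ; this is a generic
    symmetric InfoNCE formulation, not the authors' implementation.
    """
    music = l2_normalize(music_emb)
    motion = l2_normalize(motion_emb)
    # Cosine similarity between every music clip and every motion clip.
    logits = music @ motion.T / temperature
    idx = np.arange(logits.shape[0])
    # Music-to-motion: each row should pick out its own column.
    loss_music_to_motion = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Motion-to-music: each column should pick out its own row.
    loss_motion_to_music = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_music_to_motion + loss_motion_to_music)

# Toy data: 8 pairs of 16-dimensional embeddings (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
motion = rng.normal(size=(8, 16))
aligned_music = motion + 0.01 * rng.normal(size=(8, 16))  # nearly matched pairs
random_music = rng.normal(size=(8, 16))                   # unrelated pairs

loss_aligned = symmetric_info_nce(aligned_music, motion)
loss_random = symmetric_info_nce(random_music, motion)
```

With these toy inputs, the loss for the nearly matched pairs comes out lower than the loss for unrelated pairs, which is the behavior a contrastive alignment objective is meant to reward.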

Taxonomy

Methods: Diffusion · ALIGN