Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons

Zhe Xu; Kun Wei; Xu Yang; Cheng Deng

arXiv:2401.13363 · cs.CV · January 25, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a new task, dataset, and evaluation protocol for compositional human dance generation involving multiple persons and backgrounds, revealing current models' limitations and proposing a zero-shot framework to improve realism and consistency.

Contribution

The paper presents MultiDance-Zero, a novel zero-shot framework for multi-person dance video synthesis, including pose-aware inversion, compositional augmentation, and consistency-guided sampling.

Findings

01. Existing methods fail to generalize to real-world multi-person scenarios.

02. MultiDance-Zero significantly improves realism and temporal consistency.

03. The approach outperforms state-of-the-art methods on the new cHDG benchmark.

Abstract

Human dance generation (HDG) aims to synthesize realistic videos from images and sequences of driving poses. Despite great success, existing methods are limited to generating videos of a single person with specific backgrounds, while the generalizability for real-world scenarios with multiple persons and complex backgrounds remains unclear. To systematically measure the generalizability of HDG models, we introduce a new task, dataset, and evaluation protocol of compositional human dance generation (cHDG). Evaluating the state-of-the-art methods on cHDG, we empirically find that they fail to generalize to real-world scenarios. To tackle the issue, we propose a novel zero-shot framework, dubbed MultiDance-Zero, that can synthesize videos consistent with arbitrary multiple persons and background while precisely following the driving poses. Specifically, in contrast to straightforward DDIM…
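The abstract contrasts the proposed approach with straightforward DDIM inversion. As background only, the following is a minimal sketch of plain deterministic DDIM inversion on a toy latent vector; it is not the paper's pose-aware method. The function names, the noise schedule values, and the constant toy noise predictor are all hypothetical, chosen purely for illustration.

```python
import math

def ddim_step(x, eps, alpha_from, alpha_to):
    """One deterministic DDIM update between two cumulative-alpha noise levels."""
    out = []
    for xi, ei in zip(x, eps):
        # Estimate the clean sample implied by the current latent and predicted noise.
        x0 = (xi - math.sqrt(1.0 - alpha_from) * ei) / math.sqrt(alpha_from)
        # Re-noise that estimate to the target noise level using the same noise.
        out.append(math.sqrt(alpha_to) * x0 + math.sqrt(1.0 - alpha_to) * ei)
    return out

def ddim_invert(x0, eps_model, alphas):
    """Map a clean latent to a noisy one. `alphas` runs from clean (near 1) to noisy (near 0)."""
    x = list(x0)
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_step(x, eps_model(x, a_from), a_from, a_to)
    return x

def ddim_sample(x_noisy, eps_model, alphas):
    """Run the same update in the opposite direction, from noisy back to clean."""
    x = list(x_noisy)
    rev = alphas[::-1]
    for a_from, a_to in zip(rev[:-1], rev[1:]):
        x = ddim_step(x, eps_model(x, a_from), a_from, a_to)
    return x

# Hypothetical stand-in for a trained noise predictor: it ignores its inputs.
def toy_eps_model(x, alpha):
    return [0.3, -0.2, 0.1]

alphas = [0.999, 0.9, 0.5, 0.1]   # assumed toy schedule, not from the paper
latent = [0.5, -1.0, 2.0]
noisy = ddim_invert(latent, toy_eps_model, alphas)
recon = ddim_sample(noisy, toy_eps_model, alphas)
```

With a constant noise predictor the round trip is exact up to floating-point error. With a real network, inversion evaluates the predictor at the less noisy latent while sampling evaluates it at the noisier one, so reconstruction is only approximate.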

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis

Methods: Sparse Evolutionary Training