Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons
Zhe Xu, Kun Wei, Xu Yang, Cheng Deng

TL;DR
This paper introduces a new task, dataset, and evaluation protocol for compositional human dance generation involving multiple persons and backgrounds, revealing current models' limitations and proposing a zero-shot framework to improve realism and consistency.
Contribution
The paper presents MultiDance-Zero, a novel zero-shot framework for multi-person dance video synthesis, including pose-aware inversion, compositional augmentation, and consistency-guided sampling.
Findings
Existing methods fail to generalize to real-world multi-person scenarios.
MultiDance-Zero significantly improves realism and temporal consistency.
The approach outperforms state-of-the-art methods on the new cHDG benchmark.
Abstract
Human dance generation (HDG) aims to synthesize realistic videos from images and sequences of driving poses. Despite great success, existing methods are limited to generating videos of a single person with specific backgrounds, while the generalizability for real-world scenarios with multiple persons and complex backgrounds remains unclear. To systematically measure the generalizability of HDG models, we introduce a new task, dataset, and evaluation protocol of compositional human dance generation (cHDG). Evaluating the state-of-the-art methods on cHDG, we empirically find that they fail to generalize to real-world scenarios. To tackle the issue, we propose a novel zero-shot framework, dubbed MultiDance-Zero, that can synthesize videos consistent with arbitrary multiple persons and background while precisely following the driving poses. Specifically, in contrast to straightforward DDIM…
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
Methods: Sparse Evolutionary Training
