Reframing Music-Driven 2D Dance Pose Generation as Multi-Channel Image Generation
Yan Zhang, Han Zou, Lincong Feng, Cong Xie, Ruiqi Yu, Zhenpeng Zhan

TL;DR
This paper generates 2D dance poses from music by framing the task as a multi-channel image synthesis problem, which lets it reuse advances in image generation models to improve temporal coherence and subject consistency.
Contribution
The paper reformulates music-to-dance pose generation as a multi-channel image synthesis task, and adds a time-shared temporal indexing scheme that synchronizes music tokens and pose latents over time, together with a reference-pose conditioning strategy.
Findings
Outperforms existing methods in pose and video metrics
Achieves higher human preference scores
Demonstrates effective long-horizon segment stitching
Abstract
Recent pose-to-video models can translate 2D pose sequences into photorealistic, identity-preserving dance videos, so the key challenge is to generate temporally coherent, rhythm-aligned 2D poses from music, especially under complex, high-variance in-the-wild distributions. We address this by reframing music-to-dance generation as a music-token-conditioned multi-channel image synthesis problem: 2D pose sequences are encoded as one-hot images, compressed by a pretrained image VAE, and modeled with a DiT-style backbone, allowing us to inherit architectural and training advances from modern text-to-image models and better capture high-variance 2D pose distributions. On top of this formulation, we introduce (i) a time-shared temporal indexing scheme that explicitly synchronizes music tokens and pose latents over time and (ii) a reference-pose conditioning strategy that preserves…
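The abstract says 2D pose sequences are encoded as one-hot images but does not give the layout here. The sketch below shows one possible reading, purely as an assumption: each joint coordinate is quantized into bins, and each (joint, axis) pair becomes one image channel whose rows are time steps and whose columns are bins. The function names, the `num_bins` parameter, and the channel layout are all hypothetical and are not taken from the paper.

```python
import numpy as np


def poses_to_onehot_image(poses, num_bins=64):
    """Encode normalized 2D poses as a multi-channel one-hot image.

    ASSUMED layout (not from the paper): output shape is
    (2 * J, T, num_bins), one channel per (joint, axis) pair.
    poses: array of shape (T, J, 2) with coordinates in [0, 1].
    """
    T, J, _ = poses.shape
    # Quantize each coordinate to a bin index in [0, num_bins - 1].
    idx = np.clip((poses * num_bins).astype(int), 0, num_bins - 1)
    flat = idx.reshape(T, 2 * J)  # (T, C) bin index per frame and channel
    image = np.zeros((2 * J, T, num_bins), dtype=np.float32)
    c = np.arange(2 * J)[None, :]  # (1, C) channel indices
    t = np.arange(T)[:, None]      # (T, 1) time indices
    # Set exactly one active bin per (channel, time) row.
    image[c, t, flat] = 1.0
    return image


def onehot_image_to_poses(image):
    """Decode the image back to poses using bin centers."""
    C, T, B = image.shape
    idx = image.argmax(axis=-1)   # (C, T) active bin per row
    coords = (idx.T + 0.5) / B    # (T, C) bin centers in [0, 1]
    return coords.reshape(T, C // 2, 2)
```

Under this assumed layout, decoding recovers each coordinate to within half a bin width, so `num_bins` trades spatial precision against image width. In the pipeline the abstract describes, an image of this kind would then be compressed by the pretrained image VAE before the DiT-style backbone models it.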
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · 3D Shape Modeling and Analysis
