Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling

Xiaojie Li; Ronghui Li; Shukai Fang; Shuzhao Xie; Xiaoyang Guo; Jiaqing Zhou; Junkun Peng; Zhi Wang

arXiv:2507.14915 · cs.MM · July 30, 2025


TL;DR

This paper introduces SoulDance, a new high-quality dataset, and SoulNet, a hierarchical model that generates synchronized, holistic 3D dance sequences aligned with music. Together they address two obstacles faced by earlier work: data scarcity and the complexity of motion modeling.

Contribution

The paper presents a novel dataset and a hierarchical generative framework that models detailed motion dependencies and ensures music-dance alignment for holistic 3D dance synthesis.

Findings

01. SoulNet outperforms existing methods in dance quality and music-dance alignment.

02. The hierarchical motion modeling captures complex, interdependent movements across the body, hands, and face.

03. Cross-modal retrieval enhances temporal synchronization and semantic coherence.

Abstract

Well-coordinated, music-aligned holistic dance enhances emotional expressiveness and audience engagement. However, generating such dances remains challenging due to the scarcity of holistic 3D dance datasets, the difficulty of achieving cross-modal alignment between music and dance, and the complexity of modeling interdependent motion across the body, hands, and face. To address these challenges, we introduce SoulDance, a high-precision music-dance paired dataset captured via professional motion capture systems, featuring meticulously annotated holistic dance movements. Building on this dataset, we propose SoulNet, a framework designed to generate music-aligned, kinematically coordinated holistic dance sequences. SoulNet consists of three principal components: (1) Hierarchical Residual Vector Quantization, which models complex, fine-grained motion dependencies across the body, hands,…
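The abstract names Hierarchical Residual Vector Quantization, but the excerpt is cut off before any details. As a rough illustration of plain residual vector quantization only (not SoulNet's hierarchical variant), the sketch below quantizes a toy motion feature in stages, with each stage encoding what the previous stages left over. The function names, codebook sizes, feature dimension, and random codebooks are all assumptions made for illustration; none of them come from the paper.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen code so the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected code from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy setup: 3 stages, 8 codes each, 4-dimensional features.
# Real codebooks would be learned from motion data; these are random.
rng = random.Random(0)
codebooks = [
    [[rng.uniform(-scale, scale) for _ in range(4)] for _ in range(8)]
    for scale in (1.0, 0.5, 0.25)
]

frame = [0.3, -0.7, 0.1, 0.9]  # stand-in for one frame of motion features
indices, residual = rvq_encode(frame, codebooks)
recon = rvq_decode(indices, codebooks)
```

By construction, `recon` plus `residual` recovers `frame` up to floating-point error, so the final residual is exactly the quantization error that remains after all stages.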
