Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation
Guangtao Lyu, Chenghao Xu, Qi Liu, Jiexi Yan, Muli Yang, Fen Fang, Cheng Deng

TL;DR
This paper introduces TempoMoE, a hierarchical mixture-of-experts module that exploits tempo, a property that stays relatively consistent across datasets and genres, to improve rhythm-aligned 3D dance generation from music. It outperforms existing methods without relying on noisy genre labels.
Contribution
The paper proposes TempoMoE, a novel tempo-aware hierarchical mixture-of-experts module that enhances dance generation by capturing rhythmic dynamics without manual genre labels.
Findings
Achieves state-of-the-art dance quality and rhythm alignment.
Effectively models rhythmic dynamics across diverse music genres.
Operates without relying on coarse or noisy genre labels.
Abstract
Music to 3D dance generation aims to synthesize realistic and rhythmically synchronized human dance from music. While existing methods often rely on additional genre labels to further improve dance generation, such labels are typically noisy, coarse, unavailable, or insufficient to capture the diversity of real-world music, which can result in rhythm misalignment or stylistic drift. In contrast, we observe that tempo, a core property reflecting musical rhythm and pace, remains relatively consistent across datasets and genres, typically ranging from 60 to 200 BPM. Based on this finding, we propose TempoMoE, a hierarchical tempo-aware Mixture-of-Experts module that enhances the diffusion model and its rhythm perception. TempoMoE organizes motion experts into tempo-structured groups for different tempo ranges, with multi-scale beat experts capturing fine- and long-range rhythmic dynamics.…
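The two-level routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bin edges in `TEMPO_EDGES`, the per-group window sizes in `GROUP_WINDOWS`, the moving-average "experts", and the externally supplied `gate_scores` are all hypothetical choices made only to show the idea of selecting a tempo group first and then mixing multi-scale experts within it. In a real model the experts and the gate would be learned networks.

```python
import math

# Hypothetical tempo bins over the 60-200 BPM range named in the abstract.
# The edges and the number of groups are assumptions, not the paper's values.
TEMPO_EDGES = [60, 95, 130, 165, 200]

# Hypothetical temporal windows (in frames) for each group's multi-scale experts.
GROUP_WINDOWS = {0: (1, 9, 17), 1: (1, 7, 13), 2: (1, 5, 9), 3: (1, 3, 5)}


def tempo_group(bpm):
    """Return the index of the tempo group containing bpm, clamped to the range."""
    bpm = min(max(bpm, TEMPO_EDGES[0]), TEMPO_EDGES[-1])
    for i in range(len(TEMPO_EDGES) - 1):
        if bpm < TEMPO_EDGES[i + 1]:
            return i
    return len(TEMPO_EDGES) - 2  # bpm equals the upper edge


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moving_average(x, window):
    """Toy stand-in for a beat expert: smooth a 1-D sequence over `window` frames."""
    half = window // 2
    out = []
    for t in range(len(x)):
        lo, hi = max(0, t - half), min(len(x), t + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def tempo_moe(x, bpm, gate_scores):
    """Pick a tempo group from bpm, then mix that group's experts with softmax gates."""
    group = tempo_group(bpm)
    weights = softmax(gate_scores)
    outputs = [moving_average(x, w) for w in GROUP_WINDOWS[group]]
    mixed = [sum(w * o[t] for w, o in zip(weights, outputs)) for t in range(len(x))]
    return group, weights, mixed


if __name__ == "__main__":
    features = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # toy per-frame feature
    group, weights, mixed = tempo_moe(features, bpm=120, gate_scores=[2.0, 0.5, 0.1])
    print(group, [round(w, 3) for w in weights])
```

The hard assignment to a single tempo group is the simplest possible choice for this sketch; a soft assignment across neighbouring groups would be an equally plausible reading of the abstract.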
Taxonomy
Topics: Human Motion and Animation · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
