TokenDance: Token-to-Token Music-to-Dance Generation with Bidirectional Mamba
Ziyue Yang, Kaixing Yang, Xulong Tang

TL;DR
TokenDance is a novel two-stage framework for music-to-dance generation that uses dual-modality tokenization and a bidirectional generator to improve realism, diversity, and efficiency in dance synthesis.
Contribution
It introduces a dual-modality tokenization approach and a Bidirectional Mamba-based generator for coherent, high-quality, and fast music-to-dance synthesis, addressing the limited coverage of existing 3D dance datasets.
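This page does not detail the Bidirectional Mamba generator. The Python sketch below is a rough, non-authoritative illustration of what "bidirectional" adds to a state-space model. It runs a toy single-channel selective scan from left to right and from right to left, then sums the two outputs. As in Mamba, the step size, input gate, and readout all depend on the input. The class `ToySelectiveSSM`, its parameters, and the summation rule are hypothetical and are not TokenDance's architecture. Real Mamba blocks also use multi-dimensional states, convolutions, and gating, which are omitted here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ToySelectiveSSM:
    """Hypothetical one-channel selective state-space layer in the spirit of Mamba.

    Only the input-dependent (selective) recurrence is kept; this is an
    illustration, not the TokenDance generator.
    """

    a: float = -1.0       # decay rate; negative so the discretized decay lies in (0, 1)
    w_delta: float = 1.0  # hypothetical projection for the step size delta_t
    w_b: float = 0.5      # hypothetical projection for the input gate B_t
    bias_b: float = 1.0
    w_c: float = 0.5      # hypothetical projection for the readout C_t
    bias_c: float = 1.0

    def scan(self, x: List[float]) -> List[float]:
        """Run the recurrence left to right; output t depends only on x[0..t]."""
        h = 0.0
        outputs = []
        for x_t in x:
            delta = math.log1p(math.exp(self.w_delta * x_t))  # softplus keeps delta_t > 0
            decay = math.exp(delta * self.a)                   # A_bar = exp(delta_t * A)
            b_t = self.w_b * x_t + self.bias_b
            c_t = self.w_c * x_t + self.bias_c
            h = decay * h + delta * b_t * x_t                  # B_bar approximated by delta_t * B_t
            outputs.append(c_t * h)
        return outputs


def bidirectional_scan(x: List[float], fwd: ToySelectiveSSM, bwd: ToySelectiveSSM) -> List[float]:
    """Sum a left-to-right scan and a right-to-left scan, position by position."""
    forward_out = fwd.scan(x)
    backward_out = bwd.scan(x[::-1])[::-1]  # reverse, scan, then restore the original order
    return [f + b for f, b in zip(forward_out, backward_out)]


if __name__ == "__main__":
    fwd, bwd = ToySelectiveSSM(), ToySelectiveSSM(w_b=-0.3, w_c=0.8)
    x = [0.5, -0.2, 1.0, 0.3]
    x_changed = x[:-1] + [-1.0]  # only the last frame differs
    # A causal scan cannot see the change at t = 0; the bidirectional one can.
    print(fwd.scan(x)[0] == fwd.scan(x_changed)[0])  # True
    print(bidirectional_scan(x, fwd, bwd)[0] == bidirectional_scan(x_changed, fwd, bwd)[0])  # False
```

Summation is only one way to merge the two directions. Concatenating them and applying a projection is another. This page does not say which approach TokenDance uses.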
Findings
Achieves state-of-the-art performance in dance quality and speed.
Effectively captures choreography-specific structures in music and dance.
Demonstrates strong generalization to diverse music styles.
Abstract
Music-to-dance generation has broad applications in virtual reality, dance education, and digital character animation. However, the limited coverage of existing 3D dance datasets confines current models to a narrow subset of music styles and choreographic patterns, resulting in poor generalization to real-world music. Consequently, generated dances often become overly simplistic and repetitive, substantially degrading expressiveness and realism. To tackle this problem, we present TokenDance, a two-stage music-to-dance generation framework that explicitly addresses this limitation through dual-modality tokenization and efficient token-level generation. In the first stage, we discretize both dance and music using Finite Scalar Quantization, where dance motions are factorized into upper- and lower-body components with kinematic-dynamic constraints, and music is decomposed into semantic…
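Finite Scalar Quantization (FSQ) replaces a learned codebook with a fixed grid. Each latent dimension is squashed into a bounded range and rounded to one of a small number of integer levels, and the resulting tuple of levels serves as the token. The minimal Python sketch below assumes odd level counts and uses the hypothetical per-dimension levels `[7, 5, 5, 5]`, since TokenDance's actual settings are not given here. The helper names are also illustrative. During training, FSQ passes gradients through the rounding with a straight-through estimator. This inference-only sketch omits that step.

```python
import math
from typing import List, Sequence


def fsq_quantize(z: Sequence[float], levels: Sequence[int]) -> List[int]:
    """Map each latent value to an integer in [-(L-1)/2, (L-1)/2] (odd L only)."""
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, num_levels in zip(z, levels):
        if num_levels % 2 == 0:
            raise ValueError("this sketch only supports odd level counts")
        half = (num_levels - 1) // 2
        # tanh bounds the value to (-half, half); rounding snaps it onto the integer grid.
        codes.append(round(half * math.tanh(value)))
    return codes


def codes_to_index(codes: Sequence[int], levels: Sequence[int]) -> int:
    """Flatten per-dimension codes into one token id (mixed-radix, first dim most significant)."""
    index = 0
    for code, num_levels in zip(codes, levels):
        half = (num_levels - 1) // 2
        index = index * num_levels + (code + half)  # shift code into 0..L-1
    return index


def index_to_codes(index: int, levels: Sequence[int]) -> List[int]:
    """Invert codes_to_index, recovering the signed per-dimension codes."""
    codes = []
    for num_levels in reversed(levels):
        half = (num_levels - 1) // 2
        index, digit = divmod(index, num_levels)
        codes.append(digit - half)
    return codes[::-1]


if __name__ == "__main__":
    levels = [7, 5, 5, 5]  # hypothetical level counts
    codes = fsq_quantize([0.3, -2.0, 0.0, 5.0], levels)
    token = codes_to_index(codes, levels)
    print(codes, token, math.prod(levels))  # [1, -2, 0, 2] 514 875
```

Because the grid is fixed, the codebook size is simply the product of the level counts (875 here). No learned codebook can collapse onto a few entries, which is an issue that learned VQ codebooks can suffer from. Following the abstract's description, the upper- and lower-body latents would each be quantized this way to give separate token streams. How TokenDance arranges those streams is not stated on this page.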
