MATHDance: Mamba-Transformer Architecture with Uniform Tokenization for High-Quality 3D Dance Generation

Kaixing Yang; Xulong Tang; Ziqiao Peng; Yuxuan Hu; Xiangyue Zhang; Puwei Wang; Hongyan Liu; Jun He; Zhaoxin Fan

arXiv:2505.14222 · cs.SD · April 2, 2026

TL;DR

MATHDance introduces a novel two-stage framework that combines kinematic-dynamic quantization of dance motion with a Mamba-Transformer generator to produce high-quality, choreographically consistent 3D dance motion from music.

Contribution

The paper proposes a hybrid Mamba-Transformer architecture and a kinematic-dynamic quantization method that together significantly improve the quality and choreographic consistency of music-to-dance generation.
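This page does not describe how the Mamba and Transformer layers are arranged. As a rough illustration of what such a hybrid combines, the NumPy sketch below places a simplified Mamba-style selective state-space scan (causal, linear in sequence length, with an input-dependent step size and input-dependent projections B and C) and a single-head self-attention layer inside one residual block. Everything here is an assumption, not the authors' configuration: the layer order, the omission of Mamba's convolution and gating branch, the sizes, and the hypothetical names `init_params`, `hybrid_block`, and `W_out`.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each time step's feature vector to zero mean, unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def selective_ssm(x, A, W_delta, W_B, W_C):
    """Simplified Mamba-style selective scan (diagonal state, no conv or gating).

    x: (T, D) sequence; A: (D, N) negative state matrix. The step size delta and
    the projections B, C are computed from the current input, which is what
    makes the scan 'selective'. The loop is causal and linear in T.
    """
    T = x.shape[0]
    h = np.zeros_like(A)                              # hidden state, (D, N)
    ys = np.empty_like(x)
    for t in range(T):
        delta = np.logaddexp(0.0, x[t] @ W_delta)     # softplus step size, (D,)
        B = x[t] @ W_B                                # input projection, (N,)
        C = x[t] @ W_C                                # output projection, (N,)
        A_bar = np.exp(delta[:, None] * A)            # discretized decay, (D, N)
        h = A_bar * h + delta[:, None] * B[None, :] * x[t][:, None]
        ys[t] = h @ C                                 # read out, (D,)
    return ys


def self_attention(x, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over the whole sequence."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def init_params(D: int, N: int, rng: np.random.Generator) -> dict:
    """Hypothetical random initialization for one hybrid block."""
    s = 1 / np.sqrt(D)
    return {
        "A": -np.exp(rng.normal(size=(D, N))),        # negative => decaying state
        "W_delta": rng.normal(scale=s, size=(D, D)),
        "W_B": rng.normal(scale=s, size=(D, N)),
        "W_C": rng.normal(scale=s, size=(D, N)),
        "W_q": rng.normal(scale=s, size=(D, D)),
        "W_k": rng.normal(scale=s, size=(D, D)),
        "W_v": rng.normal(scale=s, size=(D, D)),
    }


def hybrid_block(x: np.ndarray, p: dict) -> np.ndarray:
    """Pre-norm residual block: SSM (local, sequential) then attention (global)."""
    x = x + selective_ssm(layer_norm(x), p["A"], p["W_delta"], p["W_B"], p["W_C"])
    x = x + self_attention(layer_norm(x), p["W_q"], p["W_k"], p["W_v"])
    return x


# Usage: map a stand-in music feature sequence to one latent token per frame.
rng = np.random.default_rng(0)
T, D, N, K = 16, 8, 4, 64             # hypothetical sizes; K = token vocabulary
params = init_params(D, N, rng)
music = rng.normal(size=(T, D))       # stand-in for per-frame music features
hidden = hybrid_block(music, params)
W_out = rng.normal(scale=1 / np.sqrt(D), size=(D, K))  # hypothetical token head
tokens = (hidden @ W_out).argmax(axis=-1)              # one latent token per frame
```

Running the recurrent scan before attention is only one possible arrangement; interleaving or reversing the two is equally plausible, and the actual Mamba block adds a depthwise convolution, a gating branch, and a hardware-aware parallel scan that this loop omits.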

Findings

01

Achieves state-of-the-art performance on the FineDance dataset.

02

Encodes and reconstructs dance motions with high fidelity using the Kinematic-Dynamic-based Quantization Stage (KDQS).

03

Demonstrates improved choreographic consistency in generated dances.

Abstract

Music-to-dance generation represents a challenging yet pivotal task at the intersection of choreography, virtual reality, and creative content generation. Despite its significance, existing methods face substantial limitations in achieving choreographic consistency. To address this challenge, we propose MatchDance, a novel framework for music-to-dance generation that constructs a latent representation to enhance choreographic consistency. MatchDance employs a two-stage design: (1) a Kinematic-Dynamic-based Quantization Stage (KDQS), which encodes dance motions into a latent representation by Finite Scalar Quantization (FSQ) with kinematic-dynamic constraints and reconstructs them with high fidelity, and (2) a Hybrid Music-to-Dance Generation Stage (HMDGS), which uses a Mamba-Transformer hybrid architecture to map music into the latent representation, followed by the KDQS decoder to…
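Finite Scalar Quantization, the quantizer named for KDQS, replaces a learned codebook with fixed per-channel rounding. The sketch below follows that general recipe: squash each latent channel into a bounded range, round it to one of a small number of levels, and read the per-channel integers as a single mixed-radix token id. It is a simplified forward pass only, assuming odd level counts and omitting the straight-through gradient used in training; the level set `[7, 5, 5]`, the function name `fsq_quantize`, and the input sizes are hypothetical, and the kinematic-dynamic constraints are not modeled.

```python
import numpy as np


def fsq_quantize(z: np.ndarray, levels: list[int]) -> tuple[np.ndarray, np.ndarray]:
    """Simplified FSQ forward pass (odd level counts only, no gradients).

    z: (..., C) latent vectors with C == len(levels).
    Returns codes normalized to [-1, 1] and an integer token id per vector.
    """
    levels_arr = np.asarray(levels)
    assert np.all(levels_arr % 2 == 1), "this sketch only handles odd level counts"
    half = (levels_arr - 1) / 2               # e.g. 7 levels -> grid {-3, ..., 3}
    bounded = np.tanh(z) * half               # squash each channel into (-half, half)
    codes = np.round(bounded)                 # snap to the integer grid
    # Shift each channel to {0, ..., L_c - 1} and combine as a mixed-radix number.
    digits = (codes + half).astype(np.int64)
    basis = np.concatenate(([1], np.cumprod(levels_arr[:-1])))
    ids = (digits * basis).sum(axis=-1)
    return codes / half, ids


# Usage: quantize 4 hypothetical frame latents with 3 channels each.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))
codes, ids = fsq_quantize(z, [7, 5, 5])
```

Because the grid is fixed, the effective codebook size is simply the product of the levels (here 7 × 5 × 5 = 175 tokens), so there is no learned codebook that can collapse or need re-seeding.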
