TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration
Kehong Gong, Dongze Lian, Heng Chang, Chuan Guo, Zihang Jiang, Xinxin Zuo, Michael Bi Mi, Xinchao Wang

TL;DR
This paper introduces TM2D, a method for generating 3D dance movements guided by both music and text, using a novel VQ-VAE and cross-modal transformer to effectively combine modalities and produce realistic dance motions.
Contribution
The paper presents a new approach that integrates text and music for 3D dance generation. It addresses the lack of paired music-text motion data with a VQ-VAE and introduces new metrics for evaluation.
Findings
Generated dances are realistic and coherent with both music and text.
The approach maintains performance comparable to single-modality methods.
New metrics effectively evaluate motion quality and freezing issues.
Abstract
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities. Unlike existing works that generate dance movements using a single modality such as music, our goal is to produce richer dance movements guided by the instructive information provided by the text. However, the lack of paired motion data with both music and text modalities limits the ability to generate dance movements that integrate both. To alleviate this challenge, we propose to utilize a 3D human motion VQ-VAE to project the motions of the two datasets into a latent space consisting of quantized vectors, which effectively mix the motion tokens from the two datasets with different distributions for training. Additionally, we propose a cross-modal transformer to integrate text instructions into motion generation architecture for generating 3D dance movements without…
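The quantization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the codebook values, the latent vectors, and the names `CODEBOOK`, `squared_distance`, and `quantize` are all hypothetical, and a real VQ-VAE learns its encoder and codebook jointly. The sketch only shows the general idea that latents from two different datasets, once mapped to their nearest code vectors, end up as indices into one shared token vocabulary.

```python
# Hypothetical toy codebook: 4 code vectors of dimension 2.
# In a trained VQ-VAE these vectors are learned, not hand-written.
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents, codebook=CODEBOOK):
    """Map each latent vector to the index of its nearest code vector."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


# Made-up latents standing in for encoder outputs from two datasets
# (one paired with music, one paired with text).
music_dance_latents = [[0.9, 0.1], [0.1, 0.8]]
text_motion_latents = [[0.2, 0.1], [0.8, 0.9]]

# Both sets of latents become indices into the same codebook,
# so the resulting token sequences share one vocabulary.
music_tokens = quantize(music_dance_latents)
text_tokens = quantize(text_motion_latents)
```

With these toy values, `music_tokens` is `[1, 2]` and `text_tokens` is `[0, 3]`. Because both lists index the same codebook, a downstream sequence model could in principle be trained on tokens from either dataset.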
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Diversity and Impact of Dance
Methods: VQ-VAE
