TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration

Kehong Gong; Dongze Lian; Heng Chang; Chuan Guo; Zihang Jiang; Xinxin Zuo; Michael Bi Mi; Xinchao Wang

arXiv:2304.02419 · cs.CV · October 3, 2023 · 1 cites

Open Access · 1 Repo

TL;DR

This paper introduces TM2D, a method for generating 3D dance movements guided by both music and text, using a novel VQ-VAE and cross-modal transformer to effectively combine modalities and produce realistic dance motions.

Contribution

The paper presents a new approach that integrates text and music for 3D dance generation. It works around the lack of motion data paired with both music and text by using a VQ-VAE, and it introduces new metrics for evaluation.

Findings

01. Generated dances are realistic and coherent with both music and text.

02. The approach maintains performance comparable to single-modality methods.

03. New metrics effectively evaluate motion quality and freezing issues.

Abstract

We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities. Unlike existing works that generate dance movements using a single modality such as music, our goal is to produce richer dance movements guided by the instructive information provided by the text. However, the lack of paired motion data with both music and text modalities limits the ability to generate dance movements that integrate both. To alleviate this challenge, we propose to utilize a 3D human motion VQ-VAE to project the motions of the two datasets into a latent space consisting of quantized vectors, which effectively mix the motion tokens from the two datasets with different distributions for training. Additionally, we propose a cross-modal transformer to integrate text instructions into motion generation architecture for generating 3D dance movements without…
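
The quantization idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the 2-D latent vectors, the four-entry codebook, and every name in it (`quantize`, `CODEBOOK`, `training_sequences`) are assumptions made for illustration. It only shows how latents from two datasets can be mapped to indices of one shared codebook so that the resulting token sequences can be pooled for training.

```python
# Illustrative sketch only: a toy nearest-neighbour vector quantizer.
# This is NOT the TM2D code. The codebook, latent values, and all names
# below are hypothetical and chosen for readability.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]

# Hypothetical shared codebook: 4 entries in a 2-D latent space.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical encoder outputs for one clip from each dataset.
music_dance_latents = [(0.1, 0.0), (0.9, 0.2), (0.8, 0.9)]
text_motion_latents = [(0.2, 0.8), (0.1, 0.1)]

music_dance_tokens = quantize(music_dance_latents, CODEBOOK)
text_motion_tokens = quantize(text_motion_latents, CODEBOOK)

# Both clips are now sequences over the same discrete vocabulary,
# so they can be placed in a single training pool.
training_sequences = [
    {"condition": "music", "tokens": music_dance_tokens},
    {"condition": "text", "tokens": text_motion_tokens},
]

print(training_sequences)
```

In a real VQ-VAE the codebook is learned jointly with an encoder and decoder; here it is fixed by hand so the lookup step can be read in isolation.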

Code & Models

Repositories

Garfield-kh/TM2D
PyTorch · Official

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Diversity and Impact of Dance

Methods: VQ-VAE