Music-to-Dance Generation with Optimal Transport
Shuang Wu, Shijian Lu, Li Cheng

TL;DR
This paper introduces MDOT-Net, a novel neural network that generates realistic, diverse 3D dance choreographies from music by using optimal transport distances to improve training stability and music-dance correspondence.
Contribution
It proposes a new training framework employing optimal transport and Gromov-Wasserstein distances for music-to-dance generation, enhancing realism, diversity, and music alignment.
Findings
Generated dances are realistic and diverse.
Dance sequences match musical rhythm and style.
The method outperforms previous approaches in coherence and quality.
Abstract
Choreographing a dance for a piece of music is a challenging task: the choreography must be creative in presenting distinctive stylistic dance elements while taking into account the musical theme and rhythm. The task has been tackled by different approaches such as similarity retrieval, sequence-to-sequence modeling and generative adversarial networks, but the generated dance sequences often lack motion realism, diversity and music consistency. In this paper, we propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographies from music. We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music. This gives a well-defined and non-divergent training objective that mitigates the…
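The two distances named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `sinkhorn_cost` and `gw_objective`, the squared Euclidean ground cost, the entropic (Sinkhorn) approximation, and the parameters `eps` and `n_iters` are all assumptions chosen for illustration. The first function approximates an optimal transport cost between two sets of feature vectors in the same space (for example, generated and real dance features). The second evaluates the Gromov-Wasserstein objective for a given coupling between two sets that may live in different spaces (for example, music and dance features), using only the distances within each set; it does not search for the minimizing coupling.

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.05, n_iters=200):
    """Entropy-regularized OT between two uniformly weighted point sets.

    Illustrative sketch only. x has shape (n, d), y has shape (m, d).
    Returns the transport cost <P, C> and the coupling P.
    """
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform weights on x
    b = np.full(m, 1.0 / m)  # uniform weights on y
    # Pairwise squared Euclidean ground cost (an assumption).
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    scale = C.max() if C.max() > 0 else 1.0
    # Regularization is taken relative to the largest cost to avoid underflow.
    K = np.exp(-C / (eps * scale))
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns toward marginal b
    P = u[:, None] * K * v[None, :]
    return float((P * C).sum()), P

def gw_objective(D1, D2, P):
    """Gromov-Wasserstein objective for a fixed coupling P.

    Computes sum_{i,j,k,l} (D1[i,k] - D2[j,l])**2 * P[i,j] * P[k,l],
    where D1 (n x n) and D2 (m x m) are intra-set distance matrices.
    """
    p = P.sum(axis=1)  # marginal over the first set
    q = P.sum(axis=0)  # marginal over the second set
    term1 = p @ (D1 ** 2) @ p
    term2 = q @ (D2 ** 2) @ q
    cross = (P * (D1 @ P @ D2.T)).sum()
    return float(term1 + term2 - 2.0 * cross)
```

Because the Gromov-Wasserstein objective compares distances within each set rather than points across sets, the two inputs do not need the same dimensionality, which is what makes it a natural fit for comparing a music representation against a dance representation.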
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Music Technology and Sound Studies
