Music to Dance as Language Translation using Sequence Models
André Correia, Luís A. Alexandre

TL;DR
This paper introduces MDLT, a sequence-to-sequence translation approach using Transformer and Mamba architectures to generate dance choreography from music, demonstrating high-quality results on robotic dance tasks.
Contribution
It frames choreography synthesis as a translation problem and presents two variants of MDLT for music-to-dance translation, one built on the Transformer architecture and one on the Mamba architecture.
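To make the translation framing concrete, here is a minimal sketch of how time-aligned audio features and dance poses could be cut into source/target pairs for a sequence-to-sequence model. It is not taken from the MDLT code: the function name `make_translation_pairs`, the fixed non-overlapping `window`, and the `start_pose` token are all assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[float]  # one audio feature vector or one pose vector


def make_translation_pairs(
    audio: List[Frame],
    poses: List[Frame],
    window: int,
    start_pose: Frame,
) -> List[Tuple[List[Frame], List[Frame], List[Frame]]]:
    """Cut aligned audio/pose frames into (source, decoder_input, target) triples.

    Hypothetical helper: the source is a window of audio frames, the target
    is the matching window of poses, and the decoder input is the target
    shifted right by one step (teacher forcing), as in text translation.
    """
    if len(audio) != len(poses):
        raise ValueError("audio and poses must be aligned frame by frame")
    if window < 1:
        raise ValueError("window must be at least 1")

    pairs = []
    # Non-overlapping windows; a trailing partial window is dropped.
    for start in range(0, len(audio) - window + 1, window):
        source = audio[start:start + window]
        target = poses[start:start + window]
        decoder_input = [start_pose] + target[:-1]
        pairs.append((source, decoder_input, target))
    return pairs
```

Under these assumptions, either a Transformer or a Mamba model would consume `source` and be trained to predict `target`; the real pipeline may use different windowing, overlap, or pose encoding.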
Findings
MDLT scores well on Average Joint Error and Fréchet Inception Distance, indicating realistic, high-quality choreography
Transformer and Mamba variants effectively generate dance from music
Method is demonstrated on a robotic arm and can be extended to a full humanoid robot
Abstract
Synthesising appropriate choreographies from music remains an open problem. We introduce MDLT, a novel approach that frames the choreography generation problem as a translation task. Our method leverages an existing data set to learn to translate sequences of audio into corresponding dance poses. We present two variants of MDLT: one utilising the Transformer architecture and the other employing the Mamba architecture. We train our method on AIST++ and PhantomDance data sets to teach a robotic arm to dance, but our method can be applied to a full humanoid robot. Evaluation metrics, including Average Joint Error and Fréchet Inception Distance, consistently demonstrate that, when given a piece of music, MDLT excels at producing realistic and high-quality choreography. The code can be found at github.com/meowatthemoon/MDLT.
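As a rough illustration of what a joint-error metric measures, the sketch below averages the absolute difference between predicted and reference joint values over all frames and joints. The abstract does not give the exact definition of Average Joint Error used in the paper, so the formula and the name `average_joint_error` are assumptions; the authors may, for example, use a Euclidean distance on joint positions instead.

```python
from typing import List


def average_joint_error(
    predicted: List[List[float]],
    reference: List[List[float]],
) -> float:
    """Mean absolute per-joint difference across all frames.

    Assumed definition for illustration only: each inner list holds one
    frame of joint values (e.g. angles), and the error is averaged over
    every joint of every frame.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("sequences must be non-empty and equally long")

    total = 0.0
    count = 0
    for pred_frame, ref_frame in zip(predicted, reference):
        if len(pred_frame) != len(ref_frame):
            raise ValueError("frames must have the same number of joints")
        for pred_value, ref_value in zip(pred_frame, ref_frame):
            total += abs(pred_value - ref_value)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one joint")
    return total / count
```

A lower value means the generated poses stay closer to the reference choreography. Fréchet Inception Distance is complementary: it compares feature distributions rather than frame-by-frame values.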
Taxonomy
Topics: Music and Audio Processing · Human Motion and Animation · Natural Language Processing Techniques
