MAGMA: Music Aligned Generative Motion Autodecoder
Sohan Anisetty, Amit Raj, James Hays

TL;DR
This paper introduces MAGMA, a two-step music-to-dance generation model that pairs a VQ-VAE with a Transformer decoder. It achieves state-of-the-art results and enables real-time generation of long, customizable dance sequences.
Contribution
The paper presents a new two-step approach that combines a VQ-VAE with a Transformer decoder for music-to-dance generation, improving sequence length, coherence, and customization capabilities.
Findings
Achieves state-of-the-art results on music-to-motion benchmarks.
Enables real-time generation of longer dance sequences.
Allows seamless chaining and style customization of generated dances.
Abstract
Mapping music to dance is a challenging problem that requires spatial and temporal coherence along with a continual synchronization with the music's progression. Taking inspiration from large language models, we introduce a 2-step approach for generating dance: a Vector Quantized-Variational Autoencoder (VQ-VAE) distills motion into primitives, and a Transformer decoder is trained to learn the correct sequencing of these primitives. We also evaluate the importance of music representations by comparing naive music feature extraction using Librosa to deep audio representations generated by state-of-the-art audio compression algorithms. Additionally, we train variations of the motion generator using relative and absolute positional encodings to determine the effect on generated motion quality when generating arbitrarily long sequence lengths. Our proposed approach achieves state-of-the-art…
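The "distill motion into primitives" step can be illustrated with the nearest-codebook lookup at the heart of vector quantization. The sketch below is not the authors' implementation: the function names, the 2-D toy vectors, and the three-entry codebook are all hypothetical. In an actual VQ-VAE the vectors would be encoder latents and the codebook would be learned during training.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame vector to the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def decode(tokens: Sequence[int],
           codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace each token with its codebook vector (the quantized frame)."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical toy data: a 3-entry codebook and 4 two-dimensional "frames".
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [0.6, 0.1]]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Once motion is expressed as a sequence of discrete tokens like this, the second step reduces to next-token prediction over that sequence, conditioned on the music features, which is where the analogy to language models comes from.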
Taxonomy
Topics: Human Motion and Animation · Music and Audio Processing · Music Technology and Sound Studies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Byte Pair Encoding · Softmax · Dropout · Label Smoothing · Absolute Position Encodings
