LM2D: Lyrics- and Music-Driven Dance Synthesis
Wenjie Yin, Xuejiao Zhao, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

TL;DR
LM2D is a novel probabilistic model that synthesizes 3D dance motions conditioned on both music and lyrics, leveraging a new dataset and outperforming music-only models in realism and diversity.
Contribution
The paper introduces LM2D, a multimodal diffusion-based architecture for dance synthesis conditioned on lyrics and music, and provides the first dataset combining these modalities.
Findings
LM2D produces realistic, diverse dance motions.
The model outperforms music-only baselines in evaluations.
Its effectiveness is demonstrated through objective metrics and human assessments.
Abstract
Dance typically involves professional choreography with complex movements that follow a musical rhythm and can also be influenced by lyrical content. The integration of lyrics, in addition to the auditory dimension, enriches the foundational tone and makes motion generation more amenable to its semantic meanings. However, existing dance synthesis methods tend to model motions conditioned only on audio signals. In this work, we make two contributions to bridge this gap. First, we propose LM2D, a novel probabilistic architecture that incorporates a multimodal diffusion model with consistency distillation, designed to create dance conditioned on both music and lyrics in one diffusion generation step. Second, we introduce the first 3D dance-motion dataset that encompasses both music and lyrics, obtained with pose estimation technologies. We evaluate our model against music-only baseline…
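The idea of generating motion "in one diffusion generation step" can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how music and lyrics are fused or how the network is parameterized. The sketch assumes simple feature concatenation for fusion, the skip-connection parameterization commonly used for consistency models, and a hypothetical `toy_denoiser` standing in for a trained network. The constants `SIGMA_DATA`, `EPS`, and `T_MAX` are likewise illustrative assumptions.

```python
import math
import random

# Illustrative constants (assumptions, not values taken from the paper).
SIGMA_DATA = 0.5   # assumed scale of the motion data
EPS = 0.002        # smallest noise level
T_MAX = 80.0       # largest noise level


def c_skip(t):
    # Weight on the noisy input; equals 1 at t = EPS.
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t):
    # Weight on the network output; equals 0 at t = EPS.
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def toy_denoiser(x, t, cond):
    # Hypothetical stand-in for a trained network: returns the mean of the
    # conditioning features for every pose dimension.
    mean_cond = sum(cond) / len(cond)
    return [mean_cond] * len(x)


def consistency_fn(x, t, cond, net=toy_denoiser):
    # f(x, t) = c_skip(t) * x + c_out(t) * net(x, t, cond),
    # which satisfies the boundary condition f(x, EPS) = x.
    out = net(x, t, cond)
    return [c_skip(t) * xi + c_out(t) * oi for xi, oi in zip(x, out)]


def fuse(music_feat, lyric_feat):
    # Assumed fusion: plain concatenation of the two feature vectors.
    return list(music_feat) + list(lyric_feat)


def sample_one_step(music_feat, lyric_feat, pose_dim, rng):
    # Draw pure noise at the largest noise level and map it to a
    # motion vector with a single evaluation of the consistency function.
    cond = fuse(music_feat, lyric_feat)
    x_T = [rng.gauss(0.0, T_MAX) for _ in range(pose_dim)]
    return consistency_fn(x_T, T_MAX, cond)


if __name__ == "__main__":
    rng = random.Random(0)
    motion = sample_one_step([0.1, 0.2], [0.3], pose_dim=4, rng=rng)
    print(len(motion))
```

The point of the sketch is structural: a multi-step diffusion sampler would call the network many times, whereas a distilled consistency function maps noise to a sample in a single call, with both modalities entering through the conditioning vector.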
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Human Motion and Animation
Methods: Diffusion
