TL;DR
This paper introduces LRCM (Listen to Rhythm, Choose Movements), a multimodal-guided diffusion framework for dance motion generation. It conditions on audio and text alongside motion data and uses a Motion Temporal Mamba Module to produce coherent, long-duration sequences autoregressively.
Contribution
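The work presents a feature decoupling paradigm for dance datasets, applied to the Motorica Dance dataset, that separates motion capture data, audio rhythm, and annotated global and local text descriptions. It also presents a diffusion architecture that combines an audio-latent Conformer, a text-latent Cross-Conformer, and a Motion Temporal Mamba Module (MTMM) for long-sequence dance synthesis.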
Findings
LRCM outperforms existing methods in quantitative metrics.
The framework supports diverse multimodal inputs.
LRCM generates smooth, long-duration dance sequences.
Abstract
Advances in generative models and sequence learning have greatly promoted research in dance motion generation, yet current methods still suffer from coarse semantic control and poor coherence in long sequences. In this work, we present Listen to Rhythm, Choose Movements (LRCM), a multimodal-guided diffusion framework supporting both diverse input modalities and autoregressive dance motion generation. We explore a feature decoupling paradigm for dance datasets and generalize it to the Motorica Dance dataset, separating motion capture data, audio rhythm, and professionally annotated global and local text descriptions. Our diffusion architecture integrates an audio-latent Conformer and a text-latent Cross-Conformer, and incorporates a Motion Temporal Mamba Module (MTMM) to enable smooth, long-duration autoregressive synthesis. Experimental results indicate that LRCM delivers strong…
