DGFM: Full Body Dance Generation Driven by Music Foundation Models
Xinran Liu, Zhenhua Feng, Diptesh Kanojia, Wenwu Wang

TL;DR
This paper introduces a diffusion-based approach for full-body dance generation driven by music, combining music foundation model features with hand-crafted features to produce realistic dance sequences aligned with music.
Contribution
The paper presents a novel diffusion-based method that integrates high-level music features from foundation models with hand-crafted features for improved dance generation.
Findings
Generates the most realistic dance sequences among the compared methods
Best match with input music among tested methods
Outperforms the four music foundation models it is compared against
Abstract
In music-driven dance motion generation, most existing methods use hand-crafted features and neglect that music foundation models have profoundly impacted cross-modal content generation. To bridge this gap, we propose a diffusion-based method that generates dance movements conditioned on text and music. Our approach extracts music features by combining high-level features obtained by a music foundation model with hand-crafted features, thereby enhancing the quality of generated dance sequences. This method effectively leverages the advantages of high-level semantic information and low-level temporal details to improve the model's capability in music feature understanding. To show the merits of the proposed method, we compare it with four music foundation models and two sets of hand-crafted music features. The results demonstrate that our method obtains the most realistic dance sequences…
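The feature combination described in the abstract can be illustrated with a minimal sketch. Nothing here comes from the paper's code: the function names `resample_to_length` and `fuse_music_features`, the feature dimensions, and the choice of linear interpolation followed by concatenation are all assumptions made for illustration. The sketch assumes the foundation-model features and the hand-crafted features arrive at different frame rates, aligns them in time, and stacks them per frame.

```python
import numpy as np


def resample_to_length(features: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly interpolate a (frames, dims) array along time to target_len frames."""
    src_len, dims = features.shape
    if src_len == target_len:
        return features.copy()
    # Normalised time axes for the source and target frame grids.
    src_t = np.linspace(0.0, 1.0, src_len)
    dst_t = np.linspace(0.0, 1.0, target_len)
    # Interpolate each feature dimension independently.
    columns = [np.interp(dst_t, src_t, features[:, d]) for d in range(dims)]
    return np.stack(columns, axis=1)


def fuse_music_features(high_level: np.ndarray, hand_crafted: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: align high-level features to the hand-crafted
    frame grid, then concatenate along the feature axis."""
    aligned = resample_to_length(high_level, hand_crafted.shape[0])
    return np.concatenate([aligned, hand_crafted], axis=1)


# Illustrative shapes only (assumed, not taken from the paper):
# 50 frames of 768-d foundation-model embeddings, 150 frames of 35-d hand-crafted features.
rng = np.random.default_rng(0)
high_level = rng.standard_normal((50, 768))
hand_crafted = rng.standard_normal((150, 35))
fused = fuse_music_features(high_level, hand_crafted)
print(fused.shape)
```

The fused array has one row per motion frame and could then serve as the music conditioning signal for a diffusion model. Concatenation is only one possible fusion strategy; the paper may combine the two feature sets differently.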
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
