PF-D2M: A Pose-free Diffusion Model for Universal Dance-to-Music Generation
Jaekwon Im, Natalia Polouliakh, Taketo Akama

TL;DR
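PF-D2M is a universal diffusion model that generates music aligned with dance videos. It uses visual features extracted from the video rather than single-dancer body motion features, which extends it to multi-dancer and non-human dancer scenarios, and a progressive training strategy to address data scarcity. It achieves state-of-the-art dance-music alignment and music quality in objective and subjective evaluations.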
Contribution
Introduces PF-D2M, a diffusion-based dance-to-music generation model that incorporates visual features extracted from dance videos and uses a progressive training strategy to address data scarcity and improve generalization.
Findings
Achieves state-of-the-art dance-music alignment.
Effective in multi-dancer and non-human dancer scenarios.
Outperforms existing methods in music quality.
Abstract
Dance-to-music generation aims to generate music that is aligned with dance movements. Existing approaches typically rely on body motion features extracted from a single human dancer and limited dance-to-music datasets, which restrict their performance and applicability to real-world scenarios involving multiple dancers and non-human dancers. In this paper, we propose PF-D2M, a universal diffusion-based dance-to-music generation model that incorporates visual features extracted from dance videos. PF-D2M is trained with a progressive training strategy that effectively addresses data scarcity and generalization challenges. Both objective and subjective evaluations show that PF-D2M achieves state-of-the-art performance in dance-music alignment and music quality.
Taxonomy
Topics: Human Motion and Animation · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
