MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation

Prerit Gupta, Jason Alexander Fotso-Puepi, Zhengyuan Li, Jay Mehta, Aniket Bera (Purdue University, West Lafayette, IN, USA)

arXiv:2508.16911·cs.GR·August 26, 2025

TL;DR

This paper presents MDD, a comprehensive dataset combining motion capture, music, and text annotations for advancing text-controlled and music-conditioned duet dance generation, along with baseline tasks and evaluations.

Contribution

The paper introduces MDD, the first dataset to integrate human motion, music, and text for duet dance generation, enabling new research in multimodal dance synthesis.

Findings

01 MDD contains 620 minutes of motion capture data with over 10K fine-grained natural language descriptions.

02 Baseline models demonstrate the feasibility of text- and music-conditioned dance generation.

03 Two novel tasks, Text-to-Duet and Text-to-Dance Accompaniment, are proposed and evaluated.

Abstract

We introduce Multimodal DuetDance (MDD), a diverse multimodal benchmark dataset designed for text-controlled and music-conditioned 3D duet dance motion generation. Our dataset comprises 620 minutes of high-quality motion capture data performed by professional dancers, synchronized with music, and annotated with over 10K fine-grained natural language descriptions. The annotations capture a rich movement vocabulary, detailing spatial relationships, body movements, and rhythm, making MDD the first dataset to integrate human motions, music, and text for duet dance generation. We introduce two novel tasks supported by our dataset: (1) Text-to-Duet, where, given music and a textual prompt, both the leader's and the follower's dance motions are generated; and (2) Text-to-Dance Accompaniment, where, given music, a textual prompt, and the leader's motion, the follower's motion is generated in a…
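
The two tasks differ only in whether the leader's motion is supplied as an input. The following is a minimal sketch of that input/output contract, not the authors' code: the names `DuetRequest`, `task_name`, and `expected_outputs`, and the list-based layouts for music features and motion frames, are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical layout: one inner list of pose features per motion frame.
Motion = List[List[float]]


@dataclass
class DuetRequest:
    """Inputs to a duet generation task (illustrative, not the dataset's schema)."""
    music: List[float]                     # hypothetical per-frame audio features
    text: str                              # natural language prompt
    leader_motion: Optional[Motion] = None  # given only for accompaniment


def task_name(req: DuetRequest) -> str:
    """Pick the task implied by which inputs are present."""
    if req.leader_motion is None:
        return "text-to-duet"
    return "text-to-dance-accompaniment"


def expected_outputs(req: DuetRequest) -> Tuple[str, ...]:
    """Name the motion streams a model would have to generate."""
    if req.leader_motion is None:
        # Text-to-Duet: both dancers are generated from music and text.
        return ("leader", "follower")
    # Text-to-Dance Accompaniment: the leader is given, only the follower is generated.
    return ("follower",)


duet = DuetRequest(music=[0.1, 0.2], text="the leader spins the follower")
accomp = DuetRequest(
    music=[0.1, 0.2],
    text="the leader spins the follower",
    leader_motion=[[0.0, 0.0, 0.0]],
)
```

Under these assumptions, `duet` maps to the Text-to-Duet task with two output streams, while `accomp` maps to Text-to-Dance Accompaniment with the follower as the only output.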
