TeMuDance: Contrastive Alignment-Based Textual Control for Music-Driven Dance Generation

Xinran Liu; Diptesh Kanojia; Wenwu Wang; Zhenhua Feng

arXiv:2604.17005 · cs.CV · April 21, 2026

TL;DR

TeMuDance is a framework that adds natural-language control to music-driven dance generation. It aligns disjoint music-dance and text-motion datasets in a shared embedding space, so it needs no manually annotated music-text-motion triplets.

Contribution

The paper introduces a motion-centred bridging paradigm and a lightweight text control branch that add semantic controllability to dance generation without a manually annotated music-text-motion triplet dataset.

Findings

01

TeMuDance achieves high-quality dance generation with improved text control.

02

The framework effectively aligns music, text, and motion in a shared embedding space.

03

Experimental results show competitive dance quality and enhanced semantic controllability.

Abstract

Existing music-driven dance generation approaches have achieved strong realism and effective audio-motion alignment. However, they generally lack semantic controllability, making it difficult to guide specific movements through natural language descriptions. This limitation primarily stems from the absence of large-scale datasets that jointly align music, text, and motion for supervised learning of text-conditioned control. To address this challenge, we propose TeMuDance, a framework that enables text-based control for music-conditioned dance generation without requiring any manually annotated music-text-motion triplet dataset. TeMuDance introduces a motion-centred bridging paradigm that leverages motion as a shared semantic anchor to align disjoint music-dance and text-motion datasets within a unified embedding space, enabling cross-modal retrieval of missing modalities for end-to-end…
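The motion-centred bridging idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the excerpt does not specify the encoders, the loss, or the retrieval procedure, so the InfoNCE-style loss, the 3-dimensional embeddings, and every name below (`info_nce`, `retrieve`, `text_bank`) are assumptions chosen for illustration. The sketch assumes that motion and text embeddings already live in one shared space, and shows how a motion clip from a music-dance pair could retrieve its missing text description from a separate text-motion dataset.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed stand-in for the paper's loss).

    anchors[i] should be most similar to positives[i]; every other
    positives[j] in the batch acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def retrieve(query, bank):
    """Return the key in `bank` whose embedding is closest to `query`."""
    return max(bank, key=lambda key: cosine(query, bank[key]))


# Hypothetical text embeddings from a text-motion dataset (toy values).
text_bank = {
    "spin in place": [0.9, 0.1, 0.0],
    "jump with raised arms": [0.0, 0.2, 0.95],
}

# Hypothetical embedding of a motion clip taken from a music-dance pair.
dance_motion_embedding = [0.8, 0.3, 0.1]

# Motion acts as the anchor: the retrieved text fills in the missing
# modality, giving a pseudo (music, text, motion) triplet.
pseudo_text = retrieve(dance_motion_embedding, text_bank)
```

In this toy setting, the loss is small when matching motion and text embeddings share an index and large when the pairing is shuffled, which is the property a contrastive alignment stage relies on before retrieval is meaningful.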
