Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization

Jiashuo Yu; Junfu Pu; Ying Cheng; Rui Feng; Ying Shan

arXiv:2207.03190 · cs.SD · August 11, 2023

TL;DR

This paper introduces MuDaR, a framework for learning synchronized music and dance representations through explicit and implicit rhythm alignment, improving performance in dance classification, retrieval, and retargeting tasks.

Contribution

The novel MuDaR framework effectively models music-dance synchronization using visual and audio cues, leveraging contrastive learning for joint embedding.
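The contrastive joint embedding mentioned above can be illustrated with a symmetric InfoNCE loss, a common choice for audio-visual contrastive learning. This is a minimal sketch, not the paper's implementation: the summary does not say which contrastive objective MuDaR uses, so the loss form, the `temperature` value, and the function names `l2_normalize` and `info_nce` are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(audio_embs, visual_embs, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: audio_embs[i] and visual_embs[i] come from the same clip
    (a positive pair); every other combination in the batch is a negative.
    """
    a = [l2_normalize(e) for e in audio_embs]
    v = [l2_normalize(e) for e in visual_embs]
    n = len(a)
    # sim[i][j] = cosine similarity of audio i and visual j, scaled by temperature
    sim = [
        [sum(p * q for p, q in zip(a[i], v[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(logits):
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(logits):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(logits)

    sim_t = [list(col) for col in zip(*sim)]  # visual-to-audio direction
    return 0.5 * (cross_entropy_on_diagonal(sim) + cross_entropy_on_diagonal(sim_t))
```

With this loss, a batch whose audio and visual embeddings are correctly paired scores lower than the same batch with the pairs shuffled, which is the signal that pulls synchronized music and dance together in the shared space.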

Findings

1. Outperforms existing self-supervised methods significantly.
2. Effective in dance classification, music-dance retrieval, and retargeting.
3. Accurately detects and aligns audio-visual rhythms.

Abstract

Although audio-visual representations have proven applicable to many downstream tasks, representing dance videos, which are more specific and always accompanied by music with complex auditory content, remains challenging and uninvestigated. Considering the intrinsic alignment between the dancer's cadenced movement and the music rhythm, we introduce MuDaR, a novel Music-Dance Representation learning framework that synchronizes music and dance rhythms in both explicit and implicit ways. Specifically, we derive dance rhythms from visual appearance and motion cues, inspired by music rhythm analysis. The visual rhythms are then temporally aligned with their music counterparts, which are extracted from the amplitude of sound intensity. Meanwhile, we exploit the implicit coherence of rhythms in the audio and visual streams through contrastive learning. The…
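The explicit side of the synchronization, extracting music rhythm from the amplitude of sound intensity and aligning it with visual rhythm, can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the authors' pipeline: rhythm points are taken as sharp rises in a per-frame RMS envelope, the visual rhythm is stood in for by a precomputed per-frame motion-magnitude series, and the names `frame_energy`, `rhythm_points`, and `alignment_score`, along with the threshold and tolerance values, are hypothetical.

```python
import math


def frame_energy(samples, frame_len):
    """RMS amplitude of each non-overlapping frame of an audio signal."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def rhythm_points(envelope, threshold):
    """Indices where the envelope rises sharply (a simple onset heuristic).

    Works on any per-frame intensity series, so the same function is reused
    here for an (assumed) per-frame motion-magnitude series on the visual side.
    """
    rise = [0.0] + [max(0.0, b - a) for a, b in zip(envelope, envelope[1:])]
    points = []
    for i, r in enumerate(rise):
        left = rise[i - 1] if i > 0 else 0.0
        right = rise[i + 1] if i + 1 < len(rise) else 0.0
        # Keep local peaks of the rise that exceed the threshold.
        if r > threshold and r > left and r >= right:
            points.append(i)
    return points


def alignment_score(music_pts, visual_pts, tolerance=1):
    """Fraction of music rhythm points with a visual point within `tolerance` frames."""
    if not music_pts:
        return 0.0
    hits = sum(
        1 for m in music_pts if any(abs(m - v) <= tolerance for v in visual_pts)
    )
    return hits / len(music_pts)


# Toy signal: two loud frames separated by silence (frame length 4 samples).
audio = [0.0] * 4 + [1.0] * 4 + [0.0] * 8 + [1.0] * 4 + [0.0] * 4
music = rhythm_points(frame_energy(audio, frame_len=4), threshold=0.5)

# Hypothetical motion magnitudes, peaking one frame after each beat.
motion = [0.1, 0.2, 0.9, 0.1, 0.1, 0.8, 0.1]
visual = rhythm_points(motion, threshold=0.5)

score = alignment_score(music, visual, tolerance=1)
```

The tolerance window reflects the fact that a dancer's movement rarely lands on exactly the same frame as the beat; a learned model would replace both the hand-set threshold and this hard window.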

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing