MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence

Fuming You; Minghui Fang; Li Tang; Rongjie Huang; Yongqi Wang; Zhou Zhao

arXiv:2411.01805·cs.SD·November 5, 2024

TL;DR

MoMu-Diffusion introduces a unified framework for long-term, synchronized motion-music generation using a novel auto-encoder and diffusion model, enabling diverse cross-modal and variable-length synthesis with improved realism.

Contribution

The paper presents BiCoR-VAE, a Bidirectional Contrastive Rhythmic Variational Auto-Encoder that learns modality-aligned latent representations for motion and music and reduces the computational cost of long sequences, together with a multi-modal diffusion model for synchronized motion-music generation.

Findings

01 Outperforms state-of-the-art methods in quality and diversity

02 Capable of long-term, beat-matched motion and music synthesis

03 Effective in cross-modal and multi-modal generation tasks

Abstract

Motion-to-music and music-to-motion have been studied separately, each attracting substantial research interest within their respective domains. The interaction between human motion and music is a reflection of advanced human intelligence, and establishing a unified relationship between them is particularly important. However, to date, there has been no work that considers them jointly to explore the modality alignment within. To bridge this gap, we propose a novel framework, termed MoMu-Diffusion, for long-term and synchronous motion-music generation. Firstly, to mitigate the huge computational costs raised by long sequences, we propose a novel Bidirectional Contrastive Rhythmic Variational Auto-Encoder (BiCoR-VAE) that extracts the modality-aligned latent representations for both motion and music inputs. Subsequently, leveraging the aligned latent spaces, we introduce a multi-modal…

Videos

MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence · slideslive

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing