Quantized GAN for Complex Music Generation from Dance Videos

Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, and Sergey Tulyakov

arXiv:2204.00604 · cs.CV · July 20, 2022

TL;DR

This paper introduces D2M-GAN, a novel framework that generates complex, multi-style music conditioned on dance videos using a Vector Quantized audio representation, outperforming existing methods in realism and diversity.

Contribution

The paper presents a new multi-modal GAN framework for dance-conditioned music generation employing VQ audio, with extensive experiments and a new TikTok dataset demonstrating its effectiveness.

Findings

01 Quantitative results show high music consistency and beat correspondence.

02 The method generates diverse music styles including pop and breaking.

03 The TikTok dataset enables real-world application testing.

Abstract

We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates complex musical samples conditioned on dance videos. Our proposed framework takes dance video frames and human body motions as input, and learns to generate music samples that plausibly accompany the corresponding input. Most existing conditional music generation works generate specific types of mono-instrumental sounds using symbolic audio representations (e.g., MIDI) and usually rely on pre-defined musical synthesizers. In contrast, in this work we generate dance music in complex styles (e.g., pop, breaking, etc.) by employing a Vector Quantized (VQ) audio representation, and leverage both its generality and the high abstraction capacity of its symbolic and continuous counterparts. By performing an extensive set of experiments on multiple datasets, and following a comprehensive evaluation…
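
As a rough illustration of what a Vector Quantized representation means in general, the sketch below maps continuous feature frames to the indices of their nearest codebook entries. It is not the paper's implementation: the codebook values, frame values, and the function names `squared_distance` and `quantize` are all hypothetical, chosen only to show the nearest-neighbor lookup that underlies VQ.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Replace each frame with its nearest codebook vector.

    Returns the discrete code indices and the quantized vectors.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Pick the codebook entry with the smallest distance to this frame.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Toy values (assumptions for illustration only): a 3-entry codebook of
# 2-dimensional vectors and three continuous feature frames.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
FRAMES = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.4]]

indices, quantized = quantize(FRAMES, CODEBOOK)
print(indices)    # discrete codes, one per frame
print(quantized)  # the codebook vectors those codes refer to
```

The index sequence is discrete, like a symbolic representation, while each index still refers to a learned continuous vector. This is the general sense in which a VQ representation sits between the two.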

Code & Models

Repositories

l-yezhu/d2m-gan (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Human Motion and Animation