Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals

Yiming Wu

arXiv:2309.02796·cs.SD·September 7, 2023

TL;DR

This paper introduces a self-supervised deep learning approach using a variational autoencoder to disentangle rhythmic and harmonic features in music audio, enabling controllable music generation and remixing.

Contribution

It presents a novel self-supervised method for disentangling rhythmic and harmonic features in music audio using vector rotation in latent space within a variational autoencoder.

Findings

01 Effective disentanglement of rhythmic and harmonic features demonstrated

02 Improved controllable music remixing capabilities shown

03 Quantitative evaluation confirms feature separation

Abstract

The aim of latent variable disentanglement is to infer the multiple informative latent representations that lie behind a data generation process and is a key factor in controllable data generation. In this paper, we propose a deep neural network-based self-supervised learning method to infer the disentangled rhythmic and harmonic representations behind music audio generation. We train a variational autoencoder that generates an audio mel-spectrogram from two latent features representing the rhythmic and harmonic content. In the training phase, the variational autoencoder is trained to reconstruct the input mel-spectrogram given its pitch-shifted version. At each forward computation in the training phase, a vector rotation operation is applied to one of the latent features, assuming that the dimensions of the feature vectors are related to pitch intervals. Therefore, in the trained…
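The rotation step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that "vector rotation" means a circular shift of the harmonic latent vector and that one index step corresponds to one semitone; the abstract does not specify the latent size, the shift direction, or the exact operation, so the function name `rotate`, the 12-dimensional vector, and the shift of 3 are all hypothetical and are not taken from the official repository.

```python
from typing import List, Sequence


def rotate(vec: Sequence[float], steps: int) -> List[float]:
    """Circularly shift `vec` by `steps` positions.

    Positive `steps` moves entries toward higher indices and wraps the
    tail around to the front. This is an assumed reading of the paper's
    "vector rotation", not the authors' confirmed implementation.
    """
    n = len(vec)
    if n == 0:
        return []
    k = steps % n  # normalise negative and oversized shifts
    return list(vec[n - k:]) + list(vec[:n - k])


# Hypothetical harmonic latent: 12 dimensions, one per pitch class (assumption).
harmonic = [float(i) for i in range(12)]

# Assumed training-time use: the encoder sees a pitch-shifted input, and the
# harmonic latent is rotated by the known shift so that the decoder can
# reconstruct the original, unshifted mel-spectrogram.
shifted = rotate(harmonic, 3)    # simulate a latent that moved by 3 steps
restored = rotate(shifted, -3)   # rotate back by the known shift
```

Under this reading, only the harmonic latent is rotated while the rhythmic latent is left untouched, which is what would push pitch-related information into one feature and pitch-invariant information into the other.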

Code & Models

Repositories

WuYiming6526/HARD-DAFx2023 (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies