Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders

Yin-Jyun Luo; Kat Agres; Dorien Herremans

arXiv:1906.08152 · cs.LG · July 2, 2019 · 5 cites

TL;DR

This paper introduces a Gaussian mixture variational autoencoder framework that learns to disentangle timbre and pitch in musical instrument sounds, enabling controllable synthesis and transfer of instrument characteristics.

Contribution

The work presents a novel disentanglement approach using Gaussian mixture VAEs with separate encoders for timbre and pitch, facilitating realistic sound synthesis and transfer.

Findings

1. High accuracy in instrument and pitch classification on synthesized sounds
2. Effective disentanglement of timbre and pitch demonstrated through visualization and analysis
3. Successful timbre transfer between instruments using a single autoencoder architecture

Abstract

In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For reconstruction, latent variables of timbre and pitch are sampled from corresponding mixture components, and are concatenated as the input to a decoder. We show the model efficacy by latent space visualization, and a quantitative analysis indicates the discriminability of these spaces, even with a limited number of instrument labels for training. The model allows for controllable synthesis of selected instrument sounds by sampling from the latent spaces. To evaluate this, we trained instrument…
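The sampling step the abstract describes (draw a timbre latent and a pitch latent from their respective mixture components, then concatenate them for the decoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the latent size, the instrument and pitch labels, the component parameters, and every function name below are assumptions made for illustration, and the component means are drawn at random here, whereas in a trained model they would presumably be learned.

```python
import random

# All sizes, labels, and parameter values are hypothetical.
LATENT_DIM = 4
INSTRUMENTS = ["piano", "violin", "flute"]
PITCHES = [60, 62, 64]  # MIDI note numbers


def make_mixture(labels, dim, rng):
    """Give each label its own Gaussian component: (means, stds) per dimension."""
    return {
        label: ([rng.uniform(-3.0, 3.0) for _ in range(dim)], [0.5] * dim)
        for label in labels
    }


def sample_component(mixture, label, rng):
    """Draw z from the diagonal Gaussian component selected by `label`."""
    mean, std = mixture[label]
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def decoder_input(timbre_mix, pitch_mix, instrument, pitch, rng):
    """Sample timbre and pitch latents separately, then concatenate them."""
    z_timbre = sample_component(timbre_mix, instrument, rng)
    z_pitch = sample_component(pitch_mix, pitch, rng)
    return z_timbre + z_pitch  # list concatenation


rng = random.Random(0)
timbre_mix = make_mixture(INSTRUMENTS, LATENT_DIM, rng)
pitch_mix = make_mixture(PITCHES, LATENT_DIM, rng)

# Same pitch component, different instrument component.
z_piano = decoder_input(timbre_mix, pitch_mix, "piano", 60, rng)
z_violin = decoder_input(timbre_mix, pitch_mix, "violin", 60, rng)
print(len(z_piano), len(z_violin))  # prints "8 8"
```

Holding the pitch label fixed while swapping the instrument label mirrors, in toy form, the controllable synthesis setting described above; a real system would pass the concatenated vector to a trained neural decoder rather than stopping at the latent.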

Code & Models

Repositories

yjlolo/vae-audio (PyTorch)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
