Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders
Yin-Jyun Luo, Kat Agres, Dorien Herremans

TL;DR
This paper introduces a Gaussian mixture variational autoencoder framework that learns to disentangle timbre and pitch in musical instrument sounds, enabling controllable synthesis and transfer of instrument characteristics.
Contribution
The work adapts a Gaussian mixture VAE framework, using separate encoders for timbre and pitch, to disentangle the two and support controllable sound synthesis and timbre transfer.
Findings
High accuracy in instrument and pitch classification on synthesized sounds
Effective disentanglement of timbre and pitch, shown through latent space visualization and quantitative analysis
Successful timbre transfer between instruments using a single autoencoder architecture
Abstract
In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For reconstruction, latent variables of timbre and pitch are sampled from corresponding mixture components, and are concatenated as the input to a decoder. We show the model efficacy by latent space visualization, and a quantitative analysis indicates the discriminability of these spaces, even with a limited number of instrument labels for training. The model allows for controllable synthesis of selected instrument sounds by sampling from the latent spaces. To evaluate this, we trained instrument…
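The sampling and concatenation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The sizes, the random component means, the fixed log-variances, and the helper names `sample_component` and `decoder_input` are all assumptions made for illustration. In a trained model the component parameters would be learned and the concatenated vector would be passed to a decoder network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
N_INSTRUMENTS, N_PITCHES, LATENT_DIM = 4, 12, 16

# One diagonal Gaussian component per instrument and per pitch.
# Means are random placeholders here; a trained model would learn them.
timbre_means = rng.normal(size=(N_INSTRUMENTS, LATENT_DIM))
pitch_means = rng.normal(size=(N_PITCHES, LATENT_DIM))
timbre_log_var = np.full((N_INSTRUMENTS, LATENT_DIM), -2.0)
pitch_log_var = np.full((N_PITCHES, LATENT_DIM), -2.0)


def sample_component(means, log_var, k, rng):
    """Draw z ~ N(means[k], diag(exp(log_var[k])))."""
    eps = rng.standard_normal(means.shape[1])
    return means[k] + np.exp(0.5 * log_var[k]) * eps


def decoder_input(instrument, pitch, rng):
    """Sample one latent from each space and concatenate them."""
    z_timbre = sample_component(timbre_means, timbre_log_var, instrument, rng)
    z_pitch = sample_component(pitch_means, pitch_log_var, pitch, rng)
    return np.concatenate([z_timbre, z_pitch])


# Choose an instrument and a pitch, then build the vector a decoder would consume.
z = decoder_input(instrument=2, pitch=7, rng=rng)
print(z.shape)  # (32,)
```

Under these assumptions, changing only the `instrument` index while keeping the pitch latent fixed corresponds to the timbre transfer idea mentioned above: the pitch half of the vector is unchanged and only the timbre half is resampled.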
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
