Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models
Fanny Roche (1, 2), Thomas Hueber (1), Samuel Limier (2) and Laurent Girin (1, 3) ((1) Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France; (2) Arturia, Meylan, France; (3) INRIA, Perception Team, Montbonnot, France)

TL;DR
This paper compares PCA with several autoencoder architectures for music sound modeling. Deep and recurrent autoencoders outperform PCA in reconstruction accuracy, PCA outperforms shallow autoencoders, and variational autoencoders offer a good balance of reconstruction quality and latent space usability.
Contribution
It provides a systematic comparison of linear, shallow, deep, recurrent, and variational autoencoders against PCA for music spectrum compression and synthesis.
Findings
Deep and recurrent autoencoders outperform PCA in reconstruction error.
PCA outperforms shallow autoencoders in this task.
Variational autoencoders balance reconstruction quality and latent space usability.
Abstract
This study investigates the use of non-linear unsupervised dimensionality reduction techniques to compress a music dataset into a low-dimensional representation which can be used in turn for the synthesis of new sounds. We systematically compare (shallow) autoencoders (AEs), deep autoencoders (DAEs), recurrent autoencoders (with Long Short-Term Memory cells -- LSTM-AEs) and variational autoencoders (VAEs) with principal component analysis (PCA) for representing the high-resolution short-term magnitude spectrum of a large and dense dataset of music notes into a lower-dimensional vector (and then convert it back to a magnitude spectrum used for sound resynthesis). Our experiments were conducted on the publicly available multi-instrument and multi-pitch database NSynth. Interestingly and contrary to the recent literature on image processing, we can show that PCA systematically outperforms…
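To make the PCA baseline described in the abstract concrete, here is a minimal sketch of encoding spectral frames into a low-dimensional vector and decoding them back, with reconstruction error measured as RMSE. It is an illustration only, not the authors' pipeline: the data is a synthetic stand-in rather than NSynth spectra, and the function names (`pca_fit`, `pca_encode`, `pca_decode`, `rmse`), the frame and bin counts, and the latent sizes are all assumptions chosen for the example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_encode(X, mean, components):
    """Project frames onto the principal directions (the latent vectors)."""
    return (X - mean) @ components.T

def pca_decode(Z, mean, components):
    """Map latent vectors back to the original spectral dimension."""
    return Z @ components + mean

def rmse(X, X_hat):
    return float(np.sqrt(np.mean((X - X_hat) ** 2)))

# Synthetic stand-in for a matrix of spectral frames (frames x frequency bins).
# Assumption: low-rank structure plus a small amount of noise.
rng = np.random.default_rng(0)
n_frames, n_bins, true_rank = 200, 64, 8
basis = rng.normal(size=(true_rank, n_bins))
activations = rng.normal(size=(n_frames, true_rank))
X = activations @ basis + 0.01 * rng.normal(size=(n_frames, n_bins))

# Reconstruction error for several latent sizes.
errors = {}
for k in (2, 4, 8, 16):
    mean, comps = pca_fit(X, k)
    Z = pca_encode(X, mean, comps)
    errors[k] = rmse(X, pca_decode(Z, mean, comps))

for k, err in errors.items():
    print(f"latent size {k:2d}: RMSE = {err:.4f}")
```

An autoencoder would replace `pca_encode` and `pca_decode` with learned, possibly non-linear mappings; comparing the same RMSE at matching latent sizes is one plausible way to set up the kind of comparison the abstract describes.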
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
