Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics

Philippe Esling; Axel Chemla--Romeu-Santos; Adrien Bitton

arXiv:1805.08501 · cs.SD · October 2, 2018 · 20 citations

TL;DR

This paper regularizes Variational Auto-Encoders (VAEs) with perceptual dissimilarity ratings to build timbre spaces that are perceptually meaningful, invertible, and generative, so that novel musical instruments can be both analyzed and synthesized.

Contribution

It adapts VAEs with perceptual regularization to produce continuous, invertible timbre spaces aligned with human perception, enabling synthesis and analysis of new instruments.
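
As a rough illustration of what "perceptual regularization" could look like, the sketch below adds a penalty to the usual VAE objective that pulls pairwise distances between latent codes toward rated dissimilarities between instruments. This is a minimal sketch under stated assumptions, not the authors' implementation: the squared-gap penalty, the function names, and the `alpha` and `beta` weights are all hypothetical choices made for illustration, and the reconstruction error is passed in as a plain number.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I) for one code."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def perceptual_penalty(latents, ratings):
    """Sum over instrument pairs of the squared gap between latent
    Euclidean distance and rated dissimilarity (assumed penalty form)."""
    total = 0.0
    for i in range(len(latents)):
        for j in range(i + 1, len(latents)):
            gap = math.dist(latents[i], latents[j]) - ratings[i][j]
            total += gap * gap
    return total

def regularized_loss(recon_error, mus, log_vars, ratings,
                     beta=1.0, alpha=1.0):
    """Reconstruction + beta * KL + alpha * perceptual penalty.
    The latent means stand in for the latent codes here (assumption)."""
    kl = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mus, log_vars))
    return recon_error + beta * kl + alpha * perceptual_penalty(mus, ratings)

# Two hypothetical instruments with 2-D latent means and one rated pair.
mus = [[0.0, 0.0], [3.0, 4.0]]
log_vars = [[0.0, 0.0], [0.0, 0.0]]
ratings = [[0.0, 5.0], [5.0, 0.0]]
loss = regularized_loss(2.0, mus, log_vars, ratings)
```

When the latent distances already match the ratings, as in this toy case, the penalty vanishes and only the reconstruction and KL terms remain; raising `alpha` trades reconstruction quality for closer agreement with the perceptual ratings.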

Findings

01. The Non-Stationary Gabor Transform (NSGT) provides the best correlation with timbre spaces.

02. The model generalizes to novel instruments.

03. Audio descriptors evolve smoothly along latent dimensions.

Abstract

Timbre spaces have been used in music perception to study the perceptual relationships between instruments based on dissimilarity ratings. However, these spaces do not generalize to novel examples and do not provide an invertible mapping, preventing audio synthesis. In parallel, generative models have aimed to provide methods for synthesizing novel timbres. However, these systems do not provide an understanding of their inner workings and are usually not related to any perceptually relevant information. Here, we show that Variational Auto-Encoders (VAE) can alleviate all of these limitations by constructing generative timbre spaces. To do so, we adapt VAEs to learn an audio latent space, while using perceptual ratings from timbre studies to regularize the organization of this space. The resulting space allows us to analyze novel instruments, while being able to synthesize audio from any…

Code & Models

Repositories

acids-ircam/variational-timbre
PyTorch

Taxonomy

Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies