Timbre latent space: exploration and creative aspects
Antoine Caillon, Adrien Bitton, Brice Gatinet, Philippe Esling

TL;DR
This paper explores the use of disentangled latent spaces in generative audio models to enhance control and creativity in timbre manipulation, with experiments involving composers and custom interfaces.
Contribution
It introduces methods for disentangling timbre representations in auto-encoders and demonstrates their application in creative sound synthesis with interactive tools.
Findings
Disentangled latent spaces improve timbre control.
Perceptual regularization aligns latent spaces with perceptual timbre dimensions.
Creative applications with interfaces enable novel sound manipulations.
Abstract
Recent studies show the ability of unsupervised models to learn invertible audio representations using Auto-Encoders. They enable high-quality sound synthesis but offer only limited control, since the latent spaces do not disentangle timbre properties. The emergence of disentangled representations has been studied in Variational Auto-Encoders (VAEs) and applied to audio. An additional perceptual regularization can align such latent representations with the previously established multi-dimensional timbre spaces, while allowing continuous inference and synthesis. Alternatively, some specific sound attributes can be learned as control variables while unsupervised dimensions account for the remaining features. Generative neural networks enable new possibilities for timbre manipulation, although the exploration and creative use of their representations remain limited. The…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
