Sound Design Strategies for Latent Audio Space Explorations Using Deep Learning Architectures
Kıvanç Tatar, Kelsey Cotton, Daniel Bisig

TL;DR
This paper explores the use of Variational Autoencoders directly on raw audio data for sound design, enabling real-time applications and artistic control by bypassing traditional feature extraction.
Contribution
It introduces three strategies for exploring latent audio spaces with VAEs applied directly to raw audio, facilitating real-time sound design and artistic experimentation.
Findings
VAEs applied directly to raw audio enable flexible sound manipulation.
Lower computational costs allow real-time audio generation.
Strategies promote artistic exploration of latent audio spaces.
Abstract
Research on Deep Learning applications in sound and music computing has gathered interest in recent years; however, there is still a missing link between these new technologies and how they can be incorporated into real-world artistic practices. In this work, we explore a well-known Deep Learning architecture called Variational Autoencoders (VAEs). These architectures have been used in many areas for generating latent spaces where data points are organized so that similar data points are located closer to each other. Previously, VAEs have been used for generating latent timbre spaces or latent spaces of symbolic music excerpts. Applying VAEs to audio features of timbre requires a vocoder to transform the timbre generated by the network into an audio signal, which is computationally expensive. In this work, we apply VAEs to raw audio data directly while bypassing audio feature…
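To make the idea of a VAE operating directly on raw audio more concrete, the following is a minimal illustrative sketch, not the authors' architecture. It encodes fixed-length windows of audio samples into a small latent vector, decodes them back to samples with no vocoder step, computes the standard VAE objective (reconstruction error plus KL divergence), and interpolates between two windows in latent space. The window length, latent size, hidden size, sample rate, single-hidden-layer layout, and all function names are assumptions chosen for illustration, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME = 1024   # samples per raw-audio window (assumed, not from the paper)
LATENT = 8     # latent dimensions (assumed)
HIDDEN = 64    # hidden layer width (assumed)


def init(n_in, n_out):
    """Random, untrained weights; a real model would learn these."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))


W_enc = init(FRAME, HIDDEN)
W_mu = init(HIDDEN, LATENT)
W_logvar = init(HIDDEN, LATENT)
W_dec1 = init(LATENT, HIDDEN)
W_dec2 = init(HIDDEN, FRAME)


def encode(x):
    """Map raw audio window(s) to the mean and log-variance of q(z|x)."""
    h = np.tanh(x @ W_enc)
    return h @ W_mu, h @ W_logvar


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps, with eps drawn from N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z):
    """Map latent vector(s) straight back to audio samples in [-1, 1]."""
    return np.tanh(np.tanh(z @ W_dec1) @ W_dec2)


def vae_loss(x, x_hat, mu, logvar):
    """Squared reconstruction error plus KL(q(z|x) || N(0, I)), batch-averaged."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=-1))
    return recon + kl


def interpolate(x_a, x_b, steps=5):
    """Decode evenly spaced points on the line between two latent means."""
    mu_a, _ = encode(x_a)
    mu_b, _ = encode(x_b)
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return decode((1 - ts) * mu_a + ts * mu_b)


# Two synthetic test windows: sine tones at 220 Hz and 880 Hz (44.1 kHz assumed).
t = np.arange(FRAME) / 44100.0
frame_a = np.sin(2 * np.pi * 220 * t)
frame_b = np.sin(2 * np.pi * 880 * t)

batch = np.stack([frame_a, frame_b])
mu, logvar = encode(batch)
x_hat = decode(reparameterize(mu, logvar))
loss = vae_loss(batch, x_hat, mu, logvar)

morph = interpolate(frame_a, frame_b, steps=5)
print(morph.shape)  # (5, 1024)
```

Because the decoder emits samples directly, each row of `morph` is already a playable audio window. With untrained weights the output is noise-like; the sketch only shows the data flow that makes latent-space traversal usable as a sound design gesture.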
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
