Exploring how a Generative AI interprets music
Gabriela Barenboim, Luigi Del Debbio, Johannes Hirn, Veronica Sanz

TL;DR
This paper investigates how Google's MusicVAE, a generative model for music, encodes pitch, rhythm, and melody in its 512-dimensional latent space. Most latent neurons stay silent on real music, while a few dozen carry the musical information.
Contribution
The study identifies and characterizes the neurons in MusicVAE's latent space that encode pitch, rhythm, and melody, providing insights into how generative models represent musical information.
Findings
Most latent neurons are silent ('noise' neurons)
The first few 'music neurons' encode most of the pitch and rhythm information
Melody information appears in independent neurons for longer sequences
Abstract
We use Google's MusicVAE, a Variational Auto-Encoder with a 512-dimensional latent space to represent a few bars of music, and organize the latent dimensions according to their relevance in describing music. We find that, on average, most latent neurons remain silent when fed real music tracks: we call these "noise" neurons. The remaining few dozen latent neurons that do fire are called "music neurons". We ask which neurons carry the musical information and what kind of musical information they encode, namely something that can be identified as pitch, rhythm or melody. We find that most of the information about pitch and rhythm is encoded in the first few music neurons: the neural network has thus constructed a couple of variables that non-linearly encode many human-defined variables used to describe pitch and rhythm. The concept of melody only seems to show up in independent…
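The split between "noise" and "music" neurons can be illustrated with a small sketch. The abstract does not state the criterion used to rank latent dimensions, so this example assumes one common choice for VAEs: the average KL divergence of each dimension's posterior from the standard normal prior. The function names, the threshold, and the synthetic encoder outputs are all hypothetical, and nothing here calls the actual MusicVAE API.

```python
import numpy as np

def rank_latent_dimensions(mu, log_var):
    """Rank latent dimensions by mean KL divergence from the N(0, 1) prior.

    mu, log_var: arrays of shape (n_samples, n_latent) holding the encoder's
    posterior means and log-variances (assumed inputs, not MusicVAE output).
    """
    # Per-sample, per-dimension KL( N(mu, sigma^2) || N(0, 1) )
    kl = 0.5 * (mu**2 + np.exp(log_var) - log_var - 1.0)
    mean_kl = kl.mean(axis=0)
    # Most informative dimensions first
    order = np.argsort(mean_kl)[::-1]
    return order, mean_kl

def split_music_noise(mean_kl, threshold=0.1):
    """Label dimensions as 'music' or 'noise' (threshold is an assumption)."""
    music = np.flatnonzero(mean_kl > threshold)
    noise = np.flatnonzero(mean_kl <= threshold)
    return music, noise

# Synthetic stand-in for encoder outputs: 512 latent dimensions,
# of which only the first 30 deviate from the prior.
rng = np.random.default_rng(0)
n_samples, n_latent, n_active = 1000, 512, 30
mu = np.zeros((n_samples, n_latent))
log_var = np.zeros((n_samples, n_latent))
mu[:, :n_active] = rng.normal(0.0, 2.0, size=(n_samples, n_active))
log_var[:, :n_active] = -3.0

order, mean_kl = rank_latent_dimensions(mu, log_var)
music, noise = split_music_noise(mean_kl)
print(len(music), "music neurons;", len(noise), "noise neurons")
```

A dimension whose posterior always matches the prior has zero KL divergence and carries no information about the input, which is one way to make "silent" precise. Other criteria, such as the variance of the posterior mean across the dataset, would serve the same illustrative purpose.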
Taxonomy
Topics: Neural Networks and Applications · Music and Audio Processing · Model Reduction and Neural Networks
