Learning source-aware representations of music in a discrete latent space
Jinsung Kim, Yeong-Seok Jeong, Woosung Choi, Jaehwa Chung, Soonyoung Jung

TL;DR
This paper introduces a novel VQ-VAE-based method to learn human-readable, source-aware music representations in a discrete latent space, enabling easier analysis, editing, and generation of musical components like basslines.
Contribution
The paper presents a new approach to encode music into a structured, source-aware discrete latent space using VQ-VAE, facilitating human interpretability and manipulation.
Findings
Latent representations are human-readable and source-aware.
Able to generate basslines by estimating discrete latent vectors.
Demonstrates improved interpretability of music representations.
Abstract
In recent years, neural network based methods have been proposed as a method that can generate representations from music, but they are not human readable and hardly analyzable or editable by a human. To address this issue, we propose a novel method to learn source-aware latent representations of music through Vector-Quantized Variational Auto-Encoder (VQ-VAE). We train our VQ-VAE to encode an input mixture into a tensor of integers in a discrete latent space, and design them to have a decomposed structure which allows humans to manipulate the latent vector in a source-aware manner. This paper also shows that we can generate basslines by estimating latent vectors in a discrete space.
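To make the idea of a "tensor of integers" with a decomposed, source-aware structure more concrete, here is a minimal NumPy sketch of the vector-quantization step. It is not the authors' implementation: the random array standing in for the encoder output, the single shared codebook, the one-row-per-source layout, and every name and size (`quantize`, `n_sources`, `n_frames`, `dim`, `n_codes`) are assumptions made purely for illustration.

```python
import numpy as np


def quantize(z, codebook):
    """Return, for each vector in z (..., D), the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent vector and every code.
    dists = ((z[..., None, :] - codebook) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1)


rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
n_sources, n_frames, dim, n_codes = 4, 8, 16, 32

# Hypothetical codebook; in a trained VQ-VAE this would be learned.
codebook = rng.normal(size=(n_codes, dim))

# Stand-in for an encoder output: one continuous vector per source per frame.
z = rng.normal(size=(n_sources, n_frames, dim))

# Discrete representation: an integer tensor with one row per source.
codes = quantize(z, codebook)  # shape (n_sources, n_frames)

# Source-aware edit: swap in row 0 from a second (hypothetical) mixture
# while leaving the other sources untouched.
other_codes = quantize(rng.normal(size=(n_sources, n_frames, dim)), codebook)
edited = codes.copy()
edited[0] = other_codes[0]

# Look up the quantized vectors that a decoder would consume.
z_q = codebook[edited]  # shape (n_sources, n_frames, dim)
```

Because each source occupies its own row of integers in this layout, an edit such as replacing one row leaves the remaining rows unchanged, which is the kind of manipulation the abstract describes as source-aware.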
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: VQ-VAE
