Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
Alain Riou, Antonin Gagneré, Gaëtan Hadjeres, Stefan Lattner, and Geoffroy Peeters

TL;DR
This paper introduces a zero-shot musical stem retrieval method based on joint-embedding predictive architectures. It outperforms previous baselines, and its embeddings retain temporal structure and local information.
Contribution
The paper presents a new joint-embedding predictive architecture for zero-shot musical stem retrieval, in which the predictor is conditioned on arbitrary instruments and the encoder is pretrained with contrastive learning to improve performance.
Findings
Significantly outperforms previous baselines on the MUSDB18 and MoisesDB datasets.
Pretraining the encoder with contrastive learning drastically improves retrieval performance.
Embeddings retain temporal structure and local information, useful for beat tracking.
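The contrastive pretraining mentioned above can be illustrated with InfoNCE, a common contrastive objective. This summary does not say which loss the paper uses, so the function `info_nce`, its `temperature` value, and the toy vectors below are assumptions made for illustration, not the authors' setup. The sketch only shows the general idea: each anchor embedding should be closer to its paired positive than to the other items in the batch.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative, not the paper's exact loss).

    positives[i] is the positive for anchors[i]; every other row of
    `positives` acts as a negative for that anchor.
    """
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [unit(v) for v in anchors]
    p = [unit(v) for v in positives]

    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarities between anchor i and all candidates, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy check: correctly paired embeddings give a much lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the aligned pairing yields a loss near zero, while the shuffled pairing is penalized heavily, which is the pressure that pulls matching items together in the embedding space.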
Abstract
In this paper, we tackle the task of musical stem retrieval. Given a musical mix, the task consists of retrieving a stem that would fit with it, i.e., that would sound pleasant if played together. To do so, we introduce a new method based on Joint-Embedding Predictive Architectures, where an encoder and a predictor are jointly trained to produce latent representations of a context and predict latent representations of a target. In particular, we design our predictor to be conditioned on arbitrary instruments, enabling our model to perform zero-shot stem retrieval. In addition, we discover that pretraining the encoder using contrastive learning drastically improves the model's performance. We validate the retrieval performance of our model using the MUSDB18 and MoisesDB datasets. We show that it significantly outperforms previous baselines on both datasets, showcasing its ability to support…
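As a rough sketch of how retrieval with such a model could work at inference time: encode the mix, let the instrument-conditioned predictor output a predicted target latent, then rank candidate stems by similarity to that prediction. Everything named below is hypothetical. `toy_predictor` is a stand-in that simply adds a conditioning vector to the mix latent, the three-dimensional latents are made up, and the use of cosine similarity for ranking is an assumption rather than a detail confirmed by the abstract.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_predictor(mix_latent, instrument_vec):
    # Hypothetical stand-in for a trained, instrument-conditioned predictor:
    # it just adds the conditioning vector to the context (mix) latent.
    return [m + c for m, c in zip(mix_latent, instrument_vec)]


def retrieve(mix_latent, instrument_vec, stem_latents, predictor=toy_predictor):
    """Return candidate stem names sorted from best to worst match."""
    predicted = predictor(mix_latent, instrument_vec)
    scores = {name: cosine(predicted, z) for name, z in stem_latents.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up latents: in practice these would come from the trained encoder.
mix = [1.0, 0.0, 0.0]
bass_condition = [0.0, 1.0, 0.0]
candidates = {
    "bass_a": [1.0, 1.0, 0.0],
    "bass_b": [0.0, 1.0, 1.0],
    "drums_a": [-1.0, 0.0, 1.0],
}
ranking = retrieve(mix, bass_condition, candidates)
```

Because the instrument enters only through the conditioning input, the same ranking routine can in principle be queried for any instrument, which is what makes the zero-shot setting possible.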
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
Methods: Contrastive Learning
