Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation
Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Michael Anslow, Geoffroy Peeters

TL;DR
Stem-JEPA is a self-supervised joint-embedding predictive architecture that predicts the embeddings of musical stems compatible with a given context. Trained on multi-track data, it learns musical features that support stem retrieval and alignment as well as downstream tasks such as genre estimation.
Contribution
We introduce Stem-JEPA, a novel self-supervised model for estimating stem compatibility and capturing musical features, advancing automated mixing and music analysis.
Findings
Effective stem retrieval on the MUSDB18 dataset
Embeddings encode temporal alignment information
Representations perform well on downstream musical tasks
Abstract
This paper explores the automated process of determining stem compatibility by identifying audio recordings of single instruments that blend well with a given musical context. To tackle this challenge, we present Stem-JEPA, a novel Joint-Embedding Predictive Architecture (JEPA) trained on a multi-track dataset using a self-supervised learning approach. Our model comprises two networks: an encoder and a predictor, which are jointly trained to predict the embeddings of compatible stems from the embeddings of a given context, typically a mix of several instruments. Training a model in this manner allows its use in estimating stem compatibility - retrieving, aligning, or generating a stem to match a given mix - or for downstream tasks such as genre or key estimation, as the training paradigm requires the model to learn information related to timbre, harmony, and rhythm. We evaluate our…
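The retrieval use case described in the abstract can be illustrated with a minimal sketch. It assumes that the predictor has already produced an embedding for the missing stem, and that candidate stems are ranked by cosine similarity to that prediction. The similarity measure, the function names (`l2_normalize`, `rank_stems`), the embedding dimension, and the random stand-in embeddings are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_stems(predicted, candidates):
    """Rank candidate stem embeddings by cosine similarity to a predicted embedding.

    predicted:  array of shape (dim,), hypothetical predictor output for a context mix.
    candidates: array of shape (n, dim), hypothetical encoder outputs for n stems.
    Returns (order, scores): indices from most to least similar, and raw scores.
    """
    p = l2_normalize(predicted)
    c = l2_normalize(candidates)
    scores = c @ p  # cosine similarity, since both sides are unit-norm
    order = np.argsort(-scores, kind="stable")
    return order, scores


# Stand-in data: random vectors replace real encoder/predictor outputs.
rng = np.random.default_rng(0)
dim = 8
candidates = rng.normal(size=(5, dim))
# Pretend the predictor output is a slightly noisy copy of candidate 3.
predicted = candidates[3] + 0.01 * rng.normal(size=dim)

order, scores = rank_stems(predicted, candidates)
best_index = int(order[0])
```

In this toy setup `best_index` points at the candidate the prediction was derived from. With a trained model, `predicted` would come from the predictor applied to the context embedding, and `candidates` from the encoder applied to each stem in the retrieval pool.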
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
