Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation

Alain Riou; Stefan Lattner; Gaëtan Hadjeres; Michael Anslow; Geoffroy Peeters

arXiv:2408.02514 · cs.SD · August 6, 2024

TL;DR

Stem-JEPA is a self-supervised joint-embedding architecture that predicts the embeddings of musical stems compatible with a given mix, enabling tasks such as stem retrieval, temporal alignment, and genre estimation by learning meaningful musical features from multi-track data.

Contribution

We introduce Stem-JEPA, a novel self-supervised model for estimating stem compatibility and capturing musical features, advancing automated mixing and music analysis.

Findings

01. Effective stem retrieval on the MUSDB18 dataset
02. Embeddings encode temporal alignment information
03. Representations perform well on downstream musical tasks
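Findings 01 and 02 can be made concrete with a short sketch of how a predicted stem embedding might be used at inference time. The Python below assumes, for illustration only, that compatibility is scored with cosine similarity, that retrieval quality is summarized by recall@k, and that alignment is probed by comparing the prediction against one stem's embeddings at several time offsets. The function names and the random toy embeddings are hypothetical; they stand in for predictor outputs and encoder outputs and do not reproduce the paper's evaluation protocol or the SonyCSLParis/Stem-JEPA code.

```python
import numpy as np


def cosine_scores(query, bank):
    """Cosine similarity between one query vector and each row of `bank`."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return b @ q


def rank_of_target(query, bank, target_idx):
    """1-based rank of the true stem among all candidates (1 = best match)."""
    scores = cosine_scores(query, bank)
    return int(np.sum(scores > scores[target_idx])) + 1


def recall_at_k(queries, bank, target_idxs, k):
    """Fraction of queries whose true stem appears among the top-k candidates."""
    ranks = [rank_of_target(q, bank, t) for q, t in zip(queries, target_idxs)]
    return float(np.mean(np.array(ranks) <= k))


def best_offset(query, stem_frames):
    """Index of the time offset whose stem embedding best matches the prediction.

    `stem_frames` holds embeddings of one candidate stem at successive
    time offsets, one row per offset (a hypothetical layout).
    """
    return int(np.argmax(cosine_scores(query, stem_frames)))


# Toy demo with synthetic embeddings; no trained model is involved.
rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 16))                # 100 candidate stem embeddings
targets = np.arange(10)                          # true stem index for 10 queries
queries = bank[targets] + 0.3 * rng.normal(size=(10, 16))  # noisy "predictions"
r1 = recall_at_k(queries, bank, targets, k=1)
r5 = recall_at_k(queries, bank, targets, k=5)

frames = rng.normal(size=(8, 16))                # one stem at 8 time offsets
offset = best_offset(frames[3] + 0.1 * rng.normal(size=16), frames)
```

Ranking candidates by similarity to a predicted embedding, rather than training a separate classifier per pair, means the same scoring function can be applied to any pool of candidate stems or offsets.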

Abstract

This paper explores the automated process of determining stem compatibility by identifying audio recordings of single instruments that blend well with a given musical context. To tackle this challenge, we present Stem-JEPA, a novel Joint-Embedding Predictive Architecture (JEPA) trained on a multi-track dataset using a self-supervised learning approach. Our model comprises two networks, an encoder and a predictor, which are jointly trained to predict the embeddings of compatible stems from the embeddings of a given context, typically a mix of several instruments. Training a model in this manner allows its use in estimating stem compatibility (retrieving, aligning, or generating a stem to match a given mix) or for downstream tasks such as genre or key estimation, as the training paradigm requires the model to learn information related to timbre, harmony, and rhythm. We evaluate our…
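The joint encoder/predictor training described in the abstract can be sketched as below. This is a minimal NumPy toy under stated assumptions, not the authors' PyTorch implementation: the encoder and predictor are single linear maps, the inputs are random vectors linked by a hypothetical hidden mapping rather than audio, the loss is a mean squared error between predicted and target embeddings, and the target embedding comes from an exponential-moving-average (EMA) copy of the encoder with no gradient flowing through it, a device borrowed from other JEPA models. Any conditioning of the predictor that the paper may use is omitted, and all names and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_EMB, BATCH = 32, 16, 64   # hypothetical feature/embedding sizes
LR, EMA_MOMENTUM = 0.05, 0.99     # hypothetical hyperparameters

# Hypothetical toy data: a "mix" vector and a compatible "stem" vector are
# linked by a fixed hidden mapping that stands in for musical compatibility.
HIDDEN_MAP = rng.normal(size=(D_IN, D_IN)) / np.sqrt(D_IN)


def sample_batch():
    x_mix = rng.normal(size=(BATCH, D_IN))
    x_stem = x_mix @ HIDDEN_MAP.T + 0.1 * rng.normal(size=(BATCH, D_IN))
    return x_mix, x_stem


# Context encoder E, predictor P, and target encoder E_target (an EMA copy of E).
E = rng.normal(size=(D_EMB, D_IN)) / np.sqrt(D_IN)
P = rng.normal(size=(D_EMB, D_EMB)) / np.sqrt(D_EMB)
E_target = E.copy()


def jepa_loss(x_mix, x_stem):
    z_mix = x_mix @ E.T            # embed the context (the mix)
    z_pred = z_mix @ P.T           # predict the embedding of a compatible stem
    z_stem = x_stem @ E_target.T   # target embedding, treated as a constant
    loss = np.mean((z_pred - z_stem) ** 2)
    return loss, z_mix, z_pred, z_stem


def train_step(x_mix, x_stem):
    global E, P, E_target
    loss, z_mix, z_pred, z_stem = jepa_loss(x_mix, x_stem)
    # Gradient of the mean squared error with respect to the prediction.
    g = 2.0 * (z_pred - z_stem) / z_pred.size
    grad_P = g.T @ z_mix           # shape (D_EMB, D_EMB)
    grad_E = (g @ P).T @ x_mix     # shape (D_EMB, D_IN); no gradient via the target
    P = P - LR * grad_P
    E = E - LR * grad_E
    # The target encoder slowly tracks the context encoder.
    E_target = EMA_MOMENTUM * E_target + (1.0 - EMA_MOMENTUM) * E
    return loss


# Usage: train on fresh batches, then compare the loss on a fixed held-out batch.
x_eval_mix, x_eval_stem = sample_batch()
initial_loss = jepa_loss(x_eval_mix, x_eval_stem)[0]
for _ in range(200):
    train_step(*sample_batch())
final_loss = jepa_loss(x_eval_mix, x_eval_stem)[0]
```

The stop-gradient and slowly moving EMA target follow common JEPA practice for discouraging the trivial solution in which every input maps to the same embedding; in a linear toy this small they are not guaranteed to prevent it.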

Code & Models

Repositories

SonyCSLParis/Stem-JEPA
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies