Modeling the Compatibility of Stem Tracks to Generate Music Mashups
Jiawen Huang, Ju-Chiang Wang, Jordan B. L. Smith, Xuchen Song, Yuxuan Wang

TL;DR
This paper presents a novel approach to predict the compatibility of stem tracks for music mashups using self-supervised and semi-supervised learning, leveraging source separation and automatic key/tempo matching.
Contribution
It introduces a new model training method that uses separated stems and unlabeled data to improve mashup compatibility prediction accuracy.
Findings
The model outperforms rule-based systems in objective evaluations.
Semi-supervised training enhances compatibility prediction.
Using stem signals yields better results than combined signals.
Abstract
A music mashup combines audio elements from two or more songs to create a new work. To reduce the time and effort required to make them, researchers have developed algorithms that predict the compatibility of audio elements. Prior work has focused on mixing unaltered excerpts, but advances in source separation enable the creation of mashups from isolated stems (e.g., vocals, drums, bass, etc.). In this work, we take advantage of separated stems not just for creating mashups, but for training a model that predicts the mutual compatibility of groups of excerpts, using self-supervised and semi-supervised methods. Specifically, we first produce a random mashup creation pipeline that combines stem tracks obtained via source separation, with key and tempo automatically adjusted to match, since these are prerequisites for high-quality mashups. To train a model to predict compatibility, we use…
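The key and tempo matching step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline, which this summary does not describe in detail; it only shows one plausible way to compute how much a stem would need to be time-stretched and pitch-shifted to line up with a target. The function names, the pitch-class encoding of keys (0 = C, ..., 11 = B), and the half-time/double-time folding of the tempo ratio are all assumptions made for illustration.

```python
def tempo_stretch_rate(source_bpm: float, target_bpm: float) -> float:
    """Return the playback-rate factor that maps source_bpm onto target_bpm.

    A rate above 1 speeds the stem up; below 1 slows it down.
    Assumption (not from the paper): the ratio is folded by factors of two,
    treating half-time and double-time as equivalent, so the stretch stays
    within [2**-0.5, 2**0.5).
    """
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("tempi must be positive")
    rate = target_bpm / source_bpm
    while rate >= 2 ** 0.5:
        rate /= 2
    while rate < 2 ** -0.5:
        rate *= 2
    return rate


def key_shift_semitones(source_tonic: int, target_tonic: int) -> int:
    """Return the smallest signed transposition, in semitones, between tonics.

    Tonics are pitch classes (0 = C, 1 = C#, ..., 11 = B). The result lies
    in [-6, 5], so a stem is never shifted by more than a tritone.
    Assumption (not from the paper): mode (major/minor) is ignored here.
    """
    return (target_tonic - source_tonic + 6) % 12 - 6


if __name__ == "__main__":
    # Hypothetical stem: 100 BPM in A (9); hypothetical target: 120 BPM in C (0).
    print(tempo_stretch_rate(100.0, 120.0))
    print(key_shift_semitones(9, 0))
```

Values of this kind could then be handed to a time-stretching and pitch-shifting tool (for example, `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`). Keeping both adjustments small is a design choice in this sketch, since large stretches and shifts tend to introduce audible artifacts.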
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
