A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

Karn N. Watcharasupat; Alexander Lerch

arXiv:2406.18747 · cs.SD · August 27, 2024 · 1 citation

TL;DR

Banquet is a single-decoder system for music source separation that supports stems beyond the traditional four-stem vocals, drums, bass, and other (VDBO) setup, covering diverse instruments without adding a decoder for each new stem.

Contribution

The paper introduces a query-based, stem-agnostic source separation model: a bandsplit separation model is extended to take a query produced by a music instrument recognition model (PaSST), so that one decoder can serve any requested stem.
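The idea of steering one shared decoder with a query can be sketched as follows. This is a minimal illustration and not the authors' implementation: this page does not say how the query is injected, so the sketch assumes feature-wise linear modulation (a per-channel scale and shift computed from the query). All array sizes, weight matrices, and the `film` helper are hypothetical, and the random vectors merely stand in for instrument embeddings that a recognition model would produce.

```python
import numpy as np

rng = np.random.default_rng(0)


def film(features, query, w_gamma, w_beta):
    """Scale and shift every channel of `features` using a query vector.

    Assumed conditioning scheme (feature-wise linear modulation);
    the page does not confirm which mechanism the paper uses.
    """
    gamma = query @ w_gamma  # per-channel scale, shape (channels,)
    beta = query @ w_beta    # per-channel shift, shape (channels,)
    # Broadcasts over the leading (bands, frames) axes.
    return features * gamma + beta


# Hypothetical sizes, chosen only for illustration.
n_bands, n_frames, n_channels, query_dim = 4, 10, 8, 16

# Stand-in for the band-wise features of the mixture.
mixture_features = rng.standard_normal((n_bands, n_frames, n_channels))

# Projection weights; in a trained system these would be learned.
w_gamma = rng.standard_normal((query_dim, n_channels))
w_beta = rng.standard_normal((query_dim, n_channels))

# Stand-ins for embeddings of two different target stems.
query_guitar = rng.standard_normal(query_dim)
query_piano = rng.standard_normal(query_dim)

# The same mixture features are conditioned differently per query,
# so one downstream decoder can be asked for different stems.
cond_guitar = film(mixture_features, query_guitar, w_gamma, w_beta)
cond_piano = film(mixture_features, query_piano, w_gamma, w_beta)
```

The point of the design is that adding a new stem changes only the query vector; the feature shapes and the decoder that consumes them stay the same.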

Findings

01 Approaches the performance of more complex models on VDBO stems

02 Outperforms those models on guitar and piano separation

03 Supports extraction of rare instrument stems

Abstract

Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems. Increasing stem support in these inflexible systems correspondingly requires increasing computational complexity, rendering extensions of these systems computationally infeasible for long-tail instruments. In this work, we propose Banquet, a system that allows source separation of multiple stems using just one decoder. A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model. On the MoisesDB dataset, Banquet, at…
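The scaling argument in the abstract can be made concrete with a small back-of-the-envelope calculation. The sketch below compares how parameter counts grow with the number of supported stems for a one-decoder-per-stem design versus a single query-conditioned decoder. The function names and every number are hypothetical placeholders, not figures from the paper, and the cost of the separate query model is deliberately left out.

```python
def fixed_decoder_params(encoder, per_decoder, n_stems):
    """Parameters when every supported stem gets its own decoder."""
    return encoder + per_decoder * n_stems


def single_decoder_params(encoder, per_decoder, conditioning):
    """Parameters when one decoder is shared and steered by a query.

    `conditioning` covers the extra layers that inject the query;
    the count does not depend on how many stems are supported.
    """
    return encoder + per_decoder + conditioning


# Hypothetical sizes, for illustration only.
ENCODER = 10_000_000
PER_DECODER = 3_000_000
CONDITIONING = 500_000

for n_stems in (4, 8, 16):
    fixed = fixed_decoder_params(ENCODER, PER_DECODER, n_stems)
    shared = single_decoder_params(ENCODER, PER_DECODER, CONDITIONING)
    print(f"{n_stems:>2} stems: fixed={fixed:,}  single-decoder={shared:,}")
```

Under these assumptions the fixed design grows linearly with the stem count, while the shared design stays constant, which is why the former becomes impractical for long-tail instruments.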

Code & Models

Repositories

kwatcharasupat/query-bandit (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Sparse Evolutionary Training · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer