Music Source Separation with Band-split RNN
Yi Luo, Jianwei Yu

TL;DR
This paper introduces band-split RNN, a frequency-domain model for music source separation that explicitly splits spectrograms into subbands, improving performance by leveraging domain knowledge and semi-supervised fine-tuning.
Contribution
The paper proposes a novel band-split RNN architecture with subband modeling and a semi-supervised fine-tuning pipeline, enhancing music source separation performance.
Findings
BSRNN trained only on MUSDB18-HQ outperforms several top-ranking models from the Music Demixing (MDX) Challenge 2021.
Semi-supervised fine-tuning further improves separation results.
Choosing subband bandwidths to match the characteristics of the target source improves performance on that instrument.
Abstract
The performance of music source separation (MSS) models has been greatly improved in recent years thanks to the development of novel neural network architectures and training pipelines. However, recent model designs for MSS were mainly motivated by other audio processing tasks or other research fields, while the intrinsic characteristics and patterns of the music signals were not fully discovered. In this paper, we propose band-split RNN (BSRNN), a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling. The choices of the bandwidths of the subbands can be determined by a priori knowledge or expert knowledge on the characteristics of the target source in order to optimize the performance on a certain type of target musical instrument. To better make use of unlabeled data, we also describe a…
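The band-splitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `band_edges` and `split_bands`, the nested-list spectrogram layout, and the toy bandwidths are all assumptions made for illustration. It only shows how a list of subband widths partitions the frequency axis; the interleaved band-level and sequence-level modeling that follows in BSRNN is not shown.

```python
def band_edges(n_bins, bandwidths):
    """Turn subband widths (in frequency bins) into (start, end) index pairs.

    Bins left over after the listed widths are gathered into one final band.
    """
    edges = []
    start = 0
    for width in bandwidths:
        if width <= 0:
            raise ValueError("bandwidths must be positive")
        end = start + width
        if end > n_bins:
            raise ValueError("bandwidths exceed the number of frequency bins")
        edges.append((start, end))
        start = end
    if start < n_bins:
        edges.append((start, n_bins))
    return edges


def split_bands(spectrogram, edges):
    """Split a spectrogram stored as frames[t][f] into one sub-spectrogram per band."""
    return [[frame[lo:hi] for frame in spectrogram] for lo, hi in edges]


# Toy example: 3 frames, 10 frequency bins, narrower bands at low frequencies.
# These widths are hypothetical and are not the settings used in the paper.
n_frames, n_bins = 3, 10
spec = [[complex(t, f) for f in range(n_bins)] for t in range(n_frames)]
edges = band_edges(n_bins, bandwidths=[1, 1, 2, 4])
bands = split_bands(spec, edges)
```

Changing the `bandwidths` list is the knob the abstract refers to: a finer split in the frequency range where a target instrument carries most of its energy gives the model more resolution there.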
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research
