Music Source Separation with Band-Split RoPE Transformer

Wei-Tsung Lu; Ju-Chiang Wang; Qiuqiang Kong; Yun-Ning Hung

arXiv:2309.02612 · cs.SD · September 12, 2023 · 1 citation

TL;DR

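This paper introduces BS-RoFormer, a frequency-domain approach to music source separation that combines a band-split module with hierarchical Transformers using Rotary Position Embedding (RoPE). It ranked first in the MSS track of the Sound Demixing Challenge (SDX23) and reached 9.80 dB average SDR on MUSDB18HQ.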

Contribution

The paper proposes the Band-Split RoPE Transformer (BS-RoFormer), an architecture for music source separation (MSS) that combines a band-split module with hierarchical Transformers using Rotary Position Embedding (RoPE).
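The band-split step named above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `band_split`, the band widths, the feature size, and the random projection weights are all hypothetical, and any normalization or bias terms are omitted.

```python
import numpy as np

def band_split(spec, band_widths, weights):
    """Project a complex spectrogram (T, F) into per-band features (T, K, D).

    band_widths: K band sizes (in frequency bins) that tile the F axis.
    weights:     K matrices, the k-th of shape (2 * band_widths[k], D).
    """
    assert sum(band_widths) == spec.shape[1], "bands must tile the frequency axis"
    # Treat real and imaginary parts as two real-valued channels: (T, F, 2).
    ri = np.stack([spec.real, spec.imag], axis=-1)
    feats, start = [], 0
    for width, w in zip(band_widths, weights):
        # Flatten one band to (T, 2 * width), then apply its own projection.
        band = ri[:, start:start + width, :].reshape(spec.shape[0], -1)
        feats.append(band @ w)  # (T, D)
        start += width
    return np.stack(feats, axis=1)  # (T, K, D)

rng = np.random.default_rng(0)
T, D = 5, 8                    # hypothetical frame count and feature size
band_widths = [2, 2, 4, 8]     # hypothetical non-uniform band layout
F = sum(band_widths)
spec = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
weights = [rng.standard_normal((2 * w, D)) for w in band_widths]

features = band_split(spec, band_widths, weights)
print(features.shape)  # (5, 4, 8), i.e. (T, K, D)
```

The resulting (T, K, D) tensor has one sequence axis over time and one over bands, which is what allows a model to attend along each axis in turn.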

Findings

01

Ranked first in the MSS track of the Sound Demixing Challenge (SDX23), using a system trained on MUSDB18HQ and 500 extra songs.

02

A smaller version of the model achieved 9.80 dB average SDR on MUSDB18HQ without extra training data.

03

Outperformed previously published methods on MUSDB18HQ, setting a new state of the art.
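For readers unfamiliar with the SDR figure quoted above, the basic signal-to-distortion ratio can be sketched as below. This is only the textbook energy-ratio form; the function name `sdr_db` and the `eps` guard are illustrative choices, and the paper's exact evaluation protocol (for example any chunking or averaging across stems and songs) is not reproduced here.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-10):
    """Signal-to-distortion ratio in dB: 10 * log10(|s|^2 / |s - s_hat|^2)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps avoids division by zero for silent references or perfect estimates.
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))

reference = np.array([1.0, 0.0, -1.0, 0.0])
estimate = 0.9 * reference           # a 10% amplitude error
score = sdr_db(reference, estimate)  # about 20 dB
```

A higher value means the estimated stem is closer to the reference, and each additional 10 dB corresponds to a tenfold reduction in error energy.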

Abstract

Music source separation (MSS) aims to separate a music recording into multiple musically distinct stems, such as vocals, bass, drums, and more. Recently, deep learning approaches such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used, but the improvement is still limited. In this paper, we propose a novel frequency-domain approach based on a Band-Split RoPE Transformer (called BS-RoFormer). BS-RoFormer relies on a band-split module to project the input complex spectrogram into subband-level representations, and then arranges a stack of hierarchical Transformers to model the inner-band as well as inter-band sequences for multi-band mask estimation. To facilitate training the model for MSS, we propose to use the Rotary Position Embedding (RoPE). The BS-RoFormer system trained on MUSDB18HQ and 500 extra songs ranked the first place in the MSS track…
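The Rotary Position Embedding mentioned in the abstract can be illustrated with a short NumPy sketch. It follows the standard RoPE formulation (rotating consecutive feature pairs by position-dependent angles, with the commonly used base of 10000); the function name `rope` and the toy dimensions are assumptions, and how the paper wires this into its Transformer layers is not shown.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (T, D), with D even."""
    T, D = x.shape
    assert D % 2 == 0, "feature dimension must be even"
    # One rotation frequency per feature pair: base^(-2i / D).
    inv_freq = base ** (-np.arange(0, D, 2) / D)                   # (D/2,)
    angles = np.asarray(positions, dtype=float)[:, None] * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)                      # (T, D/2)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    # 2-D rotation of each (even, odd) pair by its position-dependent angle.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 8))  # toy query vector
k = rng.standard_normal((1, 8))  # toy key vector

# Attention scores depend only on the relative offset (here 2 in both cases).
score_a = np.sum(rope(q, [3]) * rope(k, [1]))
score_b = np.sum(rope(q, [10]) * rope(k, [8]))
print(np.isclose(score_a, score_b))  # True
```

Because the rotation preserves vector norms and makes query-key dot products a function of relative position only, the same mechanism can be applied to sequences along either the time axis or the band axis.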

Code & Models

Repositories

lucidrains/BS-RoFormer
pytorch

Taxonomy

Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Music and Audio Processing