Mel-Band RoFormer for Music Source Separation

Ju-Chiang Wang; Wei-Tsung Lu; Minz Won

arXiv:2310.01809 · cs.SD · October 4, 2023

TL;DR

This paper introduces Mel-RoFormer, a novel multi-band music source separation model using mel-scale overlapped subbands and hierarchical Transformers, achieving state-of-the-art results on MUSDB18HQ.

Contribution

It proposes the Mel-band scheme with overlapped subbands based on the mel scale, improving upon previous non-overlapping band-split methods for music source separation.
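
The mapping from frequency bins to overlapped mel-scale subbands can be illustrated with a short sketch. It assumes the HTK mel formula and treats each band as the support of one triangular mel filter, so band k spans mel points k to k + 2 and neighbouring bands share bins. The function name `mel_band_bins` and the illustrative settings (44.1 kHz audio, 2048-point FFT, 60 bands) are hypothetical choices, not values taken from the paper.

```python
import math


def hz_to_mel(f):
    # HTK mel formula (an assumption; other mel variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_bins(n_fft, sample_rate, n_bands):
    """Map STFT bins to overlapping mel-scale bands (hypothetical helper).

    Band k covers every bin whose frequency lies between mel points k and
    k + 2, i.e. the support of a triangular mel filter, so adjacent bands
    share bins.
    """
    n_bins = n_fft // 2 + 1
    bin_hz = [i * sample_rate / n_fft for i in range(n_bins)]
    max_mel = hz_to_mel(sample_rate / 2)
    # n_bands + 2 points equally spaced on the mel axis, converted to Hz
    edges = [mel_to_hz(max_mel * j / (n_bands + 1)) for j in range(n_bands + 2)]
    edges[-1] = sample_rate / 2  # avoid round-off dropping the Nyquist bin
    bands = []
    for k in range(n_bands):
        lo, hi = edges[k], edges[k + 2]
        bands.append([i for i, f in enumerate(bin_hz) if lo <= f <= hi])
    return bands


if __name__ == "__main__":
    bands = mel_band_bins(n_fft=2048, sample_rate=44100, n_bands=60)
    widths = [len(b) for b in bands]
    # Low bands hold only a few bins and high bands many more; the total
    # exceeds the 1025 bins because neighbouring bands overlap.
    print(widths[:5], widths[-3:], sum(widths))
```

Because the mel points are spaced evenly on a perceptual axis, low-frequency bands are narrow and high-frequency bands are wide, whereas a heuristic non-overlapping split assigns each bin to exactly one band. With very many bands or a short FFT, the lowest bands in this sketch can come out empty, so the parameters should be checked for the target STFT.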

Findings

01

Mel-RoFormer outperforms BS-RoFormer in vocals, drums, and other stems.

02

The mel-scale band scheme yields better separation performance.

03

The model achieves state-of-the-art results on MUSDB18HQ.

Abstract

Recently, multi-band spectrogram-based approaches such as Band-Split RNN (BSRNN) have demonstrated promising results for music source separation. In our recent work, we introduced the BS-RoFormer model, which inherits the band-split scheme of BSRNN at the front-end and then uses a hierarchical Transformer with Rotary Position Embedding (RoPE) to model the inner-band and inter-band sequences for multi-band mask estimation. This model achieved state-of-the-art performance, but its band-split scheme is defined empirically, without analytic support from the literature. In this paper, we propose Mel-RoFormer, which adopts the Mel-band scheme that maps the frequency bins into overlapped subbands according to the mel scale. In contrast, the band-split mapping in BSRNN and BS-RoFormer is non-overlapping and designed based on heuristics. Using the MUSDB18HQ dataset for experiments,…
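
Rotary Position Embedding, as used in the hierarchical Transformer, can be sketched in plain Python. The sketch assumes the standard RoFormer rotation with base 10000 and our reading that one Transformer pass runs over frames within each band (inner-band) while the other runs over bands within each frame (inter-band). The helper names `rope`, `dot`, `sequences` and `rope_sequence` are hypothetical. The property it demonstrates is that the rotated query-key dot product depends only on the offset between positions.

```python
import math


def rope(x, pos, base=10000.0):
    """Apply a rotary position embedding to vector x at integer position pos.

    Each pair (x[2j], x[2j+1]) is rotated by the angle pos * base**(-2j/d).
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even feature dimension")
    out = []
    for i in range(0, d, 2):  # i == 2j
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sequences(grid, axis):
    """Split a band-by-frame grid into the sequences one Transformer pass sees.

    grid[k][t] is the feature vector of band k at frame t.
    axis="time": one sequence per band, ordered by frame (position t).
    axis="band": one sequence per frame, ordered by band (position k).
    """
    if axis == "time":
        return [list(row) for row in grid]
    if axis == "band":
        return [list(col) for col in zip(*grid)]
    raise ValueError(f"unknown axis: {axis}")


def rope_sequence(seq):
    # The RoPE position is the element's index within its own sequence.
    return [rope(v, p) for p, v in enumerate(seq)]


if __name__ == "__main__":
    q, k = [0.3, -1.2, 0.7, 0.5], [1.0, 0.4, -0.6, 0.9]
    # Both scores use the same offset (5 - 2 == 13 - 10), so they match
    # up to floating-point round-off.
    print(dot(rope(q, 5), rope(k, 2)), dot(rope(q, 13), rope(k, 10)))
```

In a full model the rotation would be applied to the queries and keys inside each attention layer rather than to raw features; the grid split only shows which axis supplies the position index in each pass.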

Code & Models

Models

🤗 Sucial/Dereverb-Echo_Mel_Band_Roformer · model · ♡ 20

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques