A Two-Stage Band-Split Mamba-2 Network For Music Separation
Jinglin Bai, Yuan Fang, Jiajie Wang, Xueliang Zhang

TL;DR
This paper introduces a two-stage Mamba-2 network with residual mask mapping for music source separation, demonstrating improved performance over existing methods in separating mixed music tracks.
Contribution
It proposes a novel two-stage Mamba-2 based architecture with residual mask mapping for enhanced music source separation performance.
Findings
Bidirectional Mamba-2 outperforms unidirectional models.
Two-stage network improves separation accuracy.
Residual mask mapping recovers details that the mask estimate alone misses.
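Why bidirectionality helps can be illustrated with a toy example. The sketch below replaces Mamba-2's selective state-space scan with a fixed scalar linear recurrence, `h_t = a*h_{t-1} + b*x_t`. That recurrence, the names `causal_scan` and `bidirectional_scan`, and summation as the merge step are all hypothetical simplifications, not the paper's layer. A causal scan only sees past frames. Running a second scan over the time-reversed sequence and merging the two outputs gives every frame context from both directions.

```python
from typing import List


def causal_scan(x: List[float], a: float = 0.9, b: float = 1.0) -> List[float]:
    """Toy causal scan: h_t = a * h_{t-1} + b * x_t (hypothetical stand-in for an SSM layer)."""
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t  # state carries information forward in time only
        out.append(h)
    return out


def bidirectional_scan(x: List[float], a: float = 0.9, b: float = 1.0) -> List[float]:
    """Run the scan forward and on the reversed sequence, then merge by summation (assumed)."""
    fwd = causal_scan(x, a, b)
    bwd = causal_scan(x[::-1], a, b)[::-1]  # re-reverse so indices line up with fwd
    return [f + r for f, r in zip(fwd, bwd)]


if __name__ == "__main__":
    impulse = [0, 0, 1, 0, 0]
    print(causal_scan(impulse, a=0.5))         # frames before the impulse stay 0
    print(bidirectional_scan(impulse, a=0.5))  # every frame now sees the impulse
```

With an impulse at frame 2, the causal scan outputs zeros for frames 0 and 1 because it has not yet seen the event. The bidirectional version responds on both sides of it. Concatenating the two directions and applying a learned projection is a common alternative to summation.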
Abstract
Music source separation (MSS) aims to separate mixed music into its distinct tracks, such as vocals, bass, drums, and others. Because music signals are complex, MSS is considered a challenging audio separation task. RNN and Transformer architectures are commonly used to model music sequences for MSS, although both have limitations. Mamba-2 has recently shown high efficiency in a range of sequence modeling tasks, but its advantages have not yet been investigated for MSS. This paper applies Mamba-2 in a two-stage strategy that adds residual mapping on top of the mask-based method. The residual compensates for details missing from the mask and further improves separation performance. Experiments confirm the advantage of bidirectional Mamba-2 and the effectiveness of the two-stage network for MSS. The source code is publicly accessible at…
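A small example shows why a residual can supply details that a mask cannot. This is a minimal sketch under an assumed reading of the abstract: stage one multiplies the mixture spectrogram by a complex mask, and stage two adds a residual spectrum to that coarse estimate. The helpers `apply_mask`, `add_residual`, and `two_stage_estimate` are hypothetical. The mask and residual are passed in as plain values, whereas in the paper they would be produced by the networks. Spectrograms are represented as nested lists of complex STFT bins so that the example needs only the standard library.

```python
from typing import List, Tuple

Spec = List[List[complex]]  # hypothetical layout: [freq][time] complex STFT bins


def apply_mask(mix: Spec, mask: Spec) -> Spec:
    """Stage 1 (assumed): element-wise complex mask applied to the mixture."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, mix)]


def add_residual(est: Spec, residual: Spec) -> Spec:
    """Stage 2 (assumed): additive residual correction on top of the masked estimate."""
    return [[e + r for e, r in zip(e_row, r_row)] for e_row, r_row in zip(est, residual)]


def two_stage_estimate(mix: Spec, mask: Spec, residual: Spec) -> Tuple[Spec, Spec]:
    """Return (coarse, refined) source estimates."""
    coarse = apply_mask(mix, mask)
    refined = add_residual(coarse, residual)
    return coarse, refined


if __name__ == "__main__":
    mix = [[1 + 0j, 0j]]           # second bin carries no energy in the mixture
    mask = [[0.5 + 0j, 0.9 + 0j]]  # the mask can only rescale or rotate existing energy
    residual = [[0j, 0.2 + 0j]]    # the residual can add energy where the mix has none
    coarse, refined = two_stage_estimate(mix, mask, residual)
    print(coarse)   # the second bin stays 0 whatever the mask value is
    print(refined)  # the residual fills in the second bin
```

Multiplying by a mask cannot create energy in a bin where the mixture is zero. An additive second-stage output has no such limit. This is one plausible interpretation of how residual mapping "compensates for the details absent in the mask".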
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies
