Mamba2 Meets Silence: Robust Vocal Source Separation for Sparse Regions

Euiyeon Kim; Yong-Hoon Choi

arXiv:2508.14556 · cs.SD · January 1, 2026

TL;DR

This paper presents a vocal source separation model built on Mamba2, a state space model. It outperforms recent state-of-the-art methods at isolating vocals from music by efficiently capturing long-range temporal dependencies.
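
As a rough illustration of why a state space model handles long sequences efficiently, the sketch below implements a simplified selective SSM recurrence in NumPy. It assumes a Mamba2-style form h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t x_t^T and y_t = C_t^T h_t with a single scalar A. The function name selective_scan and all shapes are hypothetical; this is not the paper's code or the mamba_ssm library API, and it omits the gating, convolution, and chunked parallel algorithm that real implementations use.

```python
import numpy as np

def selective_scan(x, dt, B, C, A=-1.0):
    """Simplified scalar-decay selective SSM scan (hypothetical sketch).

    x:  (T, D) input sequence
    dt: (T,)   positive, input-dependent step sizes
    B:  (T, N) input-dependent input projection
    C:  (T, N) input-dependent output projection
    A:  scalar (negative for decay), shared by all state dimensions
    Returns y: (T, D)
    """
    T, D = x.shape
    N = B.shape[1]
    h = np.zeros((N, D))                   # fixed-size hidden state
    y = np.empty_like(x, dtype=float)
    for t in range(T):
        decay = np.exp(dt[t] * A)          # fraction of past state kept
        h = decay * h + dt[t] * np.outer(B[t], x[t])   # (N, D) state update
        y[t] = C[t] @ h                    # (D,) readout
    return y
```

Each step updates a fixed-size state, so cost grows linearly with the sequence length T, whereas self-attention compares all pairs of positions. The input-dependent dt_t controls how quickly past context decays.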

Contribution

The paper introduces a novel Mamba2-based model that combines a band-splitting strategy with a dual-path architecture for improved vocal separation, outperforming Transformer-based approaches.
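
Band splitting, as used in BSRNN-style separators, groups STFT frequency bins into subbands and maps each subband to a feature vector of a common size. The sketch below assumes a 129-bin spectrogram and hypothetical band edges (BAND_EDGES), with random matrices standing in for learned per-band projections. The paper's actual band layout and any per-band normalization are not described on this page.

```python
import numpy as np

# Hypothetical band edges in STFT bins (129 bins, e.g. n_fft = 256).
BAND_EDGES = [0, 4, 12, 28, 60, 129]

def band_split(spec, edges, proj_dim, rng):
    """Split a complex spectrogram (F, T) into subbands; project each to proj_dim.

    Returns real features of shape (K, T, proj_dim), K = len(edges) - 1.
    """
    F, T = spec.shape
    assert edges[0] == 0 and edges[-1] == F, "bands must cover all bins"
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spec[lo:hi]                                   # (width, T) complex
        # Stack real and imaginary parts -> (T, 2 * width) real input.
        x = np.concatenate([band.real, band.imag], axis=0).T
        # Random weights stand in for a learned per-band linear layer.
        W = rng.standard_normal((x.shape[1], proj_dim)) / np.sqrt(x.shape[1])
        feats.append(x @ W)                                  # (T, proj_dim)
    return np.stack(feats)                                   # (K, T, proj_dim)
```

Bands of different widths all become proj_dim-sized features, so the frequency axis shrinks from F bins to K bands before any sequence modeling happens.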

Findings

01

Achieved a cSDR of 11.03 dB, the best reported to date.

02

Demonstrated stable, consistent performance across varying input lengths and vocal occurrence patterns.

03

Outperformed recent state-of-the-art models in vocal isolation.
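
The findings report cSDR, and the abstract also mentions uSDR. The sketch below illustrates the usual distinction, assuming the conventions common in MUSDB18-HQ evaluations. uSDR averages one whole-track SDR per track, while cSDR takes the median SDR over 1-second chunks within each track and then the median over tracks. It uses a plain energy-ratio SDR rather than the BSS Eval v4 metric computed by the museval package, so its numbers are not comparable to published cSDR values. The function names are hypothetical.

```python
import numpy as np

def sdr(ref, est, eps=1e-8):
    """Plain energy-ratio SDR in dB (no BSS Eval distortion filtering)."""
    num = np.sum(ref ** 2) + eps
    den = np.sum((ref - est) ** 2) + eps
    return 10.0 * np.log10(num / den)

def usdr(refs, ests):
    """Utterance-level SDR: one SDR per full track, then the mean over tracks."""
    return float(np.mean([sdr(r, e) for r, e in zip(refs, ests)]))

def csdr(refs, ests, sr=44100, chunk_s=1.0):
    """Chunk-level SDR: median over chunks per track, then median over tracks."""
    n = int(sr * chunk_s)
    per_track = []
    for r, e in zip(refs, ests):
        # Non-overlapping chunks; a trailing partial chunk is dropped.
        # Chunks whose reference is all zeros (e.g. no vocals) are skipped,
        # because the SDR ratio is not meaningful there.
        vals = [sdr(r[i:i + n], e[i:i + n])
                for i in range(0, len(r) - n + 1, n)
                if np.any(r[i:i + n])]
        if vals:
            per_track.append(np.median(vals))
    return float(np.median(per_track))
```

Because vocal-free stretches are common in real songs, how an evaluation treats silent reference chunks can noticeably change the aggregate score.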

Abstract

We introduce a new music source separation model tailored for accurate vocal isolation. Unlike Transformer-based approaches, which often fail to capture intermittently occurring vocals, our model leverages Mamba2, a recent state space model, to better capture long-range temporal dependencies. To handle long input sequences efficiently, we combine a band-splitting strategy with a dual-path architecture. Experiments show that our approach outperforms recent state-of-the-art models, achieving a cSDR of 11.03 dB (the best reported to date) and delivering substantial gains in uSDR. Moreover, the model exhibits stable and consistent performance across varying input lengths and vocal occurrence patterns. These results demonstrate the effectiveness of Mamba-based models for high-resolution audio processing and open up new directions for broader applications in audio research.
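
The abstract pairs band splitting with a dual-path architecture to keep sequences short. One dual-path step can be sketched as below, assuming band features of shape (K, T, C) such as a band-split front end produces: a sequence model first runs along time within each band, then along the band axis within each frame. toy_sequence_mixer is a hypothetical placeholder (a residual causal running mean) standing in for the Mamba2 layers; the paper's exact block design is not given on this page.

```python
import numpy as np

def toy_sequence_mixer(x):
    """Hypothetical stand-in for a Mamba2 layer: residual causal running mean over axis 0."""
    counts = np.arange(1, x.shape[0] + 1).reshape(-1, *([1] * (x.ndim - 1)))
    return x + np.cumsum(x, axis=0) / counts

def dual_path_block(z, mixer=toy_sequence_mixer):
    """One dual-path step over band features z of shape (K, T, C)."""
    K, T, C = z.shape
    # Temporal path: one length-T sequence per band.
    z = np.stack([mixer(z[k]) for k in range(K)])        # (K, T, C)
    # Band path: one length-K sequence per time frame.
    zt = z.transpose(1, 0, 2)                             # (T, K, C)
    zt = np.stack([mixer(zt[t]) for t in range(T)])       # (T, K, C)
    return zt.transpose(1, 0, 2)                          # back to (K, T, C)
```

Each mixer call sees a sequence of length T or K rather than K * T, which is what keeps long, high-resolution inputs tractable.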

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies