SepMamba: State-space models for speaker separation using Mamba

Thor Højhus Avenstrup; Boldizsár Elek; István László Mádi; András Bence Schin; Morten Mørup; Bjørn Sand Jensen; Kenny Falkær Olsen

arXiv:2410.20997 · cs.SD · October 29, 2024

TL;DR

SepMamba is a U-Net-based speaker separation model built from Mamba layers. It matches or exceeds similarly sized transformer models at lower computational cost, which makes it a candidate for practical applications where transformer-based separators are too expensive.

Contribution

The paper introduces SepMamba, an architecture that combines bidirectional Mamba layers with a U-Net for efficient single-channel speaker separation. It outperforms similarly sized models, including transformer-based ones.

Findings

01 Outperforms similarly sized models, including transformer-based ones, on the WSJ0 2-speaker dataset

02 Reduces computational cost, memory usage, and forward pass time

03 Causal variants also achieve strong results

Abstract

Deep learning-based single-channel speaker separation has improved significantly in recent years, largely due to the introduction of the transformer-based attention mechanism. However, these improvements come at the expense of intense computational demands, precluding their use in many practical applications. As a computationally efficient alternative with similar modeling capabilities, Mamba was recently introduced. We propose SepMamba, a U-Net-based architecture composed primarily of bidirectional Mamba layers. We find that our approach outperforms similarly sized prominent models, including transformer-based models, on the WSJ0 2-speaker dataset while enjoying a significant reduction in computational cost, memory usage, and forward pass time. We additionally report strong results for causal variants of SepMamba. Our approach provides a computationally favorable alternative to…
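
To make the distinction between bidirectional and causal layers concrete, here is a minimal sketch in plain Python. It is not the SepMamba implementation and not a Mamba layer: it assumes a toy scalar state-space recurrence with fixed parameters `a`, `b`, `c` (Mamba makes its parameters input-dependent), and it assumes the forward and backward passes are combined by summation, which the abstract does not specify. The function names `ssm_scan` and `bidirectional_scan` are hypothetical.

```python
from typing import List


def ssm_scan(x: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Causal scan of a toy scalar state-space model (illustrative only).

    h[t] = a * h[t-1] + b * x[t]
    y[t] = c * h[t]
    Each output depends only on the current and past inputs.
    """
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t
        y.append(c * h)
    return y


def bidirectional_scan(x: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Run the same scan left-to-right and right-to-left, then sum.

    Summation is an assumption made for this sketch; other ways of
    combining the two directions are possible.
    """
    forward = ssm_scan(x, a, b, c)
    # Reverse the input, scan, then reverse the result back into time order.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    print("causal:       ", ssm_scan(impulse))
    print("bidirectional:", bidirectional_scan(impulse))
```

With an impulse in the middle of the sequence, the causal scan produces no response before the impulse, while the bidirectional version responds on both sides. That is why a bidirectional layer needs the whole signal up front, whereas a causal variant can run on streaming audio.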

Code & Models

Repositories

andrasschin/SepMamba
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces