An Investigation of Incorporating Mamba for Speech Enhancement
Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao

TL;DR
This paper explores the use of the Mamba state-space model for speech enhancement, demonstrating competitive PESQ scores, reduced computational complexity, and effective pre-processing for ASR tasks.
Contribution
The paper introduces SEMamba, an application of Mamba to speech enhancement. SEMamba reaches a state-of-the-art PESQ when combined with Perceptual Contrast Stretching (PCS) and requires fewer FLOPs than Transformer-based equivalents.
Findings
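Achieved a PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration.
Reported a new state-of-the-art PESQ of 3.69 when SEMamba is combined with Perceptual Contrast Stretching (PCS).
Reduced FLOPs by up to approximately 12% compared with Transformer-based equivalent SE solutions.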
Abstract
This work aims to investigate the use of a recently proposed, attention-free, scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. In particular, we employ Mamba to deploy different regression-based SE models (SEMamba) with different configurations, namely basic, advanced, causal, and non-causal. Furthermore, loss functions that are either based on signal-level distances or metric-oriented are considered. Experimental evidence shows that SEMamba attains a competitive PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration. A new state-of-the-art PESQ of 3.69 is also reported when SEMamba is combined with Perceptual Contrast Stretching (PCS). Compared against Transformer-based equivalent SE solutions, a noticeable FLOPs reduction of up to ~12% is observed with the advanced non-causal configurations. Finally, SEMamba can be used as a…
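The distinction between causal and non-causal configurations can be illustrated with a toy state-space recurrence. The sketch below is a minimal, hypothetical example and not the SEMamba architecture: it uses a scalar, time-invariant SSM with fixed parameters `a`, `b`, and `c` (Mamba itself makes its parameters input-dependent), and it assumes that a non-causal variant can be formed by summing a forward scan and a backward scan. The function names `ssm_scan` and `bidirectional_scan` are illustrative only.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal scalar linear state-space recurrence (toy model, not Mamba).

    h[t] = a * h[t-1] + b * x[t]
    y[t] = c * h[t]
    Each output depends only on current and past inputs, and the scan
    costs one update per time step.
    """
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t
        y.append(c * h)
    return y


def bidirectional_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Non-causal variant (assumption): sum a forward and a backward scan."""
    forward = ssm_scan(x, a, b, c)
    # Scan the reversed sequence, then flip the result back into time order.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


# A unit impulse in the middle of the sequence shows which frames are affected.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
causal = ssm_scan(impulse, a=0.5, b=1.0, c=1.0)
non_causal = bidirectional_scan(impulse, a=0.5, b=1.0, c=1.0)

print(causal)      # [0.0, 0.0, 1.0, 0.5, 0.25]
print(non_causal)  # [0.25, 0.5, 2.0, 0.5, 0.25]
```

In the causal output, frames before the impulse stay at zero, whereas the bidirectional output also responds at earlier frames because the backward scan carries information from the future.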
Taxonomy
Topics: Speech and Audio Processing
