An Investigation of Incorporating Mamba for Speech Enhancement

Rong Chao; Wen-Huang Cheng; Moreno La Quatra; Sabato Marco Siniscalchi; Chao-Han Huck Yang; Szu-Wei Fu; Yu Tsao

arXiv:2405.06573·cs.SD·October 8, 2025·1 cites

TL;DR

This paper explores the use of the Mamba state-space model for speech enhancement, demonstrating competitive PESQ scores, reduced computational complexity, and effective pre-processing for ASR tasks.

Contribution

It introduces SEMamba, a novel application of Mamba for speech enhancement, achieving state-of-the-art PESQ scores and computational efficiency improvements.

Findings

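01 Achieved a PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration.

02 Reported a new state-of-the-art PESQ of 3.69 when combined with Perceptual Contrast Stretching (PCS).

03 Reduced FLOPs by up to ~12% compared with equivalent Transformer-based SE solutions (advanced, non-causal configurations).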

Abstract

This work investigates the use of a recently proposed, attention-free, scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. In particular, we employ Mamba to deploy different regression-based SE models (SEMamba) with different configurations, namely basic, advanced, causal, and non-causal. Furthermore, loss functions based either on signal-level distances or on metric-oriented objectives are considered. Experimental evidence shows that SEMamba attains a competitive PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration. A new state-of-the-art PESQ of 3.69 is also reported when SEMamba is combined with Perceptual Contrast Stretching (PCS). Compared against equivalent Transformer-based SE solutions, a noticeable FLOPs reduction of up to ~12% is observed with the advanced non-causal configurations. Finally, SEMamba can be used as a…
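To make the causal versus non-causal distinction in the abstract concrete, here is a minimal sketch of a plain linear state-space recurrence run in one direction (causal) and in both directions (non-causal). It is an illustration under stated assumptions, not the SEMamba architecture: Mamba itself uses a selective SSM whose parameters depend on the input, and the abstract does not say how the non-causal configuration is built. The function names and the toy matrices A, B, and C are hypothetical.

```python
import numpy as np


def ssm_scan(x, A, B, C):
    """Run a fixed-parameter linear state-space recurrence over a 1-D sequence.

    h_t = A @ h_{t-1} + B * x_t
    y_t = C @ h_t
    The output at step t depends only on inputs up to t, so the scan is causal.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t
        y[t] = C @ h
    return y


def bidirectional_scan(x, A, B, C):
    """Assumed non-causal variant: sum a forward scan and a time-reversed scan."""
    forward = ssm_scan(x, A, B, C)
    backward = ssm_scan(x[::-1], A, B, C)[::-1]
    return forward + backward


# Toy parameters chosen only for illustration (not taken from the paper).
A = np.diag([0.9, 0.5])   # state transition
B = np.array([1.0, 1.0])  # input projection
C = np.array([0.5, 0.5])  # output projection

# Unit impulse at t = 4: shows which output steps can "see" the input.
x = np.zeros(8)
x[4] = 1.0

causal = ssm_scan(x, A, B, C)
noncausal = bidirectional_scan(x, A, B, C)

print("causal:    ", np.round(causal, 3))
print("non-causal:", np.round(noncausal, 3))
```

In the causal output, every step before the impulse is zero, which is what a streaming system needs. The bidirectional output is non-zero on both sides of the impulse because it uses future context, so it requires the full utterance before producing any frame.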

Code & Models

Repositories

roychao19477/semamba
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing