Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

Wenze Ren; Haibin Wu; Yi-Cheng Lin; Xuanjun Chen; Rong Chao; Kuo-Hsuan Hung; You-Jin Li; Wen-Yuan Ting; Hsin-Min Wang; Yu Tsao

arXiv:2409.10376·eess.AS·January 15, 2025

TL;DR

This paper introduces MCMamba, an improved state-space model that effectively combines spatial and spectral features for multichannel speech enhancement, achieving state-of-the-art results on CHiME-3.

Contribution

The paper presents MCMamba, a reengineered model that integrates full-band and narrow-band spatial information with spectral features for enhanced speech processing.

Findings

01 MCMamba outperforms McNet in speech enhancement tasks.

02 Mamba excels in modeling spectral information.

03 The approach achieves state-of-the-art results on CHiME-3.

Abstract

In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction. Traditional methods, such as CNN or LSTM, attempt to model the temporal dynamics of full-band and sub-band spectral and spatial features. However, these approaches face limitations in fully modeling complex temporal dependencies, especially in dynamic acoustic environments. To overcome these challenges, we modify the current advanced model McNet by introducing an improved version of Mamba, a state-space model, and further propose MCMamba. MCMamba has been completely reengineered to integrate full-band and narrow-band spatial information with sub-band and full-band spectral features, providing a more comprehensive approach to modeling spatial and spectral information. Our experimental results demonstrate that MCMamba significantly…
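The distinction the abstract draws between full-band and narrow-band processing can be illustrated with a small sketch. It is not the authors' implementation. It only shows one common way to lay out a multichannel spectrogram so that a sequence model (an LSTM or a Mamba block, for instance) scans along frequency in the full-band case and along time in the narrow-band case. The tensor layout `(channels, freq_bins, frames)`, the function names, and the sizes are all assumptions made for illustration.

```python
import numpy as np

def full_band_view(x):
    """Rearrange (C, F, T) features to (T, F, C).

    Each time frame becomes one sequence that runs over all
    frequency bins, with the channels as the feature dimension.
    """
    return np.transpose(x, (2, 1, 0))

def narrow_band_view(x):
    """Rearrange (C, F, T) features to (F, T, C).

    Each frequency bin becomes one sequence that runs over time,
    with the channels as the feature dimension.
    """
    return np.transpose(x, (1, 2, 0))

# Hypothetical sizes: 6 microphones, 257 frequency bins, 100 frames.
rng = np.random.default_rng(0)
C, F, T = 6, 257, 100
x = rng.standard_normal((C, F, T))

fb = full_band_view(x)    # batch of T sequences, each of length F
nb = narrow_band_view(x)  # batch of F sequences, each of length T
```

A sequence model applied to `fb` sees the whole spectrum of one frame at a time, while the same model applied to `nb` sees the temporal evolution of a single bin across microphones.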

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis

Methods: Tanh Activation · Sigmoid Activation · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Long Short-Term Memory