Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
Wenze Ren, Haibin Wu, Yi-Cheng Lin, Xuanjun Chen, Rong Chao, Kuo-Hsuan Hung, You-Jin Li, Wen-Yuan Ting, Hsin-Min Wang, Yu Tsao

TL;DR
This paper introduces MCMamba, a modification of McNet built on an improved version of the Mamba state-space model. It combines spatial and spectral features for multichannel speech enhancement and achieves state-of-the-art results on CHiME-3.
Contribution
The paper presents MCMamba, a multichannel speech enhancement model that modifies McNet with an improved version of Mamba and integrates full-band and narrow-band spatial information with sub-band and full-band spectral features.
Findings
MCMamba outperforms McNet in speech enhancement tasks.
Mamba excels in modeling spectral information.
The approach achieves state-of-the-art results on CHiME-3.
Abstract
In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction. Traditional methods, such as CNN or LSTM, attempt to model the temporal dynamics of full-band and sub-band spectral and spatial features. However, these approaches face limitations in fully modeling complex temporal dependencies, especially in dynamic acoustic environments. To overcome these challenges, we modify the current advanced model McNet by introducing an improved version of Mamba, a state-space model, and further propose MCMamba. MCMamba has been completely reengineered to integrate full-band and narrow-band spatial information with sub-band and full-band spectral features, providing a more comprehensive approach to modeling spatial and spectral information. Our experimental results demonstrate that MCMamba significantly…
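To make the full-band versus narrow-band distinction more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a real-valued feature tensor of shape (channels, frequencies, frames) and uses a plain linear state-space recurrence with fixed matrices as a stand-in for Mamba, whose parameters are input-dependent. The names `ssm_scan`, `narrow_band_view`, and `full_band_view`, and all tensor sizes, are hypothetical and chosen only for illustration.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    x: (steps, d_in), A: (n, n), B: (n, d_in), C: (d_out, n).
    Fixed A, B, C are an assumption; Mamba makes them depend on the input.
    """
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:
        h = A @ h + B @ x_t
        outputs.append(C @ h)
    return np.stack(outputs)

def narrow_band_view(spec):
    # (channels, freqs, frames) -> (freqs, frames, channels):
    # one sequence over time per frequency bin.
    return np.transpose(spec, (1, 2, 0))

def full_band_view(spec):
    # (channels, freqs, frames) -> (frames, freqs, channels):
    # one sequence over frequency per time frame.
    return np.transpose(spec, (2, 1, 0))

# Toy sizes (hypothetical): 6 channels, 5 frequency bins, 4 frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((6, 5, 4))

A = 0.9 * np.eye(3)               # stable state transition
B = rng.standard_normal((3, 6))   # maps channel features into the state
C = rng.standard_normal((2, 3))   # maps the state to output features

nb = narrow_band_view(spec)
fb = full_band_view(spec)

# Narrow-band: scan along time, independently for each frequency bin.
nb_out = np.stack([ssm_scan(seq, A, B, C) for seq in nb])
# Full-band: scan along frequency, independently for each frame.
fb_out = np.stack([ssm_scan(seq, A, B, C) for seq in fb])

print(nb_out.shape, fb_out.shape)
```

The two views differ only in which axis is treated as the sequence: time within a single frequency bin for the narrow-band path, and frequency within a single frame for the full-band path. How MCMamba actually arranges and combines these paths is not specified in the text above.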
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis
Methods: Tanh Activation · Sigmoid Activation · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Long Short-Term Memory
