TL;DR
This paper introduces MambAttention, a hybrid model that combines Mamba with multi-head attention to improve generalization in single-channel speech enhancement; it outperforms existing models on out-of-domain datasets.
Contribution
The paper proposes MambAttention, a novel hybrid architecture that integrates Mamba with shared multi-head attention modules, and introduces VB-DemandEx, a new and more challenging dataset used to train it.
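To make the idea of shared time and frequency attention concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes "shared" means one set of attention weights reused along both axes, it adds residual connections as an illustrative choice, and it leaves out the Mamba blocks entirely. The names `init_attention_params`, `multi_head_attention`, and `shared_tf_attention` are hypothetical.

```python
import numpy as np

def init_attention_params(channels, seed=0):
    # Hypothetical helper: random projection matrices for one attention module.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(channels)
    names = ("wq", "wk", "wv", "wo")
    return {n: rng.normal(0.0, scale, size=(channels, channels)) for n in names}

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, params, num_heads):
    # Self-attention over the sequence axis; x has shape (batch, seq, channels).
    b, s, c = x.shape
    d = c // num_heads

    def split(t):
        # (batch, seq, channels) -> (batch, heads, seq, head_dim)
        return t.reshape(b, s, num_heads, d).transpose(0, 2, 1, 3)

    q, k, v = (split(x @ params[n]) for n in ("wq", "wk", "wv"))
    weights = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (b, h, s, s)
    out = (weights @ v).transpose(0, 2, 1, 3).reshape(b, s, c)
    return out @ params["wo"]

def shared_tf_attention(x, params, num_heads=4):
    # x has shape (time, freq, channels).
    # Assumption: the SAME params are used for both axes ("shared" attention).
    # Time attention: each frequency bin is a batch item, the sequence is time.
    xt = x.transpose(1, 0, 2)  # (freq, time, channels)
    x = x + multi_head_attention(xt, params, num_heads).transpose(1, 0, 2)
    # Frequency attention: each time frame is a batch item, the sequence is frequency.
    x = x + multi_head_attention(x, params, num_heads)
    return x

x = np.random.default_rng(1).normal(size=(10, 8, 16))  # 10 frames, 8 bins, 16 channels
y = shared_tf_attention(x, init_attention_params(16))
print(y.shape)  # (10, 8, 16)
```

Reusing one parameter set for both axes halves the attention parameters compared with separate time and frequency modules; whether that matches the paper's exact layout is an assumption here.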
Findings
MambAttention outperforms state-of-the-art models on out-of-domain datasets.
Shared attention modules improve generalization performance.
Integrating attention with LSTM/xLSTM enhances cross-corpus performance.
Abstract
With new sequence models like Mamba and xLSTM, several studies have shown that these models match or outperform the state-of-the-art in single-channel speech enhancement and audio representation learning. However, prior research has demonstrated that sequence models like LSTM and Mamba tend to overfit to the training set. To address this, previous works have shown that adding self-attention to LSTMs substantially improves generalization performance for single-channel speech enhancement. Nevertheless, neither the concept of hybrid Mamba and time-frequency attention models nor their generalization performance have been explored for speech enhancement. In this paper, we propose a novel hybrid architecture, MambAttention, which combines Mamba and shared time- and frequency-multi-head attention modules for generalizable single-channel speech enhancement. To train our model, we introduce…
Taxonomy
Methods: Long Short-Term Memory · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
