Omni-directional attention mechanism based on Mamba for speech separation
Ke Xue, Chang Sun, Rongfei Fan, Jing Wang, Han Hu

TL;DR
This paper introduces an omni-directional attention mechanism based on Mamba that models global dependencies in spectrograms for speech separation, achieving state-of-the-art performance with linear complexity.
Contribution
It proposes a novel omni-directional attention mechanism built on Mamba, enabling global 2D spectrogram modeling for improved speech separation.
Findings
Adding the mechanism to two baseline separation models yields significant performance improvements on three public datasets.
Outperforms existing state-of-the-art speech separation systems.
Maintains linear computational complexity.
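The linear-complexity claim follows from the recurrent form of a state-space model: each time step updates a fixed-size state once, so the cost grows in proportion to sequence length rather than quadratically as in full self-attention. The sketch below is a deliberately simplified illustration, not the paper's or Mamba's implementation. It assumes a scalar state and takes the input-dependent coefficients `a`, `b`, `c` as ready-made arrays (hypothetical names), whereas Mamba derives them from learned projections and uses a hardware-aware parallel scan.

```python
import numpy as np

def selective_scan(x, a, b, c):
    """Simplified selective state-space recurrence (illustrative only).

    x, a, b, c: 1D arrays of equal length L.
    h_t = a_t * h_{t-1} + b_t * x_t
    y_t = c_t * h_t
    One pass over the sequence, so the cost is O(L).
    """
    h = 0.0  # scalar hidden state (an assumption; real models use a vector state)
    y = np.empty_like(x, dtype=float)
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]  # per-step coefficients make the update "selective"
        y[t] = c[t] * h
    return y

x = np.array([1.0, 2.0, 3.0])
y = selective_scan(x, a=np.full(3, 0.5), b=np.ones(3), c=np.ones(3))
```

Because the coefficients vary per step, the model can choose how much past context to keep at each position, while the loop still visits each element exactly once.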
Abstract
Mamba, a selective state-space model (SSM), has emerged as an efficient alternative to Transformers for speech modeling, enabling long-sequence processing with linear complexity. While effective in speech separation, existing approaches, whether in the time or time-frequency domain, typically decompose the input along a single dimension into short one-dimensional sequences before processing them with Mamba, which restricts it to local 1D modeling and limits its ability to capture global dependencies across the 2D spectrogram. In this work, we propose an efficient omni-directional attention (OA) mechanism built upon unidirectional Mamba, which models global dependencies from ten different directions on the spectrogram. We integrate the proposed mechanism into two baseline separation models and evaluate on three public datasets. Experimental results show that our approach consistently…
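To make the idea of scanning a 2D spectrogram from several directions concrete, here is a minimal sketch of how a frequency-by-time grid can be flattened into 1D sequences under different traversal orders, each of which a unidirectional sequence model could then process. The abstract does not list the ten directions, so the six orders below (time-major, frequency-major, and one diagonal, each forward and backward) and the helper names `scan_orders`, `flatten`, and `unflatten` are illustrative assumptions, not the authors' design.

```python
import numpy as np

def scan_orders(n_freq, n_time):
    """Return illustrative traversal orders as permutations of flat indices.

    The choice of directions here is an assumption for demonstration only.
    """
    idx = np.arange(n_freq * n_time).reshape(n_freq, n_time)
    time_major = idx.reshape(-1)    # each frequency bin, left to right in time
    freq_major = idx.T.reshape(-1)  # each time frame, low to high frequency
    # Walk every diagonal, from the bottom-left corner to the top-right corner.
    diagonal = np.concatenate(
        [np.diagonal(idx, offset=k) for k in range(-(n_freq - 1), n_time)]
    )
    return {
        "time_forward": time_major,
        "time_backward": time_major[::-1],
        "freq_forward": freq_major,
        "freq_backward": freq_major[::-1],
        "diag_forward": diagonal,
        "diag_backward": diagonal[::-1],
    }

def flatten(spec, order):
    """Read a 2D spectrogram into a 1D sequence following `order`."""
    return spec.reshape(-1)[order]

def unflatten(seq, order, shape):
    """Scatter a processed 1D sequence back onto the 2D grid."""
    out = np.empty(seq.shape[0], dtype=seq.dtype)
    out[order] = seq
    return out.reshape(shape)

spec = np.arange(6).reshape(2, 3)  # toy 2 x 3 "spectrogram"
orders = scan_orders(2, 3)
freq_seq = flatten(spec, orders["freq_forward"])
```

Each order is a permutation of all grid positions, so every sequence covers the full spectrogram, and `unflatten` maps per-direction outputs back to the same grid where they could be combined.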
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis
