Omni-directional attention mechanism based on Mamba for speech separation

Ke Xue; Chang Sun; Rongfei Fan; Jing Wang; Han Hu

arXiv:2601.16603·cs.SD·January 26, 2026

Open Access

TL;DR

This paper introduces an omni-directional attention mechanism based on Mamba that models global dependencies in spectrograms for speech separation, achieving state-of-the-art performance with linear complexity.

Contribution

It proposes a novel omni-directional attention mechanism built on Mamba, enabling global 2D spectrogram modeling for improved speech separation.

Findings

01. Significant performance improvements over baseline models.

02. Outperforms existing state-of-the-art speech separation systems.

03. Maintains linear computational complexity.

Abstract

Mamba, a selective state-space model (SSM), has emerged as an efficient alternative to Transformers for speech modeling, enabling long-sequence processing with linear complexity. While effective in speech separation, existing approaches, whether in the time or time-frequency domain, typically decompose the input along a single dimension into short one-dimensional sequences before processing them with Mamba. This restricts Mamba to local 1D modeling and limits its ability to capture global dependencies across the 2D spectrogram. In this work, we propose an efficient omni-directional attention (OA) mechanism built upon unidirectional Mamba, which models global dependencies from ten different directions on the spectrogram. We incorporate the proposed mechanism into two baseline separation models and evaluate them on three public datasets. Experimental results show that our approach consistently…
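
To make the idea of scanning a 2D spectrogram from several directions with a unidirectional model more concrete, here is a minimal NumPy sketch. It is not the authors' OA mechanism: the ten directions are not listed in this summary, so the six scan orders below (time, frequency, and anti-diagonal, each forward and backward) are illustrative assumptions, and `toy_causal_scan` is a hypothetical fixed-decay recurrence standing in for Mamba's selective SSM. All function names are invented for this example.

```python
import numpy as np


def scan_orders(n_freq, n_time):
    """Permutations of the flattened (n_freq, n_time) grid, one per scan direction.

    Assumption: these six orders are illustrative only, not the paper's ten.
    """
    idx = np.arange(n_freq * n_time).reshape(n_freq, n_time)
    time_major = idx.reshape(-1)    # row by row: time index varies fastest
    freq_major = idx.T.reshape(-1)  # column by column: frequency varies fastest
    f, t = np.indices((n_freq, n_time))
    # Anti-diagonal order: sort cells by f + t, breaking ties by f.
    diag = np.lexsort((f.ravel(), (f + t).ravel()))
    return {
        "time_fwd": time_major, "time_bwd": time_major[::-1],
        "freq_fwd": freq_major, "freq_bwd": freq_major[::-1],
        "diag_fwd": diag, "diag_bwd": diag[::-1],
    }


def toy_causal_scan(seq, decay=0.9):
    """O(N) causal recurrence h[k] = decay * h[k-1] + (1 - decay) * seq[k].

    Hypothetical stand-in for a unidirectional SSM; it is not Mamba.
    """
    out = np.empty_like(seq, dtype=float)
    h = 0.0
    for k, x in enumerate(seq):
        h = decay * h + (1.0 - decay) * x
        out[k] = h
    return out


def multi_direction_mix(spec, decay=0.9):
    """Run the causal scan along every order and average the results per cell."""
    n_freq, n_time = spec.shape
    flat = spec.reshape(-1).astype(float)
    acc = np.zeros_like(flat)
    orders = scan_orders(n_freq, n_time)
    for order in orders.values():
        scanned = toy_causal_scan(flat[order], decay)
        acc[order] += scanned  # scatter outputs back to their original cells
    return (acc / len(orders)).reshape(n_freq, n_time)


# Usage: a single impulse in the middle of a small 3 x 5 "spectrogram".
spec = np.zeros((3, 5))
spec[1, 2] = 1.0
mixed = multi_direction_mix(spec)
```

A single forward scan can only pass information to cells that come later in its own ordering, so the impulse never reaches earlier cells. Combining scans that run in opposing and differently oriented orders lets every cell receive a contribution, while each individual scan still costs time linear in the number of time-frequency bins.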

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis