Mamba in Speech: Towards an Alternative to Self-Attention

Xiangyu Zhang; Qiquan Zhang; Hexin Liu; Tianyi Xiao; Xinyuan Qian; Beena Ahmed; Eliathamby Ambikairajah; Haizhou Li; Julien Epps

arXiv:2405.12609 · eess.AS · April 29, 2025 · 6 cites

TL;DR

This paper investigates the application of Mamba, an alternative to self-attention, in speech processing tasks, demonstrating that bidirectional Mamba improves performance in speech recognition and enhancement over vanilla Mamba.

Contribution

It introduces the use of bidirectional Mamba in speech tasks and shows its advantages as an alternative to self-attention in Transformer models.

Findings

01. BiMamba outperforms vanilla Mamba in speech tasks.

02. Bidirectional design enhances speech processing performance.

03. BiMamba is effective as a self-attention substitute in Transformers.

Abstract

Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and computer vision tasks, but its superiority has rarely been investigated in speech signal processing. This paper explores solutions for applying Mamba to speech processing by discussing two typical speech processing tasks: speech recognition, which requires semantic and sequential information, and speech enhancement, which focuses primarily on sequential patterns. The experimental results confirm that bidirectional Mamba (BiMamba) consistently outperforms vanilla Mamba, highlighting the advantages…
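
The difference between a causal (vanilla) scan and a bidirectional one can be illustrated with a toy example. The sketch below is not the authors' implementation and not a selective state space model: it assumes a fixed scalar recurrence h_t = decay * h_{t-1} + gain * x_t as a stand-in for a causal sequence layer, and it assumes the forward and backward outputs are combined by summation. The names `causal_scan` and `bidirectional_scan` are hypothetical.

```python
from typing import List


def causal_scan(x: List[float], decay: float = 0.5, gain: float = 1.0) -> List[float]:
    """Toy causal recurrence: h_t = decay * h_{t-1} + gain * x_t.

    Assumption: a fixed scalar recurrence stands in for a causal sequence
    layer. Real Mamba layers use input-dependent (selective) parameters.
    """
    h = 0.0
    out = []
    for value in x:
        h = decay * h + gain * value
        out.append(h)
    return out


def bidirectional_scan(x: List[float], decay: float = 0.5, gain: float = 1.0) -> List[float]:
    """Run the causal scan in both directions and sum the results.

    Assumption: summation is used to merge the two directions; other
    merges (concatenation, gating) are equally plausible.
    """
    forward = causal_scan(x, decay, gain)
    # Scan the reversed sequence, then flip back so positions line up.
    backward = causal_scan(x[::-1], decay, gain)[::-1]
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    signal = [0.0, 0.0, 1.0]  # the only non-zero sample is at the end
    print(causal_scan(signal))         # position 0 cannot see the later sample
    print(bidirectional_scan(signal))  # position 0 now receives future context
```

With the input `[0.0, 0.0, 1.0]`, the causal scan leaves the first position at zero because the only non-zero sample lies in its future, whereas the bidirectional scan gives that position a non-zero value. This is the kind of future context a non-causal speech model can exploit.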

Code & Models

Repositories

tonyyouyou/mamba-in-speech
pytorch

Taxonomy

Topics: Education and Technology Integration

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout