Rethinking Mamba in Speech Processing by Self-Supervised Models
Xiangyu Zhang, Jianbo Ma, Mostafa Shahin, Beena Ahmed, Julien Epps

TL;DR
This paper investigates how Mamba-based models perform in speech processing. It finds that they excel at reconstruction tasks but need additional modules for classification tasks such as speech recognition, and it supports this finding with an information-theoretic analysis.
Contribution
The study provides a new understanding of Mamba models' strengths and limitations in speech tasks, introducing a hypothesis and validating it through information theory and HuBERT integration.
Findings
Mamba models perform well in speech reconstruction tasks.
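Additional modules are needed for Mamba to surpass attention-based models on speech recognition.
Mutual information analysis supports the hypothesis that classification tasks require an extra "reconstruction" step.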
Abstract
The Mamba-based model has demonstrated outstanding performance across tasks in computer vision, natural language processing, and speech processing. However, in the realm of speech processing, the Mamba-based model's performance varies across different tasks. For instance, in tasks such as speech enhancement and spectrum reconstruction, the Mamba model performs well when used independently. However, for tasks like speech recognition, additional modules are required to surpass the performance of attention-based models. We propose the hypothesis that the Mamba-based model excels in "reconstruction" tasks within speech processing. However, for "classification tasks" such as Speech Recognition, additional modules are necessary to accomplish the "reconstruction" step. To validate our hypothesis, we analyze the previous Mamba-based Speech Models from an information theory perspective.…
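To make the information-theoretic angle concrete, here is a minimal sketch of a plug-in mutual information estimate between two discrete sequences. The text above does not say which estimator or which variables the authors use, so everything here is an assumption for illustration: the function name `mutual_information_bits` is hypothetical, and treating inputs as discrete labels (for example, quantized features) is a simplification.

```python
import math
from collections import Counter


def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    Illustrative only: the paper's actual estimator is not given in the
    summary, and real speech features would first need to be discretized.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")

    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # Identical sequences share all of their information.
    print(mutual_information_bits([0, 0, 1, 1], [0, 0, 1, 1]))
    # Independent sequences share none.
    print(mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1]))
```

In this toy setting, a higher value means one sequence tells you more about the other. Plug-in estimates of this kind are biased upward on small samples, so they suit illustration better than measurement.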
Taxonomy
Topics: Speech and Audio Processing
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
