MLMA: Towards Multilingual ASR With Mamba-based Architectures
Mohamed Nabih Ali, Daniele Falavigna, Alessio Brutti

TL;DR
MLMA applies the Mamba architecture to multilingual ASR. It achieves performance competitive with traditional Transformer models and offers improved scalability by leveraging efficient long-context sequence processing.
Contribution
This work pioneers the use of Mamba, a state-space model, for multilingual ASR. Mamba offers a scalable and efficient alternative to Transformers and implicitly captures language-aware features.
Findings
MLMA achieves competitive results on multilingual benchmarks.
Mamba-based models show improved efficiency and scalability.
MLMA supports robust recognition across diverse languages.
Abstract
Multilingual automatic speech recognition (ASR) remains a challenging task, especially when balancing performance across high- and low-resource languages. Recent advances in sequence modeling suggest that architectures beyond Transformers may offer better scalability and efficiency. In this work, we introduce MLMA (Multilingual Language Modeling with Mamba for ASR), a new approach that leverages the Mamba architecture (an efficient state-space model optimized for long-context sequence processing) for multilingual ASR. Using Mamba, MLMA implicitly incorporates language-aware conditioning and shared representations to support robust recognition across diverse languages. Experiments on standard multilingual benchmarks show that MLMA achieves competitive performance compared to Transformer-based architectures. These results highlight Mamba's potential as a strong backbone for scalable,…
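The abstract does not describe MLMA's layers in detail. To illustrate the kind of state-space recurrence Mamba is built on (the selective SSM described by Gu and Dao), here is a minimal pure-Python sketch of a single-channel selective scan. It is not MLMA's implementation. It assumes a diagonal state matrix `a`, one scalar input channel, and hypothetical weights `w_delta`, `b_delta`, `w_b`, `w_c` that make the step size and the B/C projections depend on the current input.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z), used to keep the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: Sequence[float],
    a: Sequence[float],
    w_delta: float,
    b_delta: float,
    w_b: Sequence[float],
    w_c: Sequence[float],
    d: float = 0.0,
) -> List[float]:
    """Illustrative single-channel selective SSM scan (hypothetical parameters).

    a:        diagonal of the (typically negative) state matrix A, length N.
    w_delta, b_delta: hypothetical weights for the input-dependent step size.
    w_b, w_c: hypothetical per-state weights making B_t and C_t input-dependent.
    d:        optional skip connection weight (the D term).
    """
    n = len(a)
    h = [0.0] * n  # hidden state of size N, carried across time steps
    ys: List[float] = []
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        b_t = [wb * x_t for wb in w_b]             # input-dependent B_t
        c_t = [wc * x_t for wc in w_c]             # input-dependent C_t
        for i in range(n):
            a_bar = math.exp(delta * a[i])         # zero-order-hold discretization of A
            # Simplified (Euler) discretization of B: B_bar = delta * B_t.
            h[i] = a_bar * h[i] + delta * b_t[i] * x_t
        # Readout: y_t = C_t . h_t + D * x_t
        ys.append(sum(c_t[i] * h[i] for i in range(n)) + d * x_t)
    return ys


if __name__ == "__main__":
    print(selective_scan([1.0, 1.0], a=[-1.0], w_delta=0.0, b_delta=0.0,
                         w_b=[1.0], w_c=[1.0]))
```

Each step touches only a fixed-size state, so the cost grows linearly with sequence length, whereas self-attention compares every pair of frames and grows quadratically. Actual Mamba layers run this recurrence over many channels with a hardware-aware parallel scan rather than a Python loop.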
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · ICT in Developing Communities
