MLMA: Towards Multilingual ASR With Mamba-based Architectures

Mohamed Nabih Ali; Daniele Falavigna; Alessio Brutti

arXiv:2510.18684 · cs.CL · October 24, 2025

TL;DR

MLMA applies the Mamba architecture, an efficient state-space model for long-context sequence processing, to multilingual ASR. It reports performance competitive with Transformer-based models and is presented as a more scalable alternative to them.

Contribution

This work pioneers the use of Mamba, a state-space model, for multilingual ASR, offering a scalable and efficient alternative to Transformers that implicitly incorporates language-aware conditioning and shared representations.

Findings

01. MLMA achieves competitive results on standard multilingual benchmarks compared to Transformer-based architectures.

02. Mamba-based models show improved efficiency and scalability.

03. MLMA supports robust recognition across diverse languages.

Abstract

Multilingual automatic speech recognition (ASR) remains a challenging task, especially when balancing performance across high- and low-resource languages. Recent advances in sequence modeling suggest that architectures beyond Transformers may offer better scalability and efficiency. In this work, we introduce MLMA (Multilingual Language Modeling with Mamba for ASR), a new approach that leverages the Mamba architecture -- an efficient state-space model optimized for long-context sequence processing -- for multilingual ASR. Using Mamba, MLMA implicitly incorporates language-aware conditioning and shared representations to support robust recognition across diverse languages. Experiments on standard multilingual benchmarks show that MLMA achieves competitive performance compared to Transformer-based architectures. These results highlight Mamba's potential as a strong backbone for scalable,…
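To make the term "state-space model" in the abstract concrete, here is a minimal sketch of a diagonal linear state-space recurrence in plain Python. It is an illustration only, not MLMA's or Mamba's implementation: the function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical, and the parameters are held fixed here, whereas Mamba makes them depend on the input (its "selective" mechanism). The sketch shows why such models suit long sequences: each step updates a fixed-size state, so cost grows linearly with sequence length.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state-space recurrence over a 1-D input.

    For each time step t (elementwise over the state dimension):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)

    Assumption: a, b, c are fixed vectors of equal length. In Mamba they
    would be computed from the input at every step.
    """
    state = [0.0] * len(a)  # h_0 = 0
    outputs: List[float] = []
    for x_t in x:
        # Update the fixed-size hidden state from the previous state and input.
        state = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, state, b)]
        # Read out a scalar output from the current state.
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs


if __name__ == "__main__":
    # An impulse input decays geometrically through the state.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0]))
```

With a single state dimension and decay `a = 0.5`, an impulse at the first step produces outputs that halve at every following step, which is the memory behavior the recurrence encodes.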

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · ICT in Developing Communities