Open Automatic Speech Recognition Models for Classical and Modern Standard Arabic

Lilit Grigoryan; Nikolay Karpov; Enas Albasiri; Vitaly Lavrukhin; Boris Ginsburg

arXiv:2507.13977·cs.CL·July 21, 2025

TL;DR

This paper presents two open speech recognition models for Arabic: a state-of-the-art model for Modern Standard Arabic (MSA) and a unified model covering both MSA and Classical Arabic (CA).

Contribution

Introduces a universal methodology for Arabic speech and text processing and two novel FastConformer-based ASR models, including the first unified public model for MSA and Classical Arabic, with open-source resources.

Findings

01

MSA model achieves SOTA performance on benchmark datasets.

02

Unified model attains SOTA accuracy with diacritics for Classical Arabic.

03

Models are open-sourced to facilitate reproducibility.
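
Finding 02 refers to accuracy "with diacritics", which matters because the same Arabic word can be written with or without vowel marks. The sketch below is purely illustrative and is not taken from the paper: it shows one common way to strip diacritics so that a transcript can be compared in both forms. The function name `strip_diacritics`, the chosen code point range, and the decision to also drop the tatweel character are assumptions made for this example.

```python
# Illustrative sketch only; the paper's actual text-processing pipeline may differ.
# Assumption: "diacritics" means the Unicode marks U+064B..U+0652
# (tanwin, fatha, damma, kasra, shadda, sukun) plus superscript alef U+0670.
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)} | {"\u0670"}

# Assumption: the tatweel (kashida) elongation character is also removed.
TATWEEL = "\u0640"


def strip_diacritics(text: str) -> str:
    """Return `text` with Arabic diacritic marks and tatweel removed."""
    return "".join(
        ch for ch in text if ch not in DIACRITICS and ch != TATWEEL
    )


if __name__ == "__main__":
    # kaf + fatha, ta + fatha, ba + fatha: the fully vowelled form of "kataba"
    diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
    print(strip_diacritics(diacritized))
```

Scoring a model on the original text measures accuracy with diacritics; scoring on the stripped text measures accuracy on the bare letters only.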

Abstract

Despite Arabic being one of the most widely spoken languages, the development of Arabic Automatic Speech Recognition (ASR) systems faces significant challenges due to the language's complexity, and only a limited number of public Arabic ASR models exist. While much of the focus has been on Modern Standard Arabic (MSA), there is considerably less attention given to the variations within the language. This paper introduces a universal methodology for Arabic speech and text processing designed to address unique challenges of the language. Using this methodology, we train two novel models based on the FastConformer architecture: one designed specifically for MSA and the other, the first unified public model for both MSA and Classical Arabic (CA). The MSA model sets a new benchmark with state-of-the-art (SOTA) performance on related datasets, while the unified model achieves SOTA accuracy…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing