Ethio-ASR: Joint Multilingual Speech Recognition and Language Identification for Ethiopian Languages

Badr M. Abdullah; Israel Abebe Azime; Atnafu Lambebo Tonja; Jesujoba O. Alabi; Abel Mulat Alemu; Eyob G. Hagos; Bontu Fufa Balcha; Mulubrhan A. Nerea; Debela Desalegn Yadeta; Dagnachew Mekonnen Marilign; Amanuel Temesgen Fentahun; Tadesse Kebede; Israel D. Gebru; Michael Melese Woldeyohannis; Walelign Tewabe Sewunetie; Bernd Möbius; Dietrich Klakow

arXiv:2603.23654 · cs.CL · March 26, 2026

TL;DR

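Ethio-ASR is a suite of multilingual CTC-based speech recognition models for five Ethiopian languages (Amharic, Tigrinya, Oromo, Sidaama, and Wolaytta). Built on pre-trained speech encoders, the best model reaches an average WER of 30.48% on the WAXAL test set, and the paper analyzes gender bias and the linguistic factors that affect performance.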

Contribution

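The paper presents multilingual CTC-based ASR models jointly trained on five Ethiopian languages. The best model outperforms the best OmniASR model with substantially fewer parameters, and the paper adds an analysis of gender bias and of how vowel length and consonant gemination contribute to ASR errors.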
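Since the models are described as CTC-based, it may help to see how a CTC output is commonly turned into a label sequence. The snippet below is a minimal sketch of greedy CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop the blank label. The blank index of 0 and the toy vocabulary are assumptions made for illustration; they are not taken from the paper, and the paper may use a different decoding strategy.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Collapse a per-frame label sequence the way greedy CTC decoding does.

    Step 1: merge runs of identical consecutive labels into one label.
    Step 2: remove the blank label.
    blank_id=0 is an assumption; real models define their own blank index.
    """
    merged = [label for label, _ in groupby(frame_ids)]
    return [label for label in merged if label != blank_id]


def ids_to_text(label_ids: Sequence[int], vocab: Dict[int, str]) -> str:
    """Map label ids to characters using a (hypothetical) vocabulary."""
    return "".join(vocab[i] for i in label_ids)


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {1: "a", 2: "b", 3: "d"}

# Frame-level argmax ids: the blank (0) between the two 3s keeps both copies,
# which is how CTC can emit a doubled character.
frames = [0, 1, 1, 0, 3, 3, 0, 3, 1, 0]
decoded = ctc_greedy_collapse(frames)
text = ids_to_text(decoded, toy_vocab)
```

Note that a doubled label survives only when a blank separates the two runs; without the blank, the repeated frames collapse into a single label.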

Findings

01 Best model achieves an average WER of 30.48% on the WAXAL test set

02 Outperforms the best OmniASR model with substantially fewer parameters

03 Analyzes gender bias and the contribution of vowel length and consonant gemination to ASR errors

Abstract

We present Ethio-ASR, a suite of multilingual CTC-based automatic speech recognition (ASR) models jointly trained on five Ethiopian languages: Amharic, Tigrinya, Oromo, Sidaama, and Wolaytta. These languages belong to the Semitic, Cushitic, and Omotic branches of the Afroasiatic family, and remain severely underrepresented in speech technology despite being spoken by the vast majority of Ethiopia's population. We train our models on the recently released WAXAL corpus using several pre-trained speech encoders and evaluate against strong multilingual baselines, including OmniASR. Our best model achieves an average WER of 30.48% on the WAXAL test set, outperforming the best OmniASR model with substantially fewer parameters. We further provide a comprehensive analysis of gender bias, the contribution of vowel length and consonant gemination to ASR errors, and the training dynamics of…
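The headline figure in the abstract is an average word error rate (WER). As a rough sketch of how such a number can be computed, the code below implements WER as word-level edit distance divided by the number of reference words, and then averages per-language scores. Treating the reported average as an unweighted mean over languages is an assumption here, as are the function names and the placeholder strings; none of the values below come from the paper.

```python
from typing import List, Mapping, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


def average_wer(per_language: Mapping[str, float]) -> float:
    """Unweighted mean of per-language WERs (an assumed averaging scheme)."""
    return sum(per_language.values()) / len(per_language)


# Placeholder strings, not real transcripts: one substitution and one
# deletion against four reference words.
example = wer(["w1 w2 w3 w4"], ["w1 x w3"])  # 2 errors / 4 words = 0.5
```

Whitespace tokenization is used for simplicity; a real evaluation would also need to fix a text normalization scheme before scoring.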

Code & Models

Models

🤗 badrex/Ethio-ASR-multilingual-600M
model · 347 downloads · 2 likes

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research