Where Are We At with Automatic Speech Recognition for the Bambara Language?

Seydou Diallo; Yacouba Diarra; Mamadou K. Keita; Panga Azazia Kamaté; Adam Bouno Kampo; Aboubacar Ouattara

arXiv:2602.09785·cs.CL·February 11, 2026

TL;DR

This paper establishes the first standardized benchmark for Bambara ASR. Across 37 evaluated models, the best Word Error Rate is 46.76%, well below deployment standards, which underlines the need for specialized approaches for underrepresented languages.

Contribution

The paper introduces a controlled benchmark dataset (one hour of professionally recorded Malian constitutional text) and an evaluation framework for Bambara ASR, enabling consistent comparison and progress tracking for this underrepresented language.

Findings

01

Best (lowest) WER achieved is 46.76%

02

Best (lowest) CER achieved is 13.00%, set by a different model than the best-WER system

03

Several prominent multilingual models exceeded 100% WER
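
A WER above 100% can look like a mistake, but it follows from how the metric is usually defined: the number of word-level edits (substitutions, deletions, insertions) divided by the number of words in the reference. A hypothesis that contains many spurious extra words can need more edits than the reference has words. The sketch below illustrates this with a plain Levenshtein distance. It is not the paper's scoring code; the function names, the whitespace tokenization, and the toy strings (placeholder tokens, not Bambara) are assumptions made for illustration, and the benchmark's actual text normalization is not described in this excerpt.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Toy placeholder tokens: a 2-word reference against a 5-word hypothesis
# with no words in common needs 2 substitutions + 3 insertions = 5 edits.
print(wer("a b", "x y z w v"))  # 5 / 2 = 2.5, i.e. 250% WER
print(cer("abcd", "abed"))      # 1 / 4 = 0.25, i.e. 25% CER
```

Because the denominator is the reference length only, WER has no upper bound, whereas a model that makes small spelling-level errors in most words can score a high WER alongside a much lower CER.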

Abstract

This paper introduces the first standardized benchmark for evaluating Automatic Speech Recognition (ASR) in the Bambara language, using one hour of professionally recorded Malian constitutional text. Designed as a controlled reference set under near-optimal acoustic and linguistic conditions, the benchmark was used to evaluate 37 models, ranging from Bambara-trained systems to large-scale commercial models. Our findings reveal that current ASR performance remains significantly below deployment standards, even in a narrow formal domain: the best Word Error Rate (WER) was 46.76%, the best Character Error Rate (CER) was 13.00% and was achieved by a different model, and several prominent multilingual models exceeded 100% WER. These results suggest that multilingual pre-training and model scaling alone are insufficient for underrepresented languages. Furthermore,…

Code & Models

Datasets

MALIBA-AI/bambara-asr-benchmark
Dataset · 54 downloads

Videos

Where Are We at with Automatic Speech Recognition for the Bambara Language?

Taxonomy

Topics: Speech Recognition and Synthesis · ICT in Developing Communities · Phonetics and Phonology Research