Linguistically Informed Evaluation of Multilingual ASR for African Languages
Fei-Yueh Chen, Lateef Adeleke, C.M. Downey

TL;DR
This paper proposes linguistically informed metrics, including Feature Error Rate (FER) and a tone-aware extension (TER), to better evaluate multilingual ASR models for African languages by revealing phonological and tonal errors that traditional Word Error Rate (WER) overlooks.
Contribution
It introduces an evaluation framework that combines FER with a tone-aware extension (TER) to expose linguistically meaningful errors in ASR models for African languages.
Findings
FER and TER reveal error patterns missed by WER.
Models perform better on segmental features than on tones.
Tones, especially mid and downstep, are the most challenging features.
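To make the tone findings concrete, here is a minimal sketch of how a tone-level error rate could be computed from Yoruba orthography. It is not the paper's TER implementation. It assumes that tone can be read off the standard diacritics (acute for high, grave for low, unmarked for mid), it ignores syllabic nasals and downstep, and the names `tone_sequence` and `tone_error_rate` are hypothetical.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent: high tone in Yoruba orthography
GRAVE = "\u0300"  # combining grave accent: low tone in Yoruba orthography
VOWELS = set("aeiou")  # base vowels after decomposition; assumption: nasals ignored


def tone_sequence(text):
    """Return one tone label (H, L, or M) per vowel in the text."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    tones = []
    i = 0
    while i < len(decomposed):
        ch = decomposed[i]
        i += 1
        if ch not in VOWELS:
            continue
        # Collect every combining mark attached to this vowel (dot below, tone).
        marks = []
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            marks.append(decomposed[i])
            i += 1
        if ACUTE in marks:
            tones.append("H")
        elif GRAVE in marks:
            tones.append("L")
        else:
            tones.append("M")  # unmarked vowels are treated as mid tone
    return tones


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def tone_error_rate(reference, hypothesis):
    """Edit distance over tone labels, normalized by the reference length."""
    ref_tones = tone_sequence(reference)
    hyp_tones = tone_sequence(hypothesis)
    return edit_distance(ref_tones, hyp_tones) / max(len(ref_tones), 1)
```

For example, `tone_error_rate("Yorùbá", "Yoruba")` compares the tone sequence M L H with M M M: a hypothesis that drops all diacritics gets every segment right but two of three tones wrong.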
Abstract
Word Error Rate (WER) mischaracterizes ASR models' performance for African languages by combining phonological, tone, and other linguistic errors into a single lexical error. By contrast, Feature Error Rate (FER) has recently attracted attention as a viable metric that reveals linguistically meaningful errors in models' performance. In this paper, we evaluate three speech encoders on two African languages, complementing WER with Character Error Rate (CER) and FER, and adding a tone-aware extension (TER). We show that by computing errors on phonological features, FER and TER reveal linguistically salient error patterns even when word-level accuracy remains low. Our results reveal that models perform better on segmental features, while tones (especially mid and downstep) remain the most challenging features. Results on Yoruba show a striking differential in metrics, with WER=0.788, CER=0.305, and FER=0.151.…
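The gap between the three metrics can be illustrated with a small sketch. It assumes that each metric is an edit distance normalized by reference length, and that FER charges a substitution in proportion to the fraction of phonological features that differ. The paper's exact definition may differ, and the four-feature table `TOY_FEATURES` is a hypothetical toy inventory, not the feature set used by the authors.

```python
def unit_cost(a, b):
    """Substitution cost for WER and CER: 0 for a match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def edit_distance(ref, hyp, sub_cost=unit_cost):
    """Weighted Levenshtein distance; insertions and deletions cost 1."""
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [float(i)]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1.0,                  # deletion
                            curr[j - 1] + 1.0,              # insertion
                            prev[j - 1] + sub_cost(r, h)))  # substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    words = ref.split()
    return edit_distance(words, hyp.split()) / max(len(words), 1)


def cer(ref, hyp):
    """Character error rate over raw characters."""
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)


# Hypothetical toy features: (voiced, nasal, labial, continuant).
TOY_FEATURES = {
    "p": (0, 0, 1, 0),
    "b": (1, 0, 1, 0),
    "m": (1, 1, 1, 0),
    "t": (0, 0, 0, 0),
    "d": (1, 0, 0, 0),
    "n": (1, 1, 0, 0),
    "a": (1, 0, 0, 1),
}


def feature_cost(a, b):
    """Fraction of features that differ between two phones."""
    fa, fb = TOY_FEATURES[a], TOY_FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def fer(ref_phones, hyp_phones):
    """Feature-weighted edit distance normalized by reference length."""
    dist = edit_distance(ref_phones, hyp_phones, feature_cost)
    return dist / max(len(ref_phones), 1)


# One word, one wrong segment, one wrong feature (voicing).
print(round(wer("bat", "pat"), 3))              # 1.0
print(round(cer("bat", "pat"), 3))              # 0.333
print(round(fer(list("bat"), list("pat")), 3))  # 0.083
```

A single voicing error makes the whole word wrong for WER, one character in three wrong for CER, and one feature in twelve wrong under this toy FER, which mirrors the ordering WER > CER > FER reported for Yoruba.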
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
