Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't
Chihiro Taguchi, David Chiang

TL;DR
This study shows that orthographic complexity is associated with lower speech recognition accuracy across 25 languages, whereas phonological complexity has no significant effect, highlighting the importance of orthographic factors.
Contribution
The paper provides novel cross-linguistic evidence that orthographic complexity affects ASR performance, based on a multilingual pretrained model fine-tuned on languages with diverse writing systems.
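This section does not say which accuracy metric is used. A common choice when comparing writing systems is character error rate (CER): the character-level edit distance between reference and hypothesis transcripts, divided by the number of reference characters. The sketch below assumes CER. The function names are hypothetical, and the code counts NFC-normalized Unicode code points, which is not the same as counting user-perceived graphemes.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    Assumption: a "character" is an NFC-normalized Unicode code point,
    not a user-perceived grapheme cluster.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    refs = [unicodedata.normalize("NFC", r) for r in refs]
    hyps = [unicodedata.normalize("NFC", h) for h in hyps]
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references are empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

For example, `character_error_rate(["kitten"], ["sitting"])` returns 0.5 (3 edits over 6 reference characters). Summing edits over the whole corpus before dividing keeps long utterances from being underweighted relative to short ones.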
Findings
Orthographic complexity correlates with lower ASR accuracy.
Phonological complexity shows no significant correlation with ASR performance.
Results from fine-tuning the multilingual pretrained model Wav2Vec2-XLSR-53 on 25 languages with 15 writing systems support the impact of orthographic factors.
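The findings are stated as correlations between per-language complexity measures and ASR accuracy. A minimal sketch of how such coefficients can be computed, assuming one value per language for each measure, follows. `pearson_r`, `ranks`, and `spearman_rho` are hypothetical helpers; the source does not say which coefficient or significance test the authors used. If an error rate rather than accuracy is the dependent variable, a positive coefficient corresponds to lower accuracy.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a sample has zero variance")
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))
```

For p-values, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return both the coefficient and a significance estimate; the pure-Python version here only shows what the coefficient measures. Spearman's rho is the more robust choice when a measure such as grapheme count spans orders of magnitude across writing systems.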
Abstract
We investigate what linguistic factors affect the performance of Automatic Speech Recognition (ASR) models. We hypothesize that orthographic and phonological complexities both degrade accuracy. To examine this, we fine-tune the multilingual self-supervised pretrained model Wav2Vec2-XLSR-53 on 25 languages with 15 writing systems, and we compare their ASR accuracy, number of graphemes, unigram grapheme entropy, logographicity (how much word/morpheme-level information is encoded in the writing system), and number of phonemes. The results demonstrate that orthographic complexities significantly correlate with low ASR accuracy, while phonological complexity shows no significant correlation.
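Two of the orthographic measures named in the abstract, the number of graphemes and unigram grapheme entropy, can be computed directly from transcript text. The entropy is H = -Σ_g p(g) log2 p(g), where p(g) is the relative frequency of grapheme g. The sketch below is an assumption about how this might be done: `grapheme_stats` is a hypothetical helper that treats each NFC-normalized Unicode code point as a grapheme and ignores whitespace, and the authors' segmentation and preprocessing may differ. Logographicity is not covered, since this section does not define how it is measured.

```python
import math
import unicodedata
from collections import Counter


def grapheme_stats(lines: list[str]) -> tuple[int, float]:
    """Return (number of distinct graphemes, unigram entropy in bits).

    Assumptions: a grapheme is one NFC-normalized Unicode code point,
    and whitespace is not counted.
    """
    counts: Counter[str] = Counter()
    for line in lines:
        text = unicodedata.normalize("NFC", line)
        counts.update(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no graphemes found")
    # H = -sum_g p(g) * log2 p(g), with p(g) = count(g) / total
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return len(counts), entropy
```

For example, `grapheme_stats(["abab"])` returns `(2, 1.0)`: two symbols, each with probability 0.5, give exactly one bit of entropy. A writing system with thousands of frequently used characters yields both a larger inventory and, typically, higher entropy than a small alphabet.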
Taxonomy
Topics: Text Readability and Simplification
