Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
Zoey Liu, Justin Spence, Emily Prud'hommeaux

TL;DR
This study evaluates ten data partitioning strategies for ASR in five low-resource languages. It finds that model performance varies greatly depending on which speaker is held out for testing, and that random splits provide more reliable estimates than hold-speaker-out methods.
Contribution
The paper systematically compares data split methods for low-resource languages, highlights the limitations of hold-speaker-out strategies when the number of speakers is very small, and advocates random splits for better generalization.
Findings
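Model performance varies greatly depending on which speaker is held out for testing.
The average WER across all held-out speakers is comparable to the average WER over multiple random splits, and to any individual random split.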
Random splits yield more reliable performance estimates.
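The findings above are stated in terms of word error rate (WER). As a minimal sketch of how that metric is usually computed (word-level edit distance divided by the number of reference words), the following may help. The function names `word_error_rate` and `mean_wer` are illustrative assumptions, not taken from the paper, and the paper's own scoring setup may differ (for example in text normalization).

```python
from typing import Iterable


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration; splits on whitespace only and
    applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(wers: Iterable[float]) -> float:
    """Unweighted mean of per-split (or per-speaker) WER values."""
    values = list(wers)
    if not values:
        raise ValueError("need at least one WER value")
    return sum(values) / len(values)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("the cat sat", "the dog sat"))
```

Averaging per-speaker WERs with `mean_wer` corresponds to the "average WER across all held-out speakers" in the second finding. An unweighted mean is an assumption here; weighting by utterance or word count is an equally plausible choice.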
Abstract
Many automatic speech recognition (ASR) data sets include a single pre-defined test set consisting of one or more speakers whose speech never appears in the training set. This "hold-speaker(s)-out" data partitioning strategy, however, may not be ideal for data sets in which the number of speakers is very small. This study investigates ten different data split methods for five languages with minimal ASR training resources. We find that (1) model performance varies greatly depending on which speaker is selected for testing; (2) the average word error rate (WER) across all held-out speakers is comparable not only to the average WER over multiple random splits but also to any given individual random split; (3) WER is also generally comparable when the data is split heuristically or adversarially; (4) utterance duration and intensity are comparatively more predictive factors of variability…
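The contrast between hold-speaker(s)-out and random partitioning described in the abstract can be sketched as follows. This is an illustrative sketch only: the utterance representation (a dict with a `"speaker"` key), the function names, the 20% test fraction, and the seeds are all assumptions and are not the paper's actual setup, which covers ten split methods.

```python
import random
from collections import defaultdict


def hold_speaker_out_splits(utterances):
    """Yield (speaker, train, test), holding out one speaker per split.

    The held-out speaker's utterances never appear in the training set.
    """
    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt["speaker"]].append(utt)
    for speaker in sorted(by_speaker):
        test = by_speaker[speaker]
        train = [u for u in utterances if u["speaker"] != speaker]
        yield speaker, train, test


def random_split(utterances, test_fraction=0.2, seed=0):
    """Return (train, test) from a seeded shuffle, ignoring speaker identity.

    test_fraction and seed are assumed defaults for illustration.
    """
    rng = random.Random(seed)
    shuffled = list(utterances)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy corpus: 3 hypothetical speakers, 10 utterances in total.
    corpus = [{"id": i, "speaker": f"spk{i % 3}"} for i in range(10)]

    for speaker, train, test in hold_speaker_out_splits(corpus):
        print(speaker, len(train), len(test))

    # Several random splits, one per seed, as a basis for averaging WER.
    for seed in range(3):
        train, test = random_split(corpus, seed=seed)
        print(seed, len(train), len(test))
```

With very few speakers, the hold-speaker-out generator yields only a handful of splits, each tied to one individual's voice, whereas `random_split` can be repeated with as many seeds as needed.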
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Topic Modeling
Methods: Test
