Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages
Siyu Liang, Gina-Anne Levow

TL;DR
This paper evaluates fine-tuned multilingual ASR models on low-resource languages, providing practical guidelines for linguists to improve transcription efficiency despite limited data and challenging conditions.
Contribution
It benchmarks fine-tuned MMS and XLS-R models on five typologically diverse low-resource languages while controlling training data duration, offering insights and guidelines for effective ASR adaptation in linguistic fieldwork.
Findings
MMS performs best with extremely small datasets.
XLS-R reaches performance comparable to MMS once training data exceeds one hour.
Provides practical adaptation strategies for field linguists.
Abstract
Automatic Speech Recognition (ASR) has reached impressive accuracy for high-resource languages, yet its utility in linguistic fieldwork remains limited. Recordings collected in fieldwork contexts present unique challenges, including spontaneous speech, environmental noise, and severely constrained datasets from under-documented languages. In this paper, we benchmark the performance of two fine-tuned multilingual ASR models, MMS and XLS-R, on five typologically diverse low-resource languages while controlling for training data duration. Our findings show that MMS is best suited when extremely small amounts of training data are available, whereas XLS-R reaches parity once training data exceeds one hour. We provide linguistically grounded analysis to offer further insights towards practical guidelines for field linguists, highlighting reproducible ASR adaptation approaches to…
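Controlling training data duration, as the abstract describes, can be illustrated with a small sketch. This is not the authors' procedure: the function name `subset_by_duration`, the `(utterance id, duration in seconds)` representation, and the greedy selection rule are all assumptions made for illustration. The idea is to draw a reproducible random subset of utterances whose total length stays within a chosen time budget, so that different models can be fine-tuned on the same amount of audio.

```python
import random
from typing import List, Tuple

# Hypothetical representation: (utterance id, duration in seconds).
Utterance = Tuple[str, float]


def subset_by_duration(
    utterances: List[Utterance],
    budget_seconds: float,
    seed: int = 0,
) -> List[Utterance]:
    """Pick a random subset whose total duration does not exceed the budget.

    Utterances are shuffled with a fixed seed for reproducibility, then
    added greedily; any utterance that would overshoot the budget is skipped.
    """
    rng = random.Random(seed)
    shuffled = list(utterances)  # copy so the caller's list is not mutated
    rng.shuffle(shuffled)

    chosen: List[Utterance] = []
    total = 0.0
    for utt_id, duration in shuffled:
        if total + duration <= budget_seconds:
            chosen.append((utt_id, duration))
            total += duration
    return chosen
```

For example, `subset_by_duration(corpus, 3600.0)` would return at most one hour of audio from a hypothetical `corpus` list. Fixing the seed keeps the subset identical across runs, which matters when comparing models at the same data budget.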
Taxonomy
Topics: Language and cultural evolution · Speech Recognition and Synthesis · Linguistic Variation and Morphology
