Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
Anastasia Safonova, Tatiana Yudina, Emil Nadimanov, Cydnie Davenport

TL;DR
This paper develops an ASR system for the low-resource, polysynthetic Chukchi language by collecting data from open sources and training XLSR models, demonstrating promising results despite data scarcity.
Contribution
It introduces a new dataset for Chukchi and applies XLSR models to low-resource, polysynthetic languages, addressing challenges in data collection and evaluation metrics.
Findings
Collected over 21 hours of audio and 112,719 sentences of text.
Achieved good CER results with XLSR on limited data.
Highlights that the WER metric is less informative for polysynthetic languages.
Abstract
This paper presents a project focused on researching and building a new Automatic Speech Recognition (ASR) system for the Chukchi language. There is no single complete corpus of Chukchi, so most of the work consisted of collecting audio and texts in Chukchi from open sources and processing them. We collected 21 hours, 34 minutes, and 23 seconds (21:34:23) of audio recordings and 112,719 sentences (2,068,273 words) of text in Chukchi. The XLSR model was trained on the obtained data and showed good results even with a small amount of data. Chukchi is not only a low-resource language but also a polysynthetic one, which significantly complicates any automatic processing. As a result, the usual WER metric for evaluating ASR becomes less indicative for a polysynthetic language. However, the CER metric showed good results. The question of metrics…
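To illustrate why WER can look much worse than CER when words are long, here is a minimal sketch of both metrics using a plain Levenshtein distance. It is not the authors' evaluation code: the function names (`edit_distance`, `wer`, `cer`) and the toy strings are assumptions made for illustration, and the strings are placeholders rather than real Chukchi. This CER variant counts spaces as characters, which is one common convention among several.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Toy placeholder strings (hypothetical, not real Chukchi): one long "word"
# with a single wrong character, followed by a short correct word.
reference = "aaaaabbbbbcccccddddd eeeee"
hypothesis = "aaaaabbbbbcccccdddxd eeeee"

print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this toy case a single wrong character makes one of the two words count as an error, so WER is 1/2, while CER is only 1/26. The longer the words, as in a polysynthetic language, the wider this gap tends to be.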
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Linguistics and Cultural Studies
Methods: XLSR
