TL;DR
This paper explores adapting an existing multilingual wav2vec 2.0 speech model to Ainu, a critically endangered and underresourced language, through continued pretraining and multilingual fine-tuning, and identifies effective strategies for low-resource speech recognition.
Contribution
It shows that continued pretraining and multilingual fine-tuning with related languages substantially improve speech recognition for an underresourced language, Ainu.
Findings
Continued pretraining reduces error rates substantially.
Multilingual fine-tuning with related languages benefits low-resource target languages.
Pretraining on related or similar languages enhances performance with limited data.
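The summary does not say which error rate is reported. In speech recognition, error rates are usually character or word error rates: the Levenshtein edit distance (substitutions + insertions + deletions) between reference and hypothesis transcripts, divided by the number of reference units. A minimal sketch, assuming that standard definition; the function names `edit_distance` and `error_rate` are hypothetical and not taken from the paper:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Corpus-level CER (unit="char") or WER (unit="word")."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors, total = 0, 0
    for ref, hyp in zip(refs, hyps):
        r = list(ref) if unit == "char" else ref.split()
        h = list(hyp) if unit == "char" else hyp.split()
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total


print(error_rate(["hello world"], ["hello word"]))               # 1 deletion / 11 reference characters
print(error_rate(["hello world"], ["hello word"], unit="word"))  # 0.5
```

Errors are summed over the whole corpus before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.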
Abstract
In recent years, neural models learned through self-supervised pretraining on large-scale multilingual text or speech data have shown promising results for underresourced languages, especially when a relatively large amount of data from related language(s) is available. While the technology has the potential to facilitate tasks carried out in language documentation projects, such as speech transcription, pretraining a multilingual model from scratch for every new language would be highly impractical. We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model to a new language, focusing on actual fieldwork data from a critically endangered language: Ainu. Specifically, we (i) examine the feasibility of also leveraging data from similar languages in fine-tuning; (ii) verify whether the model's performance can be improved by further pretraining on target…
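wav2vec 2.0 models are typically fine-tuned for transcription with a CTC (connectionist temporal classification) objective, under which the model emits either a token or a special blank for every audio frame. The abstract does not specify the paper's decoding setup; purely as an illustration, greedy CTC decoding picks the highest-scoring token per frame, merges consecutive repeats, and then drops blanks. The vocabulary, the blank index 0, and the function name `ctc_greedy_decode` are assumptions made for this sketch:

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank; a real model defines its own


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Greedy CTC decoding over per-frame score vectors (one list per frame)."""
    # 1. Best token per frame (argmax over the vocabulary).
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Merge runs of identical consecutive tokens.
    merged = [tok for tok, _ in groupby(best)]
    # 3. Remove blanks and map indices to symbols.
    return "".join(vocab[tok] for tok in merged if tok != blank)


# Hypothetical 3-symbol vocabulary and 7 frames of scores.
vocab = ["<blank>", "a", "b"]
frames = [
    [0.1, 0.8, 0.1],   # a
    [0.2, 0.7, 0.1],   # a (repeat, merged)
    [0.9, 0.05, 0.05], # blank
    [0.1, 0.6, 0.3],   # a
    [0.1, 0.2, 0.7],   # b
    [0.2, 0.1, 0.7],   # b (repeat, merged)
    [0.6, 0.2, 0.2],   # blank
]
print(ctc_greedy_decode(frames, vocab))  # aab
```

Merging repeats before removing blanks is what allows a doubled letter in the output: the blank frame between the two `a` runs keeps them separate.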
