Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training

Lukuan Dong; Donghong Qin; Fengbo Bai; Fanhua Song; Yan Liu; Chen Xu; Zhijian Ou

arXiv:2407.13292·cs.SD·September 17, 2024

TL;DR

This paper compares three low-resource ASR approaches for the Iu Mien language, finding that phoneme-based pre-training yields the best data efficiency and recognition performance with limited annotated speech data.

Contribution

It introduces and evaluates a weakly-supervised phoneme-based multilingual pre-training method for low-resource speech recognition, demonstrating its superiority over other approaches.

Findings

01

Phoneme supervision outperforms subword and self-supervision in low-resource settings.

02

Weakly-supervised multilingual pre-training achieves competitive results with less annotated data.

03

The approach is effective for the low-resourced Iu Mien language.

Abstract

Mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme-based supervised pre-training, subword-based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense that annotated speech is very limited. With less than 10 hours of transcribed Iu Mien speech, this paper investigates and compares the three approaches for Iu Mien speech recognition. Our experiments are based on three recently released backbone models pretrained over the 10 languages from the CommonVoice dataset (CV-Lang10), which correspond to the three approaches for low-resourced ASR. It is found that phoneme supervision can achieve better results compared to subword supervision…

Taxonomy

Topics: Speech Recognition and Synthesis