Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages
Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

TL;DR
This paper evaluates wav2vec2.0 for low-resource speech recognition across multiple spoken languages, reporting more than 20% relative improvement over previous work in six languages and finding that coarse-grained modeling units outperform fine-grained ones.
Contribution
It extends the application of wav2vec2.0 to diverse low-resource languages, showing that the model is effective beyond English and beyond the Librispeech dataset.
Findings
More than 20% relative improvement over previous work in six languages
A gain of 52.4% for English
Coarse-grained modeling units outperform fine-grained units
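The findings above are stated as relative improvements. As a reading aid, the arithmetic can be sketched as follows, assuming the underlying metric is an error rate (such as word or character error rate) where lower is better. This page does not state the metric, so that is an assumption. The function name `relative_improvement` and the numbers are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the error-rate reduction as a percentage of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical numbers for illustration only; not results from the paper.
baseline = 40.0  # baseline error rate, in percent
improved = 30.0  # error rate of the new system, in percent
print(f"{relative_improvement(baseline, improved):.1f}% relative improvement")
# prints "25.0% relative improvement"
```

With these made-up numbers, the absolute improvement is 10 points while the relative improvement is 25%, which is why the two should not be confused when comparing reported gains.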
Abstract
Several domains have their own widely used feature extractors, such as ResNet, BERT, and GPT-x. These models are usually pre-trained on large amounts of unlabeled data by self-supervision and can be effectively applied to downstream tasks. In the speech domain, wav2vec2.0 has begun to show its powerful representation ability and the feasibility of ultra-low-resource speech recognition on the Librispeech corpus, which belongs to the audiobook domain. However, wav2vec2.0 has not been examined in real spoken scenarios or in languages other than English. To verify its universality across languages, we apply pre-trained models to low-resource speech recognition tasks in various spoken languages. We achieve more than 20% relative improvements in six languages compared with previous work. Among these languages, English achieves a gain of 52.4%. Moreover, using coarse-grained…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Linear Layer · Average Pooling · Attention Is All You Need · 1x1 Convolution · Linear Warmup With Linear Decay · Batch Normalization · Attention Dropout · Adam · Dense Connections
