Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages

Cheng Yi; Jianzhong Wang; Ning Cheng; Shiyu Zhou; Bo Xu

arXiv:2012.12121 · cs.CL · January 19, 2021 · 57 citations

TL;DR

This paper evaluates wav2vec2.0's effectiveness for low-resource speech recognition across multiple languages, demonstrating significant improvements over previous methods and highlighting the benefits of coarse-grained modeling units.

Contribution

It extends the application of wav2vec2.0 to diverse low-resource languages, showing that the model is universal and effective beyond English and the Librispeech dataset.

Findings

01. Over 20% relative improvement over previous work in six languages

02. A 52.4% relative improvement for English

03. Coarse-grained modeling units outperform fine-grained units
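To make the "relative improvement" figures concrete: a relative improvement is the reduction in error rate expressed as a fraction of the baseline error rate. The sketch below assumes that standard definition; the function name `relative_improvement` and the example numbers are hypothetical illustrations, not values or code from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative error-rate reduction (baseline - new) / baseline, as a fraction.

    Returns a negative value if the new system is worse than the baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical error rates in percent, NOT figures reported in the paper.
    baseline, new = 10.0, 8.0
    print(f"{relative_improvement(baseline, new):.1%}")  # prints 20.0%
```

Under this definition, the 52.4% gain for English means the new error rate is about 47.6% of the baseline error rate (1 - 0.524 = 0.476), not that the error rate dropped by 52.4 absolute points.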

Abstract

Several domains have their own widely used feature extractors, such as ResNet, BERT, and GPT-x. These models are usually pre-trained on large amounts of unlabeled data through self-supervision and can be applied effectively to downstream tasks. In the speech domain, wav2vec2.0 has begun to show its powerful representation ability and the feasibility of ultra-low-resource speech recognition on the Librispeech corpus, which belongs to the audiobook domain. However, wav2vec2.0 has not yet been examined in real spoken scenarios or on languages other than English. To verify its universality across languages, we apply pre-trained models to low-resource speech recognition tasks in various spoken languages. We achieve relative improvements of more than 20% in six languages compared with previous work; among these languages, English achieves a relative gain of 52.4%. Moreover, using coarse-grained…
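The relative improvements in the abstract are gains on recognition error rates. This excerpt does not state the exact scoring setup, so the following is a hedged sketch assuming standard word error rate (WER) and character error rate (CER), both computed as Levenshtein edit distance divided by the reference length. The helper names `edit_distance`, `word_error_rate`, and `char_error_rate` are hypothetical, not taken from the authors' code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """CER: character-level edit distance divided by the number of reference characters."""
    # Spaces are dropped so only characters are scored (a simplifying assumption).
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

CER is commonly used for languages written without spaces between words, such as Mandarin, where word segmentation is ambiguous. Because insertions count as errors, either rate can exceed 1.0.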

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Linear Layer · Average Pooling · Attention Is All You Need · 1x1 Convolution · Linear Warmup With Linear Decay · Batch Normalization · Attention Dropout · Adam · Dense Connections