Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR

Han Zhu; Li Wang; Jindong Wang; Gaofeng Cheng; Pengyuan Zhang; Yonghong Yan

arXiv:2110.04484·eess.AS·June 20, 2022

Open Access

TL;DR

Wav2vec-S introduces a task-specific semi-supervised pre-training method that refines self-supervised models for low-resource ASR, significantly improving performance with minimal additional pre-training time.

Contribution

The paper proposes wav2vec-S, a semi-supervised pre-training approach that enhances self-supervised models specifically for low-resource ASR tasks, outperforming previous methods.

Findings

01

Significant WER reductions on various datasets.

02

Minimal increase in pre-training time.

03

Semi-supervised pre-training closes representation gaps.
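
The findings above are stated in terms of word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the paper's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no text normalization), which real ASR scoring tools usually add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. A "WER reduction" compares this value between two systems scored on the same test set.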

Abstract

Self-supervised pre-training can effectively improve the performance of low-resource automatic speech recognition (ASR). However, existing self-supervised pre-training methods are task-agnostic, i.e., they can be applied to various downstream tasks. Although this broadens their scope of application, the capacity of the pre-trained model is not fully utilized for the ASR task, and the learned representations may not be optimal for ASR. In this work, to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, in which task-specific semi-supervised pre-training refines the self-supervised pre-trained model for the ASR task, thus more effectively utilizing the capacity of the pre-trained model to generate task-specific representations for ASR. Experiments show that compared to wav2vec 2.0, wav2vec-S only requires a marginal increment of…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing