Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training

Bowen Zhang; Songjun Cao; Xiaoming Zhang; Yike Zhang; Long Ma; Takahiro Shinozaki

arXiv:2206.08189 · cs.SD · June 28, 2022

TL;DR

Censer is a semi-supervised speech recognition method that exploits unlabeled data through self-supervised pre-training, progressive pseudo-labeling, and new techniques for managing pseudo-labeled data. It outperforms existing semi-supervised methods on the Libri-Light and LibriSpeech benchmarks.

Contribution

The paper presents a new semi-supervised learning algorithm for speech recognition that combines self-supervised pre-training with progressive pseudo-labeling and data management strategies.
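Progressive pseudo-labeling can be illustrated with a minimal sketch. It assumes that each unlabeled utterance already has a scalar pseudo-label quality score, where higher means more reliable. Utterances are then admitted into training in stages, most reliable first. The function name `curriculum_subsets`, the equal-size staging, and the score semantics are hypothetical choices made for illustration. They are not the schedule used in Censer.

```python
from typing import Dict, List


def curriculum_subsets(scores: Dict[str, float], num_stages: int) -> List[List[str]]:
    """Return the unlabeled utterance IDs admitted at each curriculum stage.

    Hypothetical sketch: utterances are ranked by a pseudo-label quality
    score (higher = more reliable). Stage k (1-based) admits the top
    k / num_stages fraction, so each stage is a superset of the previous
    one and the noisiest pseudo labels enter training last.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    # Rank IDs from most to least trustworthy pseudo label; ties broken by ID.
    ranked = sorted(scores, key=lambda uid: (-scores[uid], uid))
    stages: List[List[str]] = []
    for k in range(1, num_stages + 1):
        # Ceiling division, so the final stage always covers every utterance.
        cutoff = -(-k * len(ranked) // num_stages)
        stages.append(ranked[:cutoff])
    return stages


if __name__ == "__main__":
    demo = {"utt1": 0.95, "utt2": 0.40, "utt3": 0.80, "utt4": 0.65}
    for i, subset in enumerate(curriculum_subsets(demo, 2), start=1):
        print(f"stage {i}: {subset}")
```

Training starts on the cleanest pseudo labels and gradually absorbs noisier ones. This differs from both alternatives the abstract mentions: using all unlabeled data at once, and discarding everything below a fixed confidence cutoff.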

Findings

01 Achieves superior performance on the Libri-Light and LibriSpeech datasets.

02 Leverages unlabeled data effectively by assessing the quality of each pseudo label.

03 Outperforms existing semi-supervised speech recognition approaches.
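A common way to attach a quality score to a pseudo label from a CTC model is to decode greedily and average the top-1 posterior of the non-blank frames. The sketch below assumes that heuristic, a blank token at index 0, and raw per-frame logits as input. The function name `greedy_ctc_with_confidence` is hypothetical, and the paper's actual quality measure may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple

BLANK = 0  # assumption: index 0 is the CTC blank token


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_ctc_with_confidence(
    frame_logits: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Greedy CTC decoding plus a simple (hypothetical) confidence score.

    The pseudo label is obtained by taking the argmax per frame, collapsing
    repeats, and dropping blanks. Confidence is the mean top-1 posterior over
    non-blank frames, or 0.0 if every frame is blank.
    """
    tokens: List[int] = []
    token_probs: List[float] = []
    prev: Optional[int] = None
    for logits in frame_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if best != BLANK:
            token_probs.append(probs[best])
            # A repeated symbol is emitted again only if a blank separated it.
            if best != prev:
                tokens.append(best)
        prev = best
    confidence = sum(token_probs) / len(token_probs) if token_probs else 0.0
    return tokens, confidence
```

A confidence-threshold filter keeps only utterances whose score exceeds a fixed value. A curriculum can instead use the same score to order utterances.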

Abstract

Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain insufficiently studied. In addition, modern semi-supervised speech recognition algorithms either treat unlabeled data indiscriminately or filter out noisy samples with a confidence threshold, and the dissimilarities among different unlabeled data are often ignored. In this paper, we propose Censer, a semi-supervised speech recognition algorithm based on self-supervised pre-training that maximizes the utilization of unlabeled data. The pre-training stage of Censer adopts wav2vec 2.0, and the fine-tuning stage employs a semi-supervised learning algorithm improved from slimIPL, which leverages unlabeled data progressively according to the quality of their pseudo labels. We also…
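The abstract builds on slimIPL. As we understand it, slimIPL's central device is a dynamic cache of pseudo-labeled examples that the current model relabels over time. The sketch below is a simplified, hypothetical rendering of that idea. `PseudoLabelCache` and `label_fn` are illustrative names, and `label_fn` stands in for decoding with the current acoustic model. The capacity, replacement probability, and fill behavior are assumptions, not slimIPL's or Censer's exact settings.

```python
import random
from typing import Callable, Iterator, List, Tuple

Example = Tuple[str, str]  # (utterance ID, pseudo-label transcript)


class PseudoLabelCache:
    """Hypothetical slimIPL-style dynamic cache of pseudo-labeled examples."""

    def __init__(self, capacity: int, replace_prob: float,
                 unlabeled: Iterator[str], label_fn: Callable[[str], str],
                 seed: int = 0) -> None:
        self.capacity = capacity
        self.replace_prob = replace_prob
        self.unlabeled = unlabeled
        # Stand-in for decoding an utterance with the current model.
        self.label_fn = label_fn
        self.rng = random.Random(seed)
        self.items: List[Example] = []

    def _fresh(self) -> Example:
        # Pull the next unlabeled utterance and pseudo-label it now.
        uid = next(self.unlabeled)
        return uid, self.label_fn(uid)

    def sample(self) -> Example:
        """Return one pseudo-labeled example to train on."""
        # Fill phase: label new utterances until the cache is full.
        if len(self.items) < self.capacity:
            example = self._fresh()
            self.items.append(example)
            return example
        idx = self.rng.randrange(len(self.items))
        example = self.items[idx]
        # With probability replace_prob, evict the sampled entry and replace
        # it with a new utterance labeled by the current model.
        if self.rng.random() < self.replace_prob:
            self.items[idx] = self._fresh()
        return example
```

Stale pseudo labels leave the cache gradually rather than all at once, so the model never trains on a fully regenerated set in a single step. On our reading, Censer's refinement is to also take each pseudo label's quality into account when deciding what to train on.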

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing