Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition

Zezhong Jin; Dading Zhong; Xiao Song; Zhaoyi Liu; Naipeng Ye; Qingcheng Zeng

arXiv:2210.16318·cs.SD·November 1, 2022

Open Access

TL;DR

This paper introduces a method for improving semi-supervised speech recognition by filtering out low-quality pseudo labels based on confidence scores, then iteratively refining the model to enhance accuracy.

Contribution

The paper proposes a novel pseudo label filtering and iterative refinement strategy that significantly improves speech recognition performance in semi-supervised learning.

Findings

01. Filtered pseudo labels lead to more accurate ASR models.

02. Iterative correction of pseudo labels enhances model robustness.

03. Experiments on LibriSpeech demonstrate superior performance.

Abstract

Fine-tuning self-supervised pretrained models using pseudo labels can effectively improve speech recognition performance. However, low-quality pseudo labels can misguide decision boundaries and degrade performance. We propose a simple yet effective strategy to filter low-quality pseudo labels to alleviate this problem. Specifically, pseudo labels are produced over the entire training set and filtered via average probability scores calculated from the model output. Subsequently, an optimal percentage of utterances with high probability scores is considered reliable training data with trustworthy labels. The model is iteratively updated to correct the unreliable pseudo labels and minimize the effect of noisy labels. The process above is repeated until the unreliable pseudo labels have been adequately corrected. Extensive experiments on LibriSpeech show that these filtered samples enable the…
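The filter-then-refine loop described in the abstract can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names (`average_probability`, `filter_top_fraction`, `refine`), the `decode` and `fine_tune` callables, and the `keep_fraction` and `rounds` parameters are all hypothetical. In particular, the abstract does not say how the "optimal percentage" is chosen or how "adequately corrected" is measured, so the sketch simply takes a fixed fraction and a fixed number of rounds as inputs.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# (utterance id, pseudo label, average probability score)
Scored = Tuple[str, str, float]


def average_probability(token_probs: Sequence[float]) -> float:
    """Mean per-token probability of one hypothesis (0.0 if it is empty)."""
    if not token_probs:
        return 0.0
    return sum(token_probs) / len(token_probs)


def filter_top_fraction(
    scored: Sequence[Scored], keep_fraction: float
) -> Tuple[List[Scored], List[Scored]]:
    """Split utterances into (reliable, unreliable) by descending score."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(scored, key=lambda item: item[2], reverse=True)
    n_keep = math.ceil(keep_fraction * len(ranked))
    return ranked[:n_keep], ranked[n_keep:]


def refine(
    model: Any,
    unlabeled: Sequence[Tuple[str, Any]],
    decode: Callable[[Any, Any], Tuple[str, Sequence[float]]],
    fine_tune: Callable[[Any, List[Tuple[str, str]]], Any],
    keep_fraction: float,
    rounds: int,
) -> Any:
    """Assumed loop: relabel everything, keep the top fraction, fine-tune."""
    for _ in range(rounds):
        scored: List[Scored] = []
        for utt_id, audio in unlabeled:
            # Hypothetical decoder: returns a transcript and token probabilities.
            hypothesis, token_probs = decode(model, audio)
            scored.append((utt_id, hypothesis, average_probability(token_probs)))
        reliable, _unreliable = filter_top_fraction(scored, keep_fraction)
        # Only the reliable pseudo labels are used for this round's update;
        # the rest are relabeled by the updated model in the next round.
        model = fine_tune(model, [(u, h) for u, h, _ in reliable])
    return model
```

Because every utterance is re-decoded at the start of each round, labels that were filtered out earlier get a chance to be corrected by the updated model, which is the intuition behind the iterative step. Any real system would supply its own decoder and training routine in place of the two callables.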

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing