Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition
Zezhong Jin, Dading Zhong, Xiao Song, Zhaoyi Liu, Naipeng Ye, Qingcheng Zeng

TL;DR
This paper introduces a method for improving semi-supervised speech recognition by filtering out low-quality pseudo labels based on confidence scores, then iteratively refining the model to enhance accuracy.
Contribution
The paper proposes a novel pseudo label filtering and iterative refinement strategy that significantly improves speech recognition performance in semi-supervised learning.
Findings
Filtered pseudo labels lead to more accurate ASR models.
Iterative correction of pseudo labels enhances model robustness.
Experiments on LibriSpeech demonstrate superior performance.
Abstract
Fine-tuning self-supervised pretrained models with pseudo labels can effectively improve speech recognition performance. However, low-quality pseudo labels can misguide decision boundaries and degrade performance. We propose a simple yet effective strategy that filters out low-quality pseudo labels to alleviate this problem. Specifically, pseudo labels are produced for the entire training set and filtered by average probability scores calculated from the model output. An optimal percentage of the utterances with the highest probability scores is then treated as reliable training data with trustworthy labels. The model is iteratively updated to correct the unreliable pseudo labels, minimizing the effect of noisy labels. This process is repeated until the unreliable pseudo labels have been adequately corrected. Extensive experiments on LibriSpeech show that these filtered samples enable the…
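The abstract describes the filter-and-evolve procedure only at a high level. The Python sketch below illustrates one plausible reading of it and is not the authors' implementation. It assumes a CTC-style model decoded greedily, with the "average probability score" taken as the mean of per-frame maximum posteriors. The names `average_probability_score`, `filter_pseudo_labels`, `refine`, `decode_fn`, `train_fn`, `keep_ratio` and `max_rounds` are hypothetical, and the fixed round count stands in for the paper's unspecified stopping rule.

```python
import numpy as np


def average_probability_score(log_probs: np.ndarray) -> float:
    """Score one utterance by its mean per-frame maximum posterior.

    ASSUMPTION: `log_probs` is a (frames, vocab) array of log-softmax
    outputs from a CTC-style model decoded greedily; the paper's exact
    definition of the "average probability score" may differ.
    """
    probs = np.exp(log_probs)
    return float(probs.max(axis=1).mean())


def filter_pseudo_labels(scores, keep_ratio: float) -> list:
    """Return indices of the top `keep_ratio` fraction of utterances by score.

    The number kept is floor(keep_ratio * N); ties keep the earlier index.
    """
    scores = np.asarray(scores, dtype=float)
    n_keep = int(keep_ratio * len(scores))
    order = np.argsort(-scores, kind="stable")  # highest score first
    return sorted(order[:n_keep].tolist())


def refine(decode_fn, train_fn, utterances, keep_ratio: float, max_rounds: int) -> list:
    """Hypothetical filter-and-retrain loop.

    decode_fn(utt) -> (pseudo_label, log_probs) using the current model.
    train_fn(pairs) updates the model on (utterance, pseudo_label) pairs.
    ASSUMPTION: a fixed number of rounds replaces the paper's
    "adequately corrected" stopping rule, which the abstract does not define.
    """
    reliable = []
    for _ in range(max_rounds):
        # Re-label the whole training set with the current model.
        outputs = [decode_fn(u) for u in utterances]
        labels = [label for label, _ in outputs]
        scores = [average_probability_score(lp) for _, lp in outputs]
        # Keep only the highest-scoring utterances as reliable data.
        reliable = filter_pseudo_labels(scores, keep_ratio)
        # Update the model on the reliable pairs; the next round re-decodes
        # the unreliable utterances with the updated model.
        train_fn([(utterances[i], labels[i]) for i in reliable])
    return reliable
```

Selecting a fixed fraction rather than applying an absolute score threshold mirrors the abstract's "percentage of utterances" wording. In practice the ratio itself would still need tuning.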
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
