Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

Han Zhu; Dongji Gao; Gaofeng Cheng; Daniel Povey; Pengyuan Zhang; Yonghong Yan

arXiv:2308.06547·eess.AS·August 15, 2023·2 cites

Open Access

TL;DR

This paper introduces a novel alternative pseudo-labeling framework for semi-supervised automatic speech recognition that effectively handles noisy pseudo-labels through a generalized CTC loss, confidence-based error detection, and automatic thresholding.
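As a rough illustration of what confidence-based error detection can look like, the sketch below greedily decodes frame-level CTC posteriors and flags tokens whose confidence falls under a threshold. Everything here is an assumption made for illustration: the function names, the blank index `0`, the choice of "highest posterior among merged frames" as the confidence score, and the hand-picked threshold. It does not reproduce the paper's detector or its automatic thresholding.

```python
BLANK = 0  # assumed index of the CTC blank symbol


def greedy_decode_with_confidence(probs):
    """Greedy CTC decode of frame posteriors (T x V lists of floats).

    Returns a list of (token, confidence) pairs. The confidence is the
    highest posterior among the frames merged into that token, which is
    one simple choice among many (an assumption of this sketch).
    """
    decoded = []
    prev = BLANK
    for frame in probs:
        best = max(range(len(frame)), key=frame.__getitem__)
        p = frame[best]
        if best != BLANK:
            if best == prev:
                # Repeated frame of the same token: merge, keep the max.
                tok, conf = decoded[-1]
                decoded[-1] = (tok, max(conf, p))
            else:
                decoded.append((best, p))
        prev = best
    return decoded


def flag_suspect_tokens(decoded, threshold):
    """Mark tokens whose confidence is below a (hypothetical) threshold."""
    return [conf < threshold for _, conf in decoded]


# Toy posteriors over the vocabulary {blank, token 1, token 2}.
toy_probs = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.3, 0.3, 0.4],
]
pseudo_label = greedy_decode_with_confidence(toy_probs)
suspect = flag_suspect_tokens(pseudo_label, threshold=0.5)
```

In this toy input the second token is decoded from a single frame with posterior 0.4, so it is the one flagged as a likely error.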

Contribution

It proposes a new training objective framework that accepts alternative tokens, improves error detection with contrastive loss, and automates threshold tuning, advancing semi-supervised speech recognition.
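To make the idea of a training objective that "accepts alternative tokens" concrete, here is a minimal sketch under stated assumptions. It computes the standard CTC probability of a label sequence with the forward algorithm, then sums that probability over every sequence obtainable by choosing one token per position from a set of allowed alternatives. The names `ctc_prob` and `alternative_ctc_loss`, the blank index `0`, and the brute-force enumeration are all illustrative choices, not the paper's formulation; enumeration grows exponentially and is only usable on toy inputs.

```python
import itertools
import math

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_prob(probs, labels):
    """Total probability of all alignments that collapse to `labels`.

    `probs` is a T x V list of per-frame posteriors; this is the standard
    CTC forward recursion in the probability domain (fine for toy sizes).
    """
    ext = [BLANK]
    for tok in labels:
        ext += [tok, BLANK]  # interleave blanks: b, y1, b, y2, ..., b
    n_states = len(ext)
    alpha = [0.0] * n_states
    alpha[0] = probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * n_states
        for s in range(n_states):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between different tokens.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if n_states > 1 else 0.0)


def alternative_ctc_loss(probs, alternatives):
    """Negative log of the summed CTC probability over all sequences that
    pick one token per position from `alternatives` (list of token lists)."""
    choices = [sorted(set(alts)) for alts in alternatives]  # avoid double counting
    total = sum(ctc_prob(probs, list(seq)) for seq in itertools.product(*choices))
    return -math.log(total)


# Two frames, uniform posteriors over {blank, token 1, token 2}.
uniform = [[1 / 3] * 3 for _ in range(2)]
strict_loss = alternative_ctc_loss(uniform, [[1]])      # only token 1 accepted
relaxed_loss = alternative_ctc_loss(uniform, [[1, 2]])  # token 1 or token 2 accepted
```

Because the relaxed objective gives credit to either token at the uncertain position, its loss is never higher than the strict one, so a wrong pseudo-label token is penalized less when a plausible alternative is allowed.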

Findings

01. Enhanced recognition accuracy with noisy pseudo-labels.

02. Effective error detection via contrastive CTC loss.

03. Automated thresholding reduces manual tuning effort.

Abstract

When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the noisiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to entirely eliminate incorrect tokens in pseudo-labels. In this work, we propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. Firstly, a generalized CTC loss function is introduced to handle noisy…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Connectionist Temporal Classification Loss