Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

TL;DR
This paper enhances momentum pseudo-labeling (MPL) for speech recognition by adopting the Conformer architecture and by using iterative pseudo-labeling with language models to obtain a better seed model, improving accuracy and robustness in semi-supervised learning scenarios.
Contribution
It introduces the Conformer architecture to MPL for speech recognition and uses iterative pseudo-labeling with language models to improve the quality of the seed model that initializes MPL training.
Findings
Conformer-based MPL outperforms previous methods.
Iterative pseudo-labeling with language models improves seed quality.
Enhanced MPL shows robustness across data and domain variations.
Abstract
Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR). Our prior work proposed momentum pseudo-labeling (MPL), which performs PL-based SSL via an interaction between online and offline models, inspired by the mean teacher framework. MPL achieves remarkable results on various semi-supervised settings, showing robustness to variations in the amount of data and domain mismatch severity. However, there is further room for improving the seed model used to initialize the MPL training, as it is in general critical for a PL-based method to start training from high-quality pseudo-labels. To this end, we propose to enhance MPL by (1) introducing the Conformer architecture to boost the overall…
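The interaction between online and offline models mentioned in the abstract is inspired by the mean teacher framework, in which one model tracks a moving average of the other's weights. The following is a minimal sketch of that kind of momentum (exponential moving average) update, not the paper's implementation: the function name `momentum_update`, the representation of parameters as a dictionary of floats, and the value of `alpha` are all illustrative assumptions, since this summary does not state the exact update rule or its hyperparameters.

```python
from typing import Dict


def momentum_update(
    offline: Dict[str, float],
    online: Dict[str, float],
    alpha: float,
) -> Dict[str, float]:
    """Return offline parameters moved toward the online parameters.

    Hypothetical helper: each offline value becomes
    alpha * offline + (1 - alpha) * online, i.e. an exponential
    moving average of the online model's parameters.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        name: alpha * offline[name] + (1.0 - alpha) * online[name]
        for name in offline
    }


# Toy usage: the online model would normally be updated by gradient
# descent on pseudo-labels produced by the offline model; here its
# parameter is simply held fixed to show the averaging behaviour.
offline_params = {"w": 1.0}
online_params = {"w": 0.0}
for _ in range(3):
    offline_params = momentum_update(offline_params, online_params, alpha=0.9)
```

With `alpha` close to 1, the offline model changes slowly, which is the usual motivation for this kind of averaging: the pseudo-labels it produces stay more stable than those of a model updated at every training step.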
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Batch Normalization
