Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

Vimal Manohar; Tatiana Likhomanenko; Qiantong Xu; Wei-Ning Hsu; Ronan Collobert; Yatharth Saraf; Geoffrey Zweig; Abdelrahman Mohamed

arXiv:2106.07759·eess.AS·October 28, 2021

TL;DR

The paper introduces the Kaizen framework, in which a teacher model updated as the exponential moving average (EMA) of the student generates pseudo-labels for semi-supervised speech recognition. It reports over 10% relative word error rate (WER) reduction and effective learning with as little as 10 hours of supervised data.

Contribution

It presents a novel continuous pseudo-labeling approach using EMA for teacher updates, applicable across different training criteria in semi-supervised speech recognition.
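To make the idea of continuous pseudo-labeling with an EMA teacher concrete, here is a minimal sketch on a toy frame classifier. It is an illustration under stated assumptions, not the paper's recipe: the linear softmax model, the random stand-in "unlabeled" batches, the learning rate, the decay value `0.99`, and the helper names `pseudo_labels` and `sgd_step` are all hypothetical. Only the overall pattern (teacher labels the batch, student trains on those labels, teacher becomes the EMA of the student) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_feats, num_classes = 8, 3

def pseudo_labels(weights, feats):
    """Hard frame-level labels: argmax of the teacher's linear scores."""
    return (feats @ weights).argmax(axis=1)

def sgd_step(weights, feats, labels, lr=0.1):
    """One cross-entropy gradient step for a linear softmax classifier."""
    logits = feats @ weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(labels)), labels] -= 1.0         # gradient w.r.t. logits
    return weights - lr * (feats.T @ probs) / len(labels)

student = rng.normal(size=(num_feats, num_classes))
teacher = student.copy()   # teacher starts as a copy of the student
decay = 0.99               # assumed EMA decay, for illustration only

for step in range(100):
    feats = rng.normal(size=(32, num_feats))     # stand-in unlabeled batch
    labels = pseudo_labels(teacher, feats)       # teacher labels the batch
    student = sgd_step(student, feats, labels)   # student trains on pseudo-labels
    teacher = decay * teacher + (1.0 - decay) * student  # EMA teacher update
```

Because the teacher is refreshed after every student step, there is no separate "retrain, then relabel" round as in iterative pseudo-labeling; the labels improve gradually as the averaged parameters do.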

Findings

01. Over 10% relative WER reduction compared to standard methods

02. Effective semi-supervised learning with only 10 hours of supervised data

03. Closes the gap to fully supervised systems with large unlabeled datasets

Abstract

In this paper, we introduce the Kaizen framework that uses a continuously improving teacher to generate pseudo-labels for semi-supervised automatic speech recognition (ASR). The proposed approach uses a teacher model which is updated as the exponential moving average (EMA) of the student model parameters. We demonstrate that it is critical for EMA to be accumulated with full-precision floating point. The Kaizen framework can be seen as a continuous version of the iterative pseudo-labeling approach for semi-supervised training. It is applicable to different training criteria, and in this paper we demonstrate its effectiveness for frame-level hybrid hidden Markov model-deep neural network (HMM-DNN) systems as well as sequence-level Connectionist Temporal Classification (CTC) based models. For large-scale real-world unsupervised public videos in UK English and Italian languages the proposed…
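The abstract's remark about accumulating the EMA in full precision can be illustrated with a toy experiment. The sketch below is not taken from the paper; the decay `0.999`, the step count, the constant "student" value, and the helper name `run_ema` are assumptions chosen only to expose the effect. With a decay close to 1, each update changes the teacher by a very small amount, and in half precision that increment can be lost to rounding, so the average stops tracking the student.

```python
import numpy as np

def run_ema(dtype, decay=0.999, steps=20000):
    """Accumulate an EMA of a constant 'student' value entirely in `dtype`."""
    d = dtype(decay)           # decay rounded to the working precision
    c = dtype(1.0) - d         # weight placed on the student parameters
    teacher = np.zeros(4, dtype=dtype)
    student = np.ones(4, dtype=dtype)   # toy student, held fixed at 1.0
    for _ in range(steps):
        # Every intermediate result is rounded back to `dtype`.
        teacher = (d * teacher + c * student).astype(dtype)
    return teacher

# Distance of the averaged value from the student value it should approach.
err16 = float(np.abs(1.0 - run_ema(np.float16)).max())
err32 = float(np.abs(1.0 - run_ema(np.float32)).max())
print(f"float16 error: {err16:.4f}, float32 error: {err32:.2e}")
```

In this toy setting the half-precision average stalls well short of the student value, while the single-precision average ends up close to it. A common way to avoid the problem, assumed here rather than quoted from the paper, is to keep the teacher's parameters in 32-bit floats even when the student trains in mixed precision.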
