InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

Yosuke Higuchi; Tetsuji Ogawa; Tetsunori Kobayashi; Shinji Watanabe

arXiv:2211.00795 · eess.AS · March 20, 2023

TL;DR

InterMPL adds intermediate supervision to momentum pseudo-labeling for CTC-based speech recognition, improving semi-supervised ASR performance by relaxing CTC's conditional independence assumption.

Contribution

The paper proposes an intermediate loss for momentum pseudo-labeling (MPL) that improves CTC-based semi-supervised ASR by explicitly relaxing the conditional independence assumption.

Findings

01 Up to a 12.1% absolute performance gain in ASR accuracy.

02 The intermediate loss is important to the effectiveness of MPL.

03 CTC models trained with the intermediate loss outperform standard MPL in semi-supervised settings.

Abstract

This paper presents InterMPL, a semi-supervised learning method for end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision. Momentum PL (MPL) trains a connectionist temporal classification (CTC)-based model on unlabeled data by continuously generating pseudo-labels on the fly and improving their quality. In contrast to autoregressive formulations, such as the attention-based encoder-decoder and transducer, CTC is well suited for MPL, and for PL-based semi-supervised ASR in general, owing to its simple and fast inference algorithm and its robustness against generating collapsed labels. However, CTC generally performs worse than the autoregressive models because of its conditional independence assumption, which limits the performance of MPL. We propose to enhance MPL by introducing intermediate loss, inspired by the recent advances in…
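The three ingredients the abstract names (fast CTC inference for pseudo-labels, a momentum-style update, and an intermediate loss) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the abstract above is cut off before the method details, so the moving-average form of the momentum update, the linear interpolation of losses, the blank index, and all function and parameter names (`ctc_greedy_decode`, `momentum_update`, `combine_losses`, `alpha`, `weight`) are assumptions made for this example.

```python
from typing import Dict, List, Sequence

BLANK = 0  # assumed index of the CTC blank token


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: collapse repeated frame labels, then drop blanks."""
    labels: List[int] = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank:
            labels.append(idx)
        prev = idx
    return labels


def momentum_update(
    offline: Dict[str, float], online: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Assumed moving-average update: w_off <- alpha * w_off + (1 - alpha) * w_on."""
    return {
        name: alpha * offline[name] + (1.0 - alpha) * online[name]
        for name in offline
    }


def combine_losses(
    final_loss: float, intermediate_losses: Sequence[float], weight: float
) -> float:
    """Assumed interpolation of the final-layer CTC loss with the mean
    of the CTC losses computed at intermediate encoder layers."""
    if not intermediate_losses:
        return final_loss
    inter = sum(intermediate_losses) / len(intermediate_losses)
    return (1.0 - weight) * final_loss + weight * inter


# Hypothetical per-frame argmax output of a model that generates pseudo-labels.
pseudo_label = ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0])
# Hypothetical scalar "weights" standing in for full parameter tensors.
new_offline = momentum_update({"w": 1.0}, {"w": 0.0}, alpha=0.9)
# Hypothetical loss values for one batch.
total = combine_losses(2.0, [4.0, 6.0], weight=0.5)
```

Greedy decoding needs only one pass over the frames, which is the kind of simple, fast inference the abstract credits for making CTC a good fit for generating pseudo-labels on the fly.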

Code & Models

Repositories

yosukehiguchi/espnet · PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing