Semi-Supervised Speech Recognition via Graph-based Temporal Classification

Niko Moritz; Takaaki Hori; Jonathan Le Roux

arXiv:2010.15653 · cs.LG · February 17, 2021

TL;DR

This paper introduces graph-based temporal classification (GTC), a training objective for semi-supervised speech recognition that learns from an N-best list of pseudo-labels rather than only the 1-best hypothesis. By exploiting the label uncertainty encoded in the N-best list, it improves accuracy over standard pseudo-labeling.

Contribution

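The paper proposes the graph-based temporal classification (GTC) objective, a generalization of connectionist temporal classification (CTC) that accepts a graph representation of the training labels. This lets self-training for ASR use supervision built from an N-best list of pseudo-labels, which can contain more accurate labels than the 1-best hypothesis alone.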

Findings

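1. GTC outperforms standard pseudo-labeling, which typically trains on the 1-best hypothesis only.
2. Its results come close to those of an oracle experiment in which the best hypothesis is selected manually from each N-best list.
3. It effectively exploits the label uncertainty captured by the N-best hypotheses.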

Abstract

Semi-supervised learning has demonstrated promising results in automatic speech recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for unlabeled data. The effectiveness of this approach largely relies on the pseudo-label accuracy, for which typically only the 1-best ASR hypothesis is used. However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model. In this paper, we propose a generalized form of the connectionist temporal classification (CTC) objective that accepts a graph representation of the training labels. The newly proposed graph-based temporal classification (GTC) objective is applied for self-training with WFST-based supervision, which is generated from an N-best list of pseudo-labels. In this setup, GTC is used to learn not only…
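
The idea of a CTC-style objective over a label graph can be illustrated with a small sketch. This is not the authors' implementation: it assumes each N-best hypothesis becomes its own CTC-style chain of nodes (no shared or minimized WFST states), that hypothesis weights are applied once as entry weights in the probability domain, and that the blank symbol has id 0. The names `build_graph` and `gtc_like_loss` are hypothetical. With a single hypothesis of weight 1.0, the sketch reduces to the ordinary CTC forward computation.

```python
import math

BLANK = 0  # assumed id of the CTC blank symbol


def build_graph(nbest):
    """Turn an N-best list [(label_seq, weight), ...] into one label graph.

    Assumption: every hypothesis gets its own chain
    blank, l1, blank, l2, ..., blank with the usual CTC transitions.
    """
    labels = []      # labels[n] = symbol emitted when visiting node n
    edges = {}       # edges[n]  = list of successor nodes (self-loop included)
    starts = {}      # starts[n] = entry weight of node n
    finals = set()   # nodes where a path is allowed to end
    for seq, weight in nbest:
        ext = [BLANK]
        for lab in seq:
            ext += [lab, BLANK]
        base = len(labels)
        labels.extend(ext)
        for i in range(len(ext)):
            n = base + i
            succ = [n]                      # stay on the same node
            if i + 1 < len(ext):
                succ.append(n + 1)          # advance by one node
            # Skip the blank between two different non-blank labels.
            if i + 2 < len(ext) and ext[i] != BLANK and ext[i + 2] != ext[i]:
                succ.append(n + 2)
            edges[n] = succ
        starts[base] = weight               # start on the leading blank
        finals.add(base + len(ext) - 1)     # end on the trailing blank
        if len(ext) > 1:
            starts[base + 1] = weight       # or start on the first label
            finals.add(base + len(ext) - 2) # or end on the last label
    return labels, edges, starts, finals


def gtc_like_loss(probs, nbest):
    """Negative log of the summed weight of all graph paths of length T.

    probs[t][k] is the model posterior of symbol k at frame t.
    Plain probabilities are used for readability; a real implementation
    would work in the log domain for numerical stability.
    """
    labels, edges, starts, finals = build_graph(nbest)
    alpha = {n: w * probs[0][labels[n]] for n, w in starts.items()}
    for t in range(1, len(probs)):
        nxt = {}
        for n, a in alpha.items():
            for m in edges[n]:
                nxt[m] = nxt.get(m, 0.0) + a * probs[t][labels[m]]
        alpha = nxt
    total = sum(a for n, a in alpha.items() if n in finals)
    return -math.log(total)
```

Calling `gtc_like_loss(probs, [([1, 2], 0.7), ([1, 1], 0.3)])` sums over the alignments of both hypotheses, scaled by their weights, so a model trained on this quantity is not forced to commit to the 1-best pseudo-label.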
