Sequence-level self-learning with multiple hypotheses
Kenichi Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, Robert Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng

TL;DR
This paper introduces a novel sequence-level self-learning approach using multiple hypotheses within a multi-task learning framework to improve speech recognition, especially in accent adaptation and federated learning scenarios.
Contribution
It proposes a new multi-hypothesis self-learning method for seq2seq ASR models that mitigates errors from imperfect hypotheses and enhances adaptation.
Findings
Reduced word error rate (WER) from 14.55% to 10.36% in accent adaptation, a relative reduction of about 28.8%.
Effective in federated learning scenarios.
Improves robustness against hard-decision errors.
Abstract
In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). For untranscribed speech data, the hypothesis from an ASR system must be used as a label. However, imperfect ASR results make it difficult for unsupervised learning to improve recognition performance consistently, especially when multiple powerful teacher models are unavailable. In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework, where the n-th best ASR hypothesis is used as the label of each task. The seq2seq network is updated through the MTL framework so as to find a common representation that can cover multiple hypotheses. By doing so, the effect of hard-decision errors can be alleviated. We first demonstrate the effectiveness of our…
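The idea of treating each n-best hypothesis as the label of its own task can be illustrated with a small sketch. The abstract does not spell out the loss or how tasks are weighted, so everything below is an assumption made for illustration: the function names (`sequence_nll`, `multi_hypothesis_loss`), the use of teacher-forced cross-entropy per hypothesis, and the uniform task weights are hypothetical and not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_nll(step_logits, labels):
    """Negative log-likelihood of one label sequence.

    step_logits[t] holds the decoder logits at step t (teacher forcing is
    assumed to have produced them); labels[t] is the token id at step t.
    """
    assert len(step_logits) == len(labels)
    return -sum(log_softmax(lg)[y] for lg, y in zip(step_logits, labels))


def multi_hypothesis_loss(logits_per_hyp, hypotheses, task_weights=None):
    """Weighted sum of per-hypothesis losses, one task per n-best hypothesis.

    Assumption: uniform task weights unless others are given. The paper may
    combine the tasks differently.
    """
    n = len(hypotheses)
    if task_weights is None:
        task_weights = [1.0 / n] * n
    losses = [sequence_nll(lg, hyp) for lg, hyp in zip(logits_per_hyp, hypotheses)]
    total = sum(w * loss for w, loss in zip(task_weights, losses))
    return total, losses


# Toy example: vocabulary of 3 tokens, two hypotheses that differ in the
# last token. All-zero logits stand in for real decoder outputs.
hypotheses = [[0, 1], [0, 2]]
logits_per_hyp = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]] for _ in hypotheses]
total, per_task = multi_hypothesis_loss(logits_per_hyp, hypotheses)
```

Because the combined loss rewards probability mass on every hypothesis rather than only the 1-best one, a single wrong token in the top hypothesis contributes to just one of the tasks, which is one way to read the abstract's claim about alleviating hard-decision errors.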
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling
Methods: Self-Learning · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
