O-1: Self-training with Oracle and 1-best Hypothesis

Murali Karthick Baskar; Andrew Rosenberg; Bhuvana Ramabhadran; Kartik Audhkhasi

arXiv:2308.07486·cs.LG·August 16, 2023

TL;DR

O-1 is a novel self-training objective for speech recognition that reduces bias, unifies training and evaluation metrics, and significantly improves recognition accuracy across multiple datasets.

Contribution

The paper introduces O-1, a faster variant of Expected Minimum Bayes Risk (EMBR) training that boosts the oracle hypothesis and works with both supervised and unsupervised data, improving recognition performance.
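To make the terms above concrete, here is a minimal sketch of the quantities an N-best based objective works with: the expected number of word errors over an N-best list (the risk that EMBR-style training is commonly described as minimizing), the 1-best hypothesis (highest model score), and the oracle hypothesis (fewest errors against the reference). It is not the paper's implementation and does not reproduce the O-1 loss itself; the function names `word_errors` and `nbest_summary`, the renormalization over the N-best list, and the toy inputs in the usage note are all assumptions made for illustration.

```python
import math


def word_errors(ref, hyp):
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances against an empty reference prefix
    for i, ref_word in enumerate(r, start=1):
        curr = [i]
        for j, hyp_word in enumerate(h, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def nbest_summary(ref, hyps, log_probs):
    """Summarize an N-best list (hypothetical helper, assumed for illustration).

    Returns the expected word errors under scores renormalized over the
    N-best list, plus the indices of the 1-best and oracle hypotheses.
    """
    # Renormalize model scores over the N-best list (softmax over log-probs).
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]
    z = sum(weights)
    posteriors = [w / z for w in weights]

    errors = [word_errors(ref, hyp) for hyp in hyps]
    expected_errors = sum(p * e for p, e in zip(posteriors, errors))

    one_best = max(range(len(hyps)), key=lambda i: log_probs[i])  # top model score
    oracle = min(range(len(hyps)), key=lambda i: errors[i])       # fewest word errors
    return {
        "expected_errors": expected_errors,
        "one_best": one_best,
        "oracle": oracle,
        "errors": errors,
    }
```

For a toy list such as `["the cat sat down", "the cat sat", "a cat sat"]` scored against the reference `"the cat sat"`, the 1-best and the oracle can be different entries. That difference is the gap the summary refers to when it speaks of boosting the oracle hypothesis.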

Findings

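1. On SpeechStew, O-1 closes the gap between actual and oracle performance by 80% relative, compared with 43% relative for EMBR.

2. O-1 achieves a 13% to 25% relative improvement over EMBR on the individual datasets that make up SpeechStew.

3. On the in-house dataset, O-1 achieves a 12% relative gap reduction with respect to the oracle WER over EMBR training.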
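The relative figures above can be read with two simple ratios. The sketch below shows one plausible interpretation, assuming that "gap closed" means the fraction of the baseline-to-oracle WER gap removed by a new system, and that "relative improvement" means the WER reduction divided by the starting WER. The function names are hypothetical, and the numbers in the usage note are made up for illustration; they are not results from the paper.

```python
def relative_gap_closed(baseline_wer, new_wer, oracle_wer):
    """Fraction of the baseline-to-oracle WER gap removed by the new system.

    Assumed definition: (baseline - new) / (baseline - oracle).
    """
    gap = baseline_wer - oracle_wer
    if gap <= 0:
        raise ValueError("baseline WER must be higher than oracle WER")
    return (baseline_wer - new_wer) / gap


def relative_improvement(old_wer, new_wer):
    """Relative WER reduction of new_wer with respect to old_wer."""
    if old_wer <= 0:
        raise ValueError("old WER must be positive")
    return (old_wer - new_wer) / old_wer
```

With illustrative values of a 10.0% baseline WER, a 5.0% oracle WER, and a 6.0% WER after training, `relative_gap_closed(10.0, 6.0, 5.0)` gives 0.8, which would be reported as closing the gap by 80% relative.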

Abstract

We introduce O-1, a new self-training objective to reduce training bias and unify training and evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum Bayes Risk (EMBR) that boosts the oracle hypothesis and can accommodate both supervised and unsupervised data. We demonstrate the effectiveness of our approach in terms of recognition on publicly available SpeechStew datasets and a large-scale, in-house dataset. On SpeechStew, the O-1 objective closes the gap between the actual and oracle performance by 80% relative, compared to EMBR, which bridges the gap by 43% relative. O-1 achieves 13% to 25% relative improvement over EMBR on the various datasets that SpeechStew comprises, and a 12% relative gap reduction with respect to the oracle WER over EMBR training on the in-house dataset. Overall, O-1 results in a 9% relative improvement in WER over EMBR,…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing