Improving Deliberation by Text-Only and Semi-Supervised Training

Ke Hu; Tara N. Sainath; Yanzhang He; Rohit Prabhavalkar; Trevor Strohman; Sepand Mavandadi; Weiran Wang

arXiv:2206.14716·cs.CL·June 30, 2022

TL;DR

This paper adds text-only and semi-supervised training to an attention-based deliberation model, achieving a 4%-12% word error rate (WER) reduction compared to the baseline deliberation model and a positive human side-by-side evaluation.

Contribution

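It incorporates text-only data (used to train a BERT text encoder for deliberation) and large-scale text-to-speech and audio-only utterances (used through a joint acoustic and text decoder, JATD, and semi-supervised training) into an attention-based deliberation model, improving speech recognition accuracy.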

Findings

01

Achieved a 4%-12% WER reduction across various tasks compared to the baseline deliberation model.

02

Reduced Google Voice Search WER by 11% relative compared to a state-of-the-art language model (LM) rescoring method.

03

Achieved a positive human side-by-side evaluation compared to state-of-the-art LM rescoring.
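The findings are stated as WER reductions, one of them explicitly relative. As a minimal sketch of how such figures are commonly computed, the Python below uses the standard word-level edit-distance definition of WER and the usual relative-reduction formula. The function names and the example numbers are illustrative assumptions, not taken from the paper or its evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Made-up values for illustration only (not results from the paper).
print(word_error_rate("a b c d", "a x c"))   # one substitution, one deletion
print(relative_reduction(10.0, 8.9))         # hypothetical 10.0% -> 8.9% WER
```

Under this definition, a hypothetical drop from 10.0% to 8.9% WER is a 1.1-point absolute reduction and roughly an 11% relative reduction, which is why the "relative" qualifier matters when reading the numbers above.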

Abstract

Text-only and semi-supervised training based on audio-only data has gained popularity recently due to the wide availability of unlabeled text and speech data. In this work, we propose incorporating text-only and semi-supervised training into an attention-based deliberation model. By incorporating text-only data in training a bidirectional encoder representation from transformer (BERT) for the deliberation text encoder, and large-scale text-to-speech and audio-only utterances using joint acoustic and text decoder (JATD) and semi-supervised training, we achieved 4%-12% WER reduction for various tasks compared to the baseline deliberation. Compared to a state-of-the-art language model (LM) rescoring method, the deliberation model reduces the Google Voice Search WER by 11% relative. We show that the deliberation model also achieves a positive human side-by-side evaluation compared to the…

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Speech and Dialogue Systems