Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models

Rohit Prabhavalkar; Tara N. Sainath; Yonghui Wu; Patrick Nguyen; Zhifeng Chen; Chung-Cheng Chiu; Anjuli Kannan

arXiv:1712.01818·cs.CL·December 6, 2017

TL;DR

This paper introduces a method for training attention-based sequence-to-sequence speech recognition models to directly minimize expected word error rate (WER) instead of cross-entropy, improving WER by up to 8.2% relative to a conventionally trained baseline.
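Since WER is the quantity being optimized, it helps to pin down how it is computed. The Python sketch below counts word errors as the word-level Levenshtein distance (substitutions + insertions + deletions) between a hypothesis and the reference, then divides by the number of reference words. `word_errors` and `wer` are hypothetical helper names used for illustration, not code from the paper or its linked repositories; the sketch assumes whitespace tokenization and a non-empty reference.

```python
def word_errors(hyp: str, ref: str) -> int:
    """Word-level Levenshtein distance between hypothesis and reference
    (minimum number of substitutions, insertions, and deletions)."""
    h, r = hyp.split(), ref.split()
    # prev[j]: errors aligning the hypothesis words processed so far with r[:j].
    prev = list(range(len(r) + 1))
    for i in range(1, len(h) + 1):
        cur = [i] + [0] * len(r)
        for j in range(1, len(r) + 1):
            substitute = prev[j - 1] + (h[i - 1] != r[j - 1])  # zero cost on a match
            extra_hyp_word = prev[j] + 1       # insertion error
            missing_ref_word = cur[j - 1] + 1  # deletion error
            cur[j] = min(substitute, extra_hyp_word, missing_ref_word)
        prev = cur
    return prev[-1]


def wer(hyp: str, ref: str) -> float:
    """Word error rate; assumes a non-empty reference (hypothetical helper)."""
    return word_errors(hyp, ref) / len(ref.split())


print(word_errors("the cat sat", "the cat sat down"))  # 1 (one deletion)
print(wer("the cat sat", "the cat sat down"))          # 0.25
```

The per-utterance count `word_errors` is what an expected-WER loss averages over; WER itself is just that count normalized by reference length.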

Contribution

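It proposes a training approach that minimizes the expected number of word errors, approximated over N-best lists of decoded hypotheses, and shows that the resulting models match state-of-the-art conventional systems trained with discriminative sequence training.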
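A minimal sketch of an N-best approximation of expected word errors, assuming a common formulation in which the model's sequence probabilities are renormalized over the N-best list and the mean word-error count across the list is subtracted as a baseline. `mwer_nbest_loss` is a hypothetical name; it takes precomputed log-probabilities and word-error counts and returns the loss together with its analytic gradient with respect to the log-probabilities. A real trainer would instead compute this inside an autodiff framework on the decoder's outputs.

```python
import math
from typing import List, Sequence, Tuple


def mwer_nbest_loss(log_probs: Sequence[float],
                    errors: Sequence[float]) -> Tuple[float, List[float]]:
    """Hypothetical helper: N-best approximation of expected word errors.

    log_probs[i]: model log P(y_i | x) of the i-th N-best hypothesis.
    errors[i]:    word errors W(y_i, y*) of that hypothesis vs. the reference.
    Returns (loss, grad) with grad[j] = d loss / d log_probs[j].
    """
    # Renormalize P(y_i | x) over the N-best list: a softmax of the log-probs
    # (subtracting the max for numerical stability).
    top = max(log_probs)
    exps = [math.exp(s - top) for s in log_probs]
    total = sum(exps)
    p_hat = [e / total for e in exps]

    # Subtract the mean word-error count over the list as a baseline.
    w_bar = sum(errors) / len(errors)
    loss = sum(p * (w - w_bar) for p, w in zip(p_hat, errors))

    # Gradient through the softmax: p_hat_j * (W_j - E[W]), with E[W] taken
    # under p_hat. The constant baseline w_bar drops out of the gradient.
    expected = sum(p * w for p, w in zip(p_hat, errors))
    grad = [p * (w - expected) for p, w in zip(p_hat, errors)]
    return loss, grad


loss, grad = mwer_nbest_loss([math.log(0.5), math.log(0.3), math.log(0.2)], [0, 1, 3])
print(round(loss, 4), [round(g, 2) for g in grad])  # -0.4333 [-0.45, 0.03, 0.42]
```

In this toy case, gradient descent raises the score of the error-free hypothesis (its gradient entry is negative) and lowers that of the 3-error hypothesis. Subtracting the mean shifts the loss value but leaves the gradient unchanged, because the renormalized probabilities sum to one.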

Findings

01

Achieves up to 8.2% relative WER reduction.

02

Matches the performance of traditional discriminatively sequence-trained systems on a voice-search task.

03

Effective training method for grapheme-based attention models.
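A relative reduction is measured against the baseline's WER, not in absolute percentage points. The worked numbers below use a hypothetical 10.0% baseline purely to illustrate the arithmetic; they are not results from the paper.

```latex
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{base}}},
\qquad
\mathrm{WER}_{\text{base}} = 10.0\% \;\Rightarrow\;
\mathrm{WER}_{\text{new}} = 10.0\% \times (1 - 0.082) = 9.18\%
```

That is, an 8.2% relative reduction on a 10.0% baseline corresponds to an absolute improvement of 0.82 percentage points.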

Abstract

Sequence-to-sequence models, such as attention-based models in automatic speech recognition (ASR), are typically trained to optimize the cross-entropy criterion which corresponds to improving the log-likelihood of the data. However, system performance is usually measured in terms of word error rate (WER), not log-likelihood. Traditional ASR systems benefit from discriminative sequence training which optimizes criteria such as the state-level minimum Bayes risk (sMBR) which are more closely related to WER. In the present work, we explore techniques to train attention-based models to directly minimize expected word error rate. We consider two loss functions which approximate the expected number of word errors: either by sampling from the model, or by using N-best lists of decoded hypotheses, which we find to be more effective than the sampling-based method. In experimental evaluations, we…
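The abstract contrasts two ways of approximating the expected number of word errors. The sampling-based variant replaces the exact sum over all hypotheses, E[W] = Σ_y P(y|x) W(y, y*), with an average of W over hypotheses drawn from the model. The sketch below illustrates only that estimator, on a hypothetical toy "model": a small dictionary mapping whole hypotheses to probabilities, standing in for a real attention decoder from which one would sample sequences token by token. The function and variable names are hypothetical.

```python
import random
from typing import Dict


def exact_expected_errors(dist: Dict[str, float], errors: Dict[str, int]) -> float:
    """E[W] = sum over hypotheses y of P(y|x) * W(y, y*)."""
    return sum(p * errors[y] for y, p in dist.items())


def sampled_expected_errors(dist: Dict[str, float], errors: Dict[str, int],
                            num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate: average W over hypotheses sampled from the model."""
    hyps = list(dist)
    samples = rng.choices(hyps, weights=[dist[y] for y in hyps], k=num_samples)
    return sum(errors[y] for y in samples) / num_samples


# Hypothetical toy "model" for one utterance whose reference is "the cat sat".
toy_model = {"the cat sat": 0.6, "the cat sat down": 0.3, "a cat": 0.1}
# Word errors against that reference, counted by hand.
toy_errors = {"the cat sat": 0, "the cat sat down": 1, "a cat": 2}

rng = random.Random(0)
print(exact_expected_errors(toy_model, toy_errors))                 # 0.5
print(sampled_expected_errors(toy_model, toy_errors, 8, rng))       # noisy estimate
print(sampled_expected_errors(toy_model, toy_errors, 50_000, rng))  # close to 0.5
```

With few samples the estimate is noisy; the abstract reports that the N-best variant was the more effective of the two approaches.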
