Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition without Length Bias

Wei Zhou; Ralf Schlüter; Hermann Ney

arXiv:2005.09265 · eess.AS · October 24, 2023 · Interspeech · 1 citation

Open Access

TL;DR

This paper introduces a novel beam search method for encoder-decoder speech recognition that explicitly models sequence length, effectively eliminating length bias and improving performance without heuristic tuning.

Contribution

The authors propose a new beam search approach based on explicit length modeling, which addresses length bias and enhances robustness in speech recognition tasks.

Findings

01 Solves length bias without heuristics or tuning

02 Achieves 4% relative WER improvement on the LibriSpeech 'other' sets

03 Provides more efficient decoding with early stopping
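
To make the "relative WER improvement" figure concrete, here is a minimal sketch of how a relative improvement is usually computed from two word error rates. The function name and the WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent (assumed values, NOT taken from the paper).
baseline_wer = 10.0
proposed_wer = 9.6

rel = relative_improvement(baseline_wer, proposed_wer)
print(f"relative improvement: {rel:.1%}")
```

With these assumed numbers, an absolute drop of 0.4 WER points corresponds to a relative improvement of roughly 4%.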

Abstract

As one popular modeling approach for end-to-end speech recognition, attention-based encoder-decoder models are known to suffer from length bias and the corresponding beam problem. Different approaches have been applied in simple beam search to ease the problem, most of which are heuristic-based and require considerable tuning. We show that heuristics are not a proper modeling refinement, which results in severe performance degradation when the beam size is greatly increased. We propose a novel beam search derived from reinterpreting the sequence posterior with explicit length modeling. By applying the reinterpreted probability together with beam pruning, the obtained final probability leads to a robust model modification, which allows reliable comparison among output sequences of different lengths. Experimental verification on the LibriSpeech corpus shows that the proposed approach solves the…
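
The length bias mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' method: it only shows why summing token log-probabilities tends to favor shorter hypotheses, and how length normalization (a commonly used heuristic of the kind the abstract refers to) changes the ranking. All probabilities and names are hypothetical.

```python
import math


def log_score(token_probs):
    """Plain sequence score: sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def length_normalized(token_probs):
    """Heuristic score: average log-probability per token."""
    return log_score(token_probs) / len(token_probs)


# Hypothetical per-token probabilities (assumed values for illustration).
short_hyp = [0.6, 0.5]                      # truncated output, 2 tokens
long_hyp = [0.8, 0.8, 0.8, 0.8, 0.8, 0.8]   # complete output, 6 tokens

raw_short, raw_long = log_score(short_hyp), log_score(long_hyp)
norm_short, norm_long = length_normalized(short_hyp), length_normalized(long_hyp)

# Every extra token multiplies in a probability below 1, so the raw score
# prefers the short hypothesis even though each of its tokens is less likely.
print("raw score prefers:", "short" if raw_short > raw_long else "long")
# Averaging per token removes that penalty and flips the ranking here.
print("normalized score prefers:", "short" if norm_short > norm_long else "long")
```

Heuristics like this one typically come with tunable knobs in practice (for example a length exponent or a per-token reward), which is the tuning effort the abstract argues against.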

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing