Efficient Sequence Training of Attention Models using Approximative Recombination

Nils-Philipp Wynands; Wilfried Michel; Jan Rosendahl; Ralf Schlüter; Hermann Ney

arXiv:2110.09245 · cs.CL · April 22, 2022

TL;DR

This paper introduces approximative recombination of hypotheses during beam search to make sequence discriminative training of attention models efficient. The method increases the effective beam size by several orders of magnitude without significantly increasing the computational requirements, and is demonstrated on LibriSpeech.

Contribution

The paper proposes (approximatively) recombining beam search hypotheses that share a common local history, which makes sequence training of attention-based models more efficient and scalable.

Findings

01. Effective increase in beam size by several orders of magnitude

02. Maintains computational efficiency during sequence training

03. Achieves competitive results on LibriSpeech

Abstract

Sequence discriminative training is a great tool to improve the performance of an automatic speech recognition system. It does, however, necessitate a sum over all possible word sequences, which is intractable to compute in practice. Current state-of-the-art systems with unlimited label context circumvent this problem by limiting the summation to an n-best list of relevant competing hypotheses obtained from beam search. This work proposes to perform (approximative) recombinations of hypotheses during beam search, if they share a common local history. The error that is incurred by the approximation is analyzed and it is shown that using this technique the effective beam size can be increased by several orders of magnitude without significantly increasing the computational requirements. Lastly, it is shown that this technique can be used to effectively perform sequence discriminative…
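
The recombination step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "common local history" means the last `history_len` labels of a hypothesis, that merged hypotheses pool their probability mass via log-sum-exp, and that the best-scoring member is kept as the representative. The names `recombine`, `logsumexp`, `history_len`, and the toy beam are hypothetical.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of log scores."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def recombine(hyps, history_len):
    """Merge beam hypotheses whose last `history_len` labels agree.

    hyps: list of (labels, log_score) pairs, labels being a tuple.
    Returns one (labels, log_score) pair per distinct local history.
    """
    groups = defaultdict(list)
    for labels, log_score in hyps:
        # Assumed merge key: only the trailing labels (the "local history").
        key = labels[-history_len:] if history_len > 0 else ()
        groups[key].append((labels, log_score))

    merged = []
    for members in groups.values():
        # Keep the best-scoring member as the representative label sequence ...
        best_labels, _ = max(members, key=lambda h: h[1])
        # ... and let it carry the summed probability mass of the whole group.
        merged.append((best_labels, logsumexp([s for _, s in members])))
    return merged


# Toy beam with made-up scores, for illustration only.
beam = [
    (("a", "b", "c"), math.log(0.4)),
    (("x", "b", "c"), math.log(0.1)),  # same last two labels as the first entry
    (("a", "b", "d"), math.log(0.2)),
]
merged = recombine(beam, history_len=2)
```

With `history_len=2`, the first two entries collapse into a single hypothesis that carries their combined mass, so one beam slot stands in for several label sequences. For a model with unlimited label context this is only an approximation, since the merged hypotheses would in general score their continuations differently.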

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques