A New Training Pipeline for an Improved Neural Transducer

Albert Zeyer; André Merboldt; Ralf Schlüter; Hermann Ney

arXiv:2005.09319 · eess.AS · November 20, 2020

TL;DR

This paper introduces an improved training pipeline for RNN transducers. It trains with the maximum approximation over alignments, generalizes the model and its output label topology, and outperforms an attention model on Switchboard 300h speech recognition.

Contribution

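It compares training with full marginalization over all alignments to the maximum approximation, which simplifies, improves and speeds up training. It also generalizes the model and the output label topology (covering RNN-T, RNA and CTC) and studies the effect of external alignments.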

Findings

01. Transducer models generalize much better than attention models on longer sequences.

02. The final transducer model outperforms the attention model on Switchboard 300h by over 6% relative WER.

03. The output label topology is generalized to cover RNN-T, RNA and CTC.
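The three output label topologies the paper covers (RNN-T, RNA and CTC) can be compared by enumerating which frame-level alignments map to a given label sequence. The brute-force sketch below assumes the textbook definitions, not necessarily the paper's exact generalized formulation. In RNN-T a blank advances to the next frame while a label stays on the current frame. In RNA every frame emits exactly one symbol (label or blank). CTC also emits one symbol per frame but merges repeated labels before removing blanks. The blank index 0 and the helper names `collapse` and `valid_alignments` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: the blank symbol has index 0


def collapse(alignment: Sequence[int], topology: str) -> Tuple[int, ...]:
    """Map a frame-level alignment to the label sequence it represents."""
    if topology == "ctc":
        # CTC: merge consecutive repeats first, then drop blanks.
        merged = [s for i, s in enumerate(alignment)
                  if i == 0 or s != alignment[i - 1]]
        return tuple(s for s in merged if s != BLANK)
    # RNN-T and RNA: simply drop blanks.
    return tuple(s for s in alignment if s != BLANK)


def valid_alignments(num_frames: int, labels: Sequence[int], vocab_size: int,
                     topology: str) -> List[Tuple[int, ...]]:
    """Brute-force all alignments of `labels` to `num_frames` input frames."""
    target = tuple(labels)
    if topology == "rnnt":
        # Labels do not consume frames, blanks do: T + U symbols in total.
        length = num_frames + len(labels)
    elif topology in ("rna", "ctc"):
        # Exactly one symbol (label or blank) per frame.
        length = num_frames
    else:
        raise ValueError(f"unknown topology: {topology}")
    result = []
    for a in product(range(vocab_size), repeat=length):
        # RNN-T: exactly one blank per frame, and the last frame ends with a blank.
        if topology == "rnnt" and (a.count(BLANK) != num_frames or a[-1] != BLANK):
            continue
        if collapse(a, topology) == target:
            result.append(a)
    return result
```

For T = 3 frames and labels [1, 2] over a three-symbol vocabulary, this yields 6 RNN-T, 3 RNA and 5 CTC alignments. The RNN-T count matches the closed form C(T+U-1, U) and the RNA count matches C(T, U), with U = 2 labels. Brute force is only practical for toy sizes; real training sums over these sets with dynamic programming.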

Abstract

The RNN transducer is a promising end-to-end model candidate. We compare the original training criterion, which uses full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves and speeds up our training. We also generalize from the original neural network model and study more powerful models, made possible by the maximum approximation. We further generalize the output label topology to cover RNN-T, RNA and CTC. We perform several studies among all these aspects, including a study on the effect of external alignments. We find that the transducer model generalizes much better on longer sequences than the attention model. Our final transducer model outperforms our attention model on Switchboard 300h by over 6% relative WER.
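The difference between full marginalization and the maximum approximation can be sketched on the standard RNN-T alignment lattice. This toy pure-Python version assumes precomputed log-probabilities `log_probs[t][u][k]` of shape (T, U+1, V) and blank index 0. Both are assumptions, and `rnnt_loss` is a hypothetical name, not the RETURNN API used by the authors. The two criteria share the same forward recursion and differ only in how competing paths are combined: log-sum-exp sums over all alignments, while max keeps only the single best one (Viterbi).

```python
import math
from typing import Sequence

BLANK = 0  # assumption: the blank symbol has index 0


def _logsumexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_loss(log_probs: Sequence[Sequence[Sequence[float]]],
              labels: Sequence[int], use_max: bool = False) -> float:
    """Negative log-likelihood of `labels` under the standard RNN-T topology.

    log_probs[t][u][k]: log p(k | frame t, u labels emitted), shape (T, U+1, V).
    use_max=False: full sum over all alignments (forward algorithm).
    use_max=True: maximum approximation, score of the best single alignment.
    """
    T, U = len(log_probs), len(labels)
    combine = max if use_max else _logsumexp
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            score = -math.inf
            if t > 0:  # blank emitted at (t-1, u) moves to the next frame
                score = combine(score, alpha[t - 1][u] + log_probs[t - 1][u][BLANK])
            if u > 0:  # label y_u emitted at (t, u-1) stays on the same frame
                score = combine(score, alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]])
            alpha[t][u] = score
    # A final blank at the last frame terminates the sequence.
    return -(alpha[T - 1][U] + log_probs[T - 1][U][BLANK])
```

With uniform probabilities over three symbols, T = 2 frames and one label, there are two alignments of probability 1/27 each. The full-sum loss is therefore -log(2/27) and the max-approximation loss is log 27. Under the max, only one path contributes. Once that path is fixed, for instance to an external alignment, the loss reduces to a sum of per-step cross-entropy terms and no lattice recursion is needed.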

Code & Models

Repositories

rwth-i6/returnn-experiments (official)
