Regularized Forward-Backward Decoder for Attention Models
Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

TL;DR
This paper introduces a novel training regularization technique for attention-based speech recognition models that uses a second, time-reversed decoder during training to incorporate future context, improving performance without increasing decoding complexity.
Contribution
A new regularization method employing a second decoder trained on reversed labels enhances attention model training without altering the inference process.
Findings
Consistent performance improvements on TEDLIUMv2 and LibriSpeech datasets.
No additional complexity during decoding due to training-only regularization.
Effective utilization of future context during training enhances model accuracy.
Abstract
Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies have mainly focused on the encoder structure or the attention module to enhance the performance of these models. However, they mostly ignore the decoder. In this paper, we propose a novel regularization technique incorporating a second decoder during the training phase. This decoder is optimized on time-reversed target labels beforehand and supports the standard decoder during training by adding knowledge from future context. Since it is only added during training, we are not changing the basic structure of the network or adding complexity during decoding. We evaluate our approach on the smaller TEDLIUMv2 and the larger LibriSpeech dataset, achieving consistent improvements on both of them.
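The idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the regularizer, so everything below is an assumption made for illustration: the backward decoder is taken to emit one hidden state per label, its states are flipped in time to line up with the forward decoder, and the penalty is a mean squared distance added to the forward cross-entropy with a hypothetical weight. The function names `reverse_targets`, `l2_state_penalty`, and `training_loss` are likewise hypothetical.

```python
from typing import List, Sequence


def reverse_targets(labels: Sequence[int]) -> List[int]:
    """Time-reverse a label sequence (training target for the second decoder)."""
    return list(reversed(labels))


def l2_state_penalty(
    fwd_states: Sequence[Sequence[float]],
    bwd_states: Sequence[Sequence[float]],
) -> float:
    """Mean squared distance between forward and time-flipped backward states.

    Assumption: both decoders produce exactly one state per label, so the
    two sequences have equal length and can be aligned by a simple flip.
    """
    if len(fwd_states) != len(bwd_states):
        raise ValueError("state sequences must have the same length")
    if not fwd_states:
        return 0.0
    # Backward state k belongs to label position (T - 1 - k); flip to align.
    aligned_bwd = list(reversed(bwd_states))
    total = 0.0
    for f, b in zip(fwd_states, aligned_bwd):
        total += sum((fi - bi) ** 2 for fi, bi in zip(f, b))
    return total / len(fwd_states)


def training_loss(
    ce_forward: float,
    fwd_states: Sequence[Sequence[float]],
    bwd_states: Sequence[Sequence[float]],
    weight: float = 0.5,  # hypothetical regularization weight
) -> float:
    """Forward cross-entropy plus the assumed state-matching penalty.

    Only the forward decoder is used at inference time, so this extra term
    affects training only.
    """
    return ce_forward + weight * l2_state_penalty(fwd_states, bwd_states)
```

Because the penalty appears only in the training objective, dropping the second decoder after training leaves the inference path unchanged, which matches the abstract's claim of no added decoding complexity.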
