Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models
Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim

TL;DR
This paper introduces a refined multi-stage, multi-task training approach for online attention-based encoder-decoder models, yielding substantial word error rate (WER) reductions on the Librispeech dataset.
Contribution
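It proposes a three-stage training strategy (character encoder, BPE-based encoder, attention decoder), combined with character- and BPE-level multi-task learning and with transfer learning from a bidirectional encoder, improving on the corresponding baseline models.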
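One way to picture the staged strategy is as data: the three stages named in the abstract (character encoder, BPE-based encoder, attention decoder) each add one module to the model. The sketch below assumes, hypothetically, that each stage starts from the previous stage's weights and that the earlier losses stay active in later stages; the module and loss names are illustrative placeholders, not identifiers from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    modules: tuple  # modules present at this stage (hypothetical names)
    losses: tuple   # objectives assumed active at this stage


# Hypothetical schedule: each stage is assumed to start from the previous
# stage's weights and to add one new module on top of it.
SCHEDULE = (
    Stage("stage1", ("char_encoder",), ("char_loss",)),
    Stage("stage2", ("char_encoder", "bpe_encoder"), ("char_loss", "bpe_loss")),
    Stage(
        "stage3",
        ("char_encoder", "bpe_encoder", "attention_decoder"),
        ("char_loss", "bpe_loss", "decoder_loss"),
    ),
)


def added_modules(schedule):
    """Return (stage name, modules introduced at that stage) for every stage."""
    seen = set()
    result = []
    for stage in schedule:
        fresh = [m for m in stage.modules if m not in seen]
        seen.update(stage.modules)
        result.append((stage.name, fresh))
    return result


for name, fresh in added_modules(SCHEDULE):
    print(f"{name}: new={fresh}")
```

Under these assumptions, each stage would be trained with its listed losses, with the shared modules initialized from the previous stage's checkpoint rather than from scratch.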
Findings
35% and 10% relative WER reduction over the respective baselines for the smaller and bigger models
Achieved WERs of 5.04% (smaller model) and 4.48% (bigger model) on Librispeech test-clean
Effective use of transfer learning from a bidirectional encoder and of multi-task training
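Relative WER reduction is measured against the baseline WER, not as an absolute difference in percentage points. A minimal sketch of the arithmetic; the WER values in the example are hypothetical and are not the paper's baselines.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (percent), not the paper's baselines:
print(round(relative_wer_reduction(10.0, 6.5), 2))  # 35.0
print(round(relative_wer_reduction(10.0, 9.0), 2))  # 10.0
```

A 35% relative reduction therefore means the new WER is 65% of the baseline WER. On its own, the figure does not tell you the absolute gap in percentage points.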
Abstract
In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training scheme based on three levels of architectural granularity, namely the character encoder, the byte pair encoding (BPE) based encoder, and the attention decoder, is proposed. In addition, multi-task learning based on two levels of linguistic granularity, namely character and BPE, is used. We explore different pre-training strategies for the encoders, including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show 35% and 10% relative improvement over their baselines for the smaller and bigger models, respectively. Our models achieve word error rates (WERs) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models, respectively, after fusion with long short-term…
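Multi-task learning across character and BPE granularities amounts to optimizing a weighted combination of objectives computed at different output levels. The sketch below assumes a simple linear combination of a character-level loss, a BPE-level loss, and an attention-decoder loss. The weights `w_char` and `w_bpe` are hypothetical. `framewise_nll` is a simplified per-frame cross-entropy that stands in for whatever sequence loss the paper actually uses (CTC would be one common choice). None of these names come from the paper.

```python
import math


def framewise_nll(prob_rows, target_ids):
    """Mean negative log-likelihood of each target id under its probability row.

    Simplified stand-in for a sequence loss: assumes exactly one target per row.
    """
    if len(prob_rows) != len(target_ids) or not target_ids:
        raise ValueError("need one target per probability row (non-empty)")
    return -sum(math.log(row[t]) for row, t in zip(prob_rows, target_ids)) / len(target_ids)


def multitask_loss(char_loss, bpe_loss, decoder_loss, w_char=0.2, w_bpe=0.2):
    """Weighted sum of character, BPE, and decoder losses (hypothetical weights).

    Whatever weight is left over after w_char and w_bpe goes to the decoder loss.
    """
    w_dec = 1.0 - w_char - w_bpe
    if min(w_char, w_bpe, w_dec) < 0:
        raise ValueError("weights must be non-negative and w_char + w_bpe <= 1")
    return w_char * char_loss + w_bpe * bpe_loss + w_dec * decoder_loss


# Toy inputs: per-step probabilities over a 3-symbol character set and a
# 2-symbol BPE set, plus one decoder step.
char_loss = framewise_nll([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
bpe_loss = framewise_nll([[0.6, 0.4], [0.3, 0.7]], [0, 1])
decoder_loss = framewise_nll([[0.9, 0.1]], [0])
print(multitask_loss(char_loss, bpe_loss, decoder_loss))
```

Keeping the weights as explicit parameters makes the trade-off between the auxiliary character/BPE objectives and the main decoder objective easy to tune per training stage.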
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Byte Pair Encoding
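Byte pair encoding builds a subword inventory by starting from characters and repeatedly merging the most frequent adjacent symbol pair. The toy merge learner below sketches that standard greedy procedure on a made-up word list. It is not the paper's tokenizer, and the vocabulary size, toolkit, and training text the authors used are not given here.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges BPE merge operations from {word: frequency}."""
    # Split each word into characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
print(learn_bpe(toy_corpus, 3))
# [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merges, applied in order, segment text into units coarser than characters. That coarser granularity is what distinguishes a BPE-level output from a character-level one.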
