Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models
Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

TL;DR
This paper compares existing and new methods for estimating and suppressing the implicit language model in attention-based encoder-decoder ASR models, leading to improved integration with external language models.
Contribution
The paper introduces novel approaches for estimating the implicit language model directly from AED models, outperforming previous methods.
Findings
The proposed methods outperform all previous ILM estimation approaches.
Suppressing the ILM by reducing model capacity or limiting the label context also improves performance.
Joint training with an external LM further improves ASR accuracy.
Abstract
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. Integrating an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation, as in the hybrid autoregressive transducer (HAT), suggests dividing by the prior of the discriminative acoustic model, which corresponds to this implicit LM, similar to the hybrid hidden Markov model approach. The implicit LM cannot be calculated efficiently in general, and it is still unclear which methods are best for estimating it. In this work, we compare different approaches from the literature and propose several novel methods to estimate the ILM directly from the AED model. Our proposed methods outperform all previous approaches. We also investigate other methods to suppress the ILM, mainly by decreasing the capacity of the…
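The Bayesian argument in the abstract can be illustrated with log-linear score fusion at rescoring time. This is a minimal sketch, assuming the common ILM-correction form score(a) = log p_AED(a | x) + lam_ext * log p_ext(a) - lam_ilm * log p_ILM(a), where subtracting the weighted ILM log-probability corresponds to dividing by the AED model's internal prior. It does not show how p_ILM is estimated, which is the actual subject of the paper. The names Hypothesis, ilm_corrected_score and rescore, the weights, and all scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    log_p_aed: float  # log p_AED(a | x): sequence log-probability from the AED model
    log_p_ext: float  # log p_ext(a): external LM log-probability
    log_p_ilm: float  # log p_ILM(a): estimated implicit LM log-probability (estimation not shown)


def ilm_corrected_score(h: Hypothesis, lam_ext: float, lam_ilm: float) -> float:
    """Log-linear fusion: add the weighted external LM, subtract the weighted ILM estimate."""
    return h.log_p_aed + lam_ext * h.log_p_ext - lam_ilm * h.log_p_ilm


def rescore(hyps: List[Hypothesis], lam_ext: float, lam_ilm: float) -> Hypothesis:
    """Return the hypothesis with the highest ILM-corrected score."""
    return max(hyps, key=lambda h: ilm_corrected_score(h, lam_ext, lam_ilm))


# Hypothetical N-best list; all scores and weights are made up for illustration.
hyps = [
    Hypothesis("a", log_p_aed=-1.0, log_p_ext=-5.0, log_p_ilm=-2.0),
    Hypothesis("b", log_p_aed=-1.2, log_p_ext=-4.8, log_p_ilm=-4.0),
]

print(rescore(hyps, lam_ext=0.5, lam_ilm=0.0).text)  # plain shallow fusion: prints a
print(rescore(hyps, lam_ext=0.5, lam_ilm=0.4).text)  # with ILM correction: prints b
```

In this toy example, hypothesis "b" is less likely under the estimated ILM, so removing the internal prior boosts it relative to "a" and changes the decision. Setting lam_ilm to zero recovers ordinary shallow fusion.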
