Early Stage LM Integration Using Local and Global Log-Linear Combination
Wilfried Michel, Ralf Schlüter, Hermann Ney

TL;DR
This paper introduces a novel log-linear combination method for integrating external language models into sequence-to-sequence speech recognition models, improving performance and flexibility over traditional shallow fusion techniques.
Contribution
It proposes a per-token renormalization approach for language model integration, enabling efficient full normalization and better performance than shallow fusion.
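As a rough illustration of what per-token renormalization means, the sketch below contrasts it with shallow fusion on a toy three-token vocabulary. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the scale values, and the toy scores are all hypothetical, and the paper's exact local and global formulations may differ in detail.

```python
import math


def log_softmax(scores):
    """Normalize raw scores into log-probabilities over the vocabulary."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def shallow_fusion(am_logp, lm_logp, lm_scale=0.3):
    """Add scaled LM log-probs to the acoustic log-probs, with no renormalization."""
    return [a + lm_scale * l for a, l in zip(am_logp, lm_logp)]


def local_log_linear(am_logp, lm_logp, am_scale=1.0, lm_scale=0.3):
    """Log-linear combination renormalized over the vocabulary at this token position.

    am_scale and lm_scale are illustrative values (assumptions), not taken from the paper.
    """
    combined = [am_scale * a + lm_scale * l for a, l in zip(am_logp, lm_logp)]
    return log_softmax(combined)


# Toy per-token distributions for one decoding step (hypothetical numbers).
am_logp = log_softmax([2.0, 0.5, -1.0])
lm_logp = log_softmax([0.1, 1.5, 0.3])

fused = local_log_linear(am_logp, lm_logp)
unnormalized = shallow_fusion(am_logp, lm_logp)

fused_mass = sum(math.exp(x) for x in fused)
unnormalized_mass = sum(math.exp(x) for x in unnormalized)
```

Because the renormalization runs over the vocabulary at a single token position, the combined scores form a proper distribution that can be used as a training criterion, whereas the shallow-fusion scores generally do not sum to one.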
Findings
Significant WER reduction over shallow fusion.
Improvements persist even when a different LM is substituted after training.
Efficient computation of normalization terms in training and testing.
Abstract
Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM) for the task of automatic speech recognition. One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora. Language model integration is straightforward with the clear separation of acoustic model and language model in classical HMM-based modeling. In contrast, multiple integration schemes have been proposed for attention models. In this work, we present a novel method for language model integration into implicit-alignment based sequence-to-sequence models. Log-linear model combination of acoustic and language model is performed with a per-token renormalization. This allows us to compute the full normalization term efficiently both in…
