Information-Weighted Neural Cache Language Models for ASR
Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq

TL;DR
This paper extends neural cache language models with two uses of a word's information weight (information-weighted interpolation and content-word-only caching), reporting improvements in perplexity over a baseline LSTM LM and in WER after rescoring first-pass ASR results.
Contribution
It proposes neural cache models with information-weighted interpolation and content word caching, outperforming previous neural cache approaches in perplexity and ASR WER.
Findings
29.9%/32.1% (validation/test) relative perplexity improvement over a baseline LSTM LM on WikiText-2
Significant WER reductions on WSJ ASR
Outperforms previous neural cache models
Abstract
Neural cache language models (LMs) extend the idea of regular cache language models by making the cache probability dependent on the similarity between the current context and the context of the words in the cache. We make an extensive comparison of 'regular' cache models with neural cache models, both in terms of perplexity and WER after rescoring first-pass ASR results. Furthermore, we propose two extensions to this neural cache model that make use of the content value/information weight of the word: firstly, combining the cache probability and LM probability with an information-weighted interpolation and secondly, selectively adding only content words to the cache. We obtain a 29.9%/32.1% (validation/test set) relative improvement in perplexity with respect to a baseline LSTM LM on the WikiText-2 dataset, outperforming previous work on neural cache LMs. Additionally, we observe…
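The three mechanisms named in the abstract (a similarity-based cache probability, an information-weighted interpolation, and selective caching of content words) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the softmax over scaled dot products, the per-word interpolation weight `lam * info_weight[w]` with renormalisation, the threshold rule for content words, and all names and parameter values (`theta`, `lam`, `threshold`, `max_size`, `info_weight`) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def neural_cache_probs(query, cache, theta=0.3):
    """Cache distribution over words (assumed form).

    cache: list of (hidden_state, next_word) pairs seen earlier in the text.
    Each entry is scored by the scaled dot product between its stored
    hidden state and the current one; a softmax over entries is then
    summed per word.
    """
    if not cache:
        return {}
    scores = [theta * sum(q * h for q, h in zip(query, hidden))
              for hidden, _ in cache]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = defaultdict(float)
    for (_, word), weight in zip(cache, weights):
        probs[word] += weight / total
    return dict(probs)


def interpolate(p_lm, p_cache, info_weight, lam=0.2):
    """Information-weighted interpolation (hypothetical form).

    The cache share for word w is lam * info_weight[w], so function
    words (weight near 0) rely almost entirely on the LM. Per-word
    weights break normalisation, so the result is renormalised.
    """
    mixed = {}
    for word, p in p_lm.items():
        lam_w = lam * info_weight.get(word, 0.0)
        mixed[word] = (1.0 - lam_w) * p + lam_w * p_cache.get(word, 0.0)
    norm = sum(mixed.values())
    return {word: p / norm for word, p in mixed.items()}


def update_cache(cache, hidden, next_word, info_weight,
                 threshold=0.5, max_size=100):
    """Selective caching: store only words above an assumed threshold."""
    if info_weight.get(next_word, 0.0) >= threshold:
        cache.append((hidden, next_word))
        if len(cache) > max_size:
            cache.pop(0)  # drop the oldest entry
```

With a toy information-weight table such as `{"tokyo": 1.0, "the": 0.0}`, a call to `update_cache` stores "tokyo" but skips "the", and `interpolate` raises the probability of "tokyo" when the current hidden state resembles the one stored with it. How information weights are actually computed is not stated in the abstract, so the table here is purely a placeholder.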
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Neural Cache · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
