Neural Language Modeling With Implicit Cache Pointers
Ke Li, Daniel Povey, Sanjeev Khudanpur

TL;DR
This paper introduces a cache-inspired method for neural language models that improves long-range dependency modeling and rare-word prediction without attention mechanisms, outperforming LSTM and attention-based pointer LSTM baselines in perplexity on Penn Treebank and WikiText-2.
Contribution
The paper presents a simpler, cache-inspired approach for neural LMs that improves long-range dependency modeling and rare word prediction without using attention or mixture structures.
Findings
Outperforms LSTM and attention-based pointer models in perplexity on Penn Treebank and WikiText-2.
More effective at predicting rare words and long-range dependencies.
Shows benefits in N-best rescoring for rare and frequent words, but limited WER reduction.
Abstract
A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency modeling and better predict rare words from long contexts. This approach is a simpler alternative to the attention-based pointer mechanism that enables neural LMs to reproduce words from recent history. Without using attention or a mixture structure, the method only involves appending extra tokens that represent words in history to the output layer of a neural LM and modifying the training supervision accordingly. A memory-augmentation unit is introduced to learn words that are particularly likely to repeat. We experiment with both recurrent neural network- and Transformer-based LMs. Perplexity evaluation on Penn Treebank and WikiText-2 shows that the proposed model outperforms both an LSTM and an LSTM with an attention-based pointer mechanism and is more effective on rare words. N-best rescoring experiments on…
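The idea of appending history tokens to the output layer and relabeling the training targets can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the exact labeling scheme, so the sketch assumes that a word already seen within the last `cache_size` positions is supervised with a pointer id `vocab_size + distance - 1` (nearest occurrence wins), and that any other word keeps its ordinary vocabulary id. The names `build_targets`, `resolve`, and `cache_size` are hypothetical.

```python
from typing import List


def build_targets(token_ids: List[int], vocab_size: int, cache_size: int) -> List[int]:
    """Relabel targets so repeated words point into recent history.

    Assumed scheme (not taken from the paper): the output layer has
    vocab_size + cache_size units. Units [0, vocab_size) are ordinary words;
    unit vocab_size + d - 1 means "copy the word d positions back".
    """
    targets = []
    for t, tok in enumerate(token_ids):
        target = tok  # default: ordinary vocabulary id
        # Scan at most cache_size previous positions, nearest first.
        for dist in range(1, min(cache_size, t) + 1):
            if token_ids[t - dist] == tok:
                target = vocab_size + dist - 1  # pointer id for this distance
                break
        targets.append(target)
    return targets


def resolve(output_id: int, history: List[int], vocab_size: int) -> int:
    """Map a predicted output id back to a word id, following pointers."""
    if output_id < vocab_size:
        return output_id
    dist = output_id - vocab_size + 1
    return history[-dist]


if __name__ == "__main__":
    tokens = [5, 7, 5, 9, 7]
    print(build_targets(tokens, vocab_size=10, cache_size=3))  # [5, 7, 11, 9, 12]
```

In this toy sequence, the second occurrence of word 5 is two positions after the first, so its target becomes pointer id 11, and the second occurrence of word 7 becomes pointer id 12. A model trained on such targets can be optimized with a standard softmax cross-entropy loss over the enlarged output layer, which is consistent with the abstract's claim that no attention or mixture structure is needed.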
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
