Dynamic Evaluation of Neural Sequence Models
Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals

TL;DR
This paper introduces a dynamic evaluation method that adapts neural sequence models to recent history during evaluation using gradient descent, so that they assign higher probabilities to re-occurring sequential patterns. It outperforms the existing adaptation approaches the authors compare against.
Contribution
The paper presents a methodology for dynamic evaluation, which improves neural sequence models by adapting them to recent data, and reports new state-of-the-art results on four datasets: Penn Treebank, WikiText-2, text8, and Hutter Prize.
Findings
Achieved state-of-the-art word-level perplexities of 51.1 on Penn Treebank and 44.3 on WikiText-2.
Achieved state-of-the-art character-level cross-entropies of 1.19 bits/char on text8 and 1.08 bits/char on Hutter Prize.
Outperformed existing adaptation approaches in the authors' comparisons.
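The results in this summary use two different units: perplexity for word-level models and bits per character for character-level models. As a rough aid for comparing them, the sketch below converts between perplexity and cross-entropy in bits, using the standard relation perplexity = 2^(cross-entropy in bits). The function names are illustrative and do not come from the paper.

```python
import math


def perplexity_to_bits(perplexity):
    """Cross-entropy in bits per token that corresponds to a given perplexity."""
    return math.log2(perplexity)


def bits_to_perplexity(bits):
    """Perplexity that corresponds to a cross-entropy given in bits per token."""
    return 2.0 ** bits
```

For example, perplexity_to_bits(51.1) gives the Penn Treebank result as bits per word. Word-level and character-level figures are still not directly comparable, because the prediction unit differs.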
Abstract
We present methodology for using dynamic evaluation to improve neural sequence models. Models are adapted to recent history via a gradient descent based mechanism, causing them to assign higher probabilities to re-occurring sequential patterns. Dynamic evaluation outperforms existing adaptation approaches in our comparisons. Dynamic evaluation improves the state-of-the-art word-level perplexities on the Penn Treebank and WikiText-2 datasets to 51.1 and 44.3 respectively, and the state-of-the-art character-level cross-entropies on the text8 and Hutter Prize datasets to 1.19 bits/char and 1.08 bits/char respectively.
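The adaptation mechanism described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: the "model" is a toy context-free softmax over a small vocabulary, the sequence is processed in fixed-length segments, and the update is plain stochastic gradient descent. The paper's actual models and update rule are not reproduced here, and the names evaluate, segment_len, and lr are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(tokens, vocab_size, segment_len=5, lr=0.0):
    """Average cross-entropy in nats per token.

    With lr=0.0 the parameters never change (static evaluation).
    With lr>0.0 the parameters are updated after each segment is scored
    (a simplified form of dynamic evaluation).
    """
    # Toy model (assumption): one logit per vocabulary item, no context.
    logits = [0.0] * vocab_size
    total_loss = 0.0
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        probs = softmax(logits)
        # 1) Score the segment with the current parameters, before adapting to it.
        total_loss += sum(-math.log(probs[t]) for t in segment)
        # 2) Gradient of the segment loss w.r.t. the logits:
        #    sum over tokens of (probs - one_hot(token)).
        grad = [len(segment) * p for p in probs]
        for t in segment:
            grad[t] -= 1.0
        # 3) Gradient descent step, so later segments benefit from recent history.
        logits = [w - lr * g for w, g in zip(logits, grad)]
    return total_loss / len(tokens)
```

Calling evaluate on a sequence in which one token recurs often, once with lr=0.0 and once with a small positive lr, shows the effect: the adapted run assigns higher probability to the recurring token and therefore reaches a lower average loss. Each segment is scored before the parameters are updated on it, so the model is never evaluated on data it has already adapted to.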
Taxonomy
Topics: Topic Modeling · Neural Networks and Applications · Natural Language Processing Techniques
