Go Forth and Prosper: Language Modeling with Ancient Textual History
Rik Koncel-Kedziorski, Noah A. Smith

TL;DR
This paper presents a method for improving document-level language models by selectively copying relevant historical text from outside the current context window into the context. It reduces perplexity by 7% on Wikipedia articles and 12% on scientific texts without updating the model's parameters.
Contribution
The authors learn an auxiliary function that selects spans from text preceding the context window and copies them into the language model's context in place of less predictive spans, lowering perplexity without retraining the model.
Findings
7% perplexity reduction on Wikipedia articles
12% perplexity reduction on scientific texts
An auxiliary function trained on Wikipedia also works on a substantially different domain (scientific publications)
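For readers who want to check what a percentage reduction in perplexity means numerically, here is a minimal sketch. It assumes the reported figures are relative reductions against a baseline LM, which the summary does not state explicitly. The function names and the example perplexity values (20.0 and 18.6) are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline_ppl: float, new_ppl: float) -> float:
    """Fraction by which new_ppl is lower than baseline_ppl."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical numbers for illustration only (not results from the paper):
# a drop from 20.0 to 18.6 corresponds to roughly a 7% relative reduction.
example_reduction = relative_reduction(20.0, 18.6)

# A model that assigns probability 0.25 to each of four tokens has perplexity ~4.
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

Under this reading, a 12% reduction means the new perplexity is about 0.88 times the baseline.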
Abstract
We introduce a technique for improving document-level language models (LM) by leveraging "ancient history": text that is outside the LM's current context window. We learn an auxiliary function to select spans from the ancient history which can help the LM to predict future text. The selected text spans are then copied directly into the LM's context window, replacing less predictive spans. This method can improve perplexity of pretrained LMs with no updates to the LM's own parameters. We further observe that an auxiliary function trained in a specific textual domain like Wikipedia will also work in a substantially different domain such as scientific publications. With this technique we see a 7 percent perplexity reduction on Wikipedia articles, and a 12 percent perplexity reduction on scientific texts.
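The select-and-replace step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. Spans are plain strings, the context window is a fixed-length list of spans, and `overlap_score` is a toy word-overlap heuristic standing in for the learned auxiliary function. The names `rebuild_context` and `keep_recent` are hypothetical.

```python
from typing import Callable, List

Span = str


def overlap_score(span: Span, query: Span) -> float:
    """Toy stand-in for the learned auxiliary function (an assumption):
    the fraction of the query's words that also appear in the span."""
    q = set(query.lower().split())
    s = set(span.lower().split())
    return len(q & s) / len(q) if q else 0.0


def rebuild_context(
    ancient: List[Span],
    window: List[Span],
    score: Callable[[Span, Span], float],
    keep_recent: int = 1,
) -> List[Span]:
    """Swap low-scoring window spans for higher-scoring ancient spans.

    The most recent `keep_recent` spans are never replaced, and the
    window length is unchanged, so the LM sees a context of the same size.
    """
    query = window[-1]  # use the most recent span as the relevance query
    # Window positions eligible for replacement, least useful first.
    replaceable = sorted(
        range(len(window) - keep_recent),
        key=lambda i: score(window[i], query),
    )
    # Ancient candidates, most useful first.
    candidates = sorted(ancient, key=lambda s: score(s, query), reverse=True)
    new_window = list(window)
    for pos, cand in zip(replaceable, candidates):
        # Replace only when the ancient span scores strictly higher.
        if score(cand, query) > score(new_window[pos], query):
            new_window[pos] = cand
    return new_window


ancient = [
    "the river thames flows through london",
    "cats sleep most of every day",
]
window = [
    "weather was mild that year",
    "trade grew steadily",
    "london bridge spans the thames",
]
new_window = rebuild_context(ancient, window, overlap_score)
```

In this toy run only the first window span is swapped out, because the second ancient span shares no words with the query. The LM itself is never modified, only the text placed in its context, which matches the abstract's claim that no LM parameters are updated.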
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
