Short-Context Dominance: How Much Local Context Natural Language Actually Needs?
Vala Vakilian, Zimeng Wang, Ankit Singh Rawat, Christos Thrampoulidis

TL;DR
This paper shows that for most sequences in long documents (75-80%), the last 96 tokens or fewer suffice to predict the next token. It also introduces a method to detect sequences that need longer context and proposes a decoding strategy that improves language model performance by mitigating short-context dominance bias.
Contribution
The paper introduces Distributionally Aware MCL (DaMCL), a practical proxy for minimum context length, and develops a decoding algorithm that enhances LLM performance by mitigating short-context dominance bias.
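The idea of a distribution-based proxy can be illustrated with a minimal sketch. This is not the paper's definition of DaMCL, which this summary does not give. It assumes a hypothetical oracle `next_token_dist` that maps a context to a next-token probability distribution, uses Jensen-Shannon divergence as the comparison (an assumed choice), and uses an arbitrary threshold of 0.1 bits. It returns the shortest suffix whose predicted distribution stays close to the full-context one, without ever looking at the true next token.

```python
import math
from typing import Callable, Dict, Sequence

Distribution = Dict[str, float]


def js_divergence(p: Distribution, q: Distribution) -> float:
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a: Distribution) -> float:
        # m[t] > 0 whenever a[t] > 0, so the ratio is always defined.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def distributional_mcl_proxy(
    tokens: Sequence[str],
    next_token_dist: Callable[[Sequence[str]], Distribution],  # hypothetical oracle
    candidate_lengths: Sequence[int],
    threshold: float = 0.1,  # assumed value, not taken from the paper
) -> int:
    """Shortest suffix length whose distribution is within `threshold` of full context."""
    full = next_token_dist(tokens)
    for k in sorted(candidate_lengths):
        if k >= len(tokens):
            break
        if js_divergence(next_token_dist(tokens[-k:]), full) <= threshold:
            return k
    # No shorter suffix was close enough: the full context is needed.
    return len(tokens)


def toy_dist(context: Sequence[str]) -> Distribution:
    """Toy stand-in for an LLM: confident only if 'Paris' is visible in the context."""
    if "Paris" in context:
        return {"France": 0.9, "Spain": 0.1}
    return {"France": 0.5, "Spain": 0.5}


needs_full = distributional_mcl_proxy(
    ["Paris", "is", "the", "capital", "of"], toy_dist, [1, 2, 4]
)
needs_four = distributional_mcl_proxy(
    ["x", "Paris", "is", "the", "of"], toy_dist, [1, 2, 4]
)
```

In the first toy sequence the decisive token sits at the very start, so no candidate suffix reproduces the full-context distribution and the proxy falls back to the full length. In the second it lies within the last four tokens.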
Findings
Among sequences of 1-7k tokens from long-context documents, 75-80% require at most the last 96 tokens for accurate next-token prediction.
DaMCL effectively detects sequences that need longer context.
Bias mitigation improves performance across tasks and models.
Abstract
We investigate the short-context dominance hypothesis: that for most sequences, a small local prefix suffices to predict their next tokens. Using large language models as statistical oracles, we measure the minimum context length (MCL) needed to reproduce accurate full-context predictions across datasets with sequences of varying lengths. For sequences with 1-7k tokens from long-context documents, we consistently find that 75-80% require only the last 96 tokens at most. Given the dominance of short-context tokens, we then ask whether it is possible to detect challenging long-context sequences for which a short local prefix does not suffice for prediction. We introduce a practical proxy to MCL, called Distributionally Aware MCL (DaMCL), that does not require knowledge of the actual next-token and is compatible with sampling strategies beyond greedy decoding. Our experiments validate that…
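The notion of minimum context length can be sketched as follows. This is a simplified reading of the abstract, not the authors' exact procedure: it assumes MCL is the shortest suffix whose greedy prediction matches the full-context prediction, and it replaces the LLM oracle with a hypothetical toy predictor (`toy_predict`) so the example runs on its own.

```python
from typing import Callable, Sequence


def minimum_context_length(
    tokens: Sequence[str],
    predict: Callable[[Sequence[str]], str],  # hypothetical next-token oracle
    candidate_lengths: Sequence[int],
) -> int:
    """Smallest suffix length whose prediction matches the full-context prediction."""
    full_prediction = predict(tokens)
    for k in sorted(candidate_lengths):
        if k >= len(tokens):
            break
        if predict(tokens[-k:]) == full_prediction:
            return k
    # No shorter suffix reproduced the full-context prediction.
    return len(tokens)


def toy_predict(context: Sequence[str]) -> str:
    """Toy oracle: copy the token that followed the latest earlier copy of the last token."""
    last = context[-1]
    for i in range(len(context) - 2, -1, -1):
        if context[i] == last:
            return context[i + 1]
    return "<unk>"


# The earlier "a" is visible within the last 4 tokens, so a short suffix suffices.
short_case = minimum_context_length(["x", "a", "b", "c", "a"], toy_predict, [1, 2, 4])

# The earlier "a" is the very first token, so only the full context works.
long_case = minimum_context_length(["a", "b", "c", "d", "a"], toy_predict, [1, 2, 4])
```

With a real model, `predict` would wrap a forward pass over the truncated context, and the candidate lengths would include values such as 96.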
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
