Giraffe: Adventures in Expanding Context Lengths in LLMs
Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvind Sundararajan, Siddartha Naidu

TL;DR
This paper surveys methods for extending large language models' context lengths, introduces new techniques and datasets for evaluation, and releases new long-context models to facilitate further research.
Contribution
It provides a comprehensive survey of context length extrapolation methods, introduces a novel truncation strategy, and releases new long-context models and evaluation datasets.
Findings
Linear scaling is most effective for extending context length.
Using a longer scale at evaluation time yields further gains.
The truncated basis shows promising extrapolation capabilities.
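To make the linear scaling finding concrete, here is a minimal sketch of the general idea as it is usually applied to rotary position embeddings (RoPE): each position index is divided by a scale factor before the rotation angles are computed, so longer sequences map back into the position range seen during training. This is an illustration under stated assumptions, not the authors' implementation. The function names, the base of 10000, and the example scale of 4 are all hypothetical choices made for the example.

```python
import math


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position under RoPE.

    scale=1.0 gives unmodified RoPE. scale>1 applies linear scaling:
    the position is divided by `scale` before the angles are computed.
    The defaults here are illustrative assumptions.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    scaled_position = position / scale
    # One angle per pair of dimensions, with geometrically decaying frequency.
    return [
        scaled_position * base ** (-2.0 * i / head_dim)
        for i in range(head_dim // 2)
    ]


def rotate(vec, angles):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by angles[i]."""
    if len(vec) != 2 * len(angles):
        raise ValueError("vec must have two entries per angle")
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


# With scale=4, position 8192 receives the same angles that
# position 2048 receives without scaling.
long_angles = rope_angles(8192, head_dim=8, scale=4.0)
short_angles = rope_angles(2048, head_dim=8)
```

In this sketch, evaluating with a larger `scale` than the one used during fine-tuning compresses positions further, which is one way to read the finding about longer scales at evaluation time.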
Abstract
Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths which enforce upper limits on the length of input sequences that they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of which focus on modifying the system of positional encodings used in the attention mechanism to indicate where tokens or activations are located in the input sequence. We conduct a wide survey of existing methods of context length extrapolation on a base LLaMA or LLaMA 2 model, and introduce some of our own design as well -- in particular, a new truncation strategy for modifying the basis for the position encoding. We test these methods using three new evaluation tasks (FreeFormQA,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Focus · Balanced Selection
