Giraffe: Adventures in Expanding Context Lengths in LLMs

Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvind Sundararajan, Siddartha Naidu

arXiv:2308.10882 · cs.AI · August 22, 2023 · 5 citations

TL;DR

This paper surveys methods for extending large language models' context lengths, introduces new techniques and datasets for evaluation, and releases new long-context models to facilitate further research.

Contribution

It provides a comprehensive survey of context length extrapolation methods, introduces a novel truncation strategy, and releases new long-context models and evaluation datasets.

Findings

01. Linear scaling of position indices is the most effective of the surveyed methods for extending context length.

02. Using a larger linear-scaling factor at evaluation time yields further gains.

03. The truncated basis shows promising extrapolation capabilities.
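The linear-scaling finding can be made concrete with rotary position embeddings (RoPE), the positional encoding used by LLaMA. The sketch below is a minimal illustration, assuming NumPy and the standard RoPE frequency formula with base 10000: it divides every position index by a scale factor before computing rotation angles, so positions beyond the training length are squeezed back into the range the model saw during training. The function names `rope_angles` and `apply_rope` are hypothetical and are not taken from the abacusai/long-context repository; this sketch pairs adjacent channels, whereas some implementations pair the two halves of the head dimension instead.

```python
import numpy as np


def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotation angles for RoPE with linear position scaling (hypothetical helper).

    Each position is divided by `scale`, so a model trained on length L sees
    positions up to scale * L mapped back into its trained range [0, L).
    """
    # One frequency per channel pair: theta_i = base^(-2i/dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    scaled_pos = np.asarray(positions, dtype=np.float64) / scale
    # Shape: (num_positions, dim // 2)
    return np.outer(scaled_pos, inv_freq)


def apply_rope(x, angles):
    """Rotate adjacent channel pairs of x (shape: seq, dim) by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

For example, with an assumed 2,048-token training context and `scale=4.0`, `rope_angles(np.arange(8192), 128, scale=4.0)` maps the last position to 8191 / 4 = 2047.75, inside the trained range. Finding 02 corresponds, roughly, to choosing a larger `scale` at evaluation time; all lengths and factors here are illustrative, not the paper's settings.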

Abstract

Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths which enforce upper limits on the length of input sequences that they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of which focus on modifying the system of positional encodings used in the attention mechanism to indicate where tokens or activations are located in the input sequence. We conduct a wide survey of existing methods of context length extrapolation on a base LLaMA or LLaMA 2 model, and introduce some of our own design as well -- in particular, a new truncation strategy for modifying the basis for the position encoding. We test these methods using three new evaluation tasks (FreeFormQA,…
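The abstract does not spell out the truncation strategy, so the following is only a hedged sketch of one plausible three-band form, assuming NumPy: RoPE frequencies at or above a cutoff `b` are kept unchanged, frequencies strictly between `a` and `b` are replaced by a small constant `rho`, and frequencies at or below `a` are set to zero, so those channel pairs receive no rotation at all. The cutoffs `a`, `b`, the constant `rho`, and the function names are hypothetical; consult the paper for the exact definition and the values actually used.

```python
import numpy as np


def rope_basis(dim, base=10000.0):
    # Standard RoPE frequencies theta_i = base^(-2i/dim), ordered high to low.
    return base ** (-np.arange(0, dim, 2) / dim)


def truncated_basis(theta, a, b, rho):
    """Hypothetical three-band truncation of a RoPE basis.

    theta >= b     -> kept as-is (high frequencies)
    a < theta < b  -> replaced by the constant rho
    theta <= a     -> set to 0 (channel pair is never rotated)
    """
    theta = np.asarray(theta, dtype=np.float64)
    return np.where(theta >= b, theta, np.where(theta > a, rho, 0.0))


def truncated_angles(positions, dim, a, b, rho, base=10000.0):
    # Rotation angles built from the truncated basis instead of the raw one.
    basis = truncated_basis(rope_basis(dim, base), a, b, rho)
    return np.outer(np.asarray(positions, dtype=np.float64), basis)
```

With the illustrative values `a=1e-3`, `b=1e-1`, `rho=1e-3` and a head dimension of 128, the first few channel pairs keep their original frequencies, a middle band shares one slow frequency, and the lowest-frequency pairs are zeroed.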

Code & Models

Repositories

abacusai/long-context (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Focus · Balanced Selection