TL;DR
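EndPrompt extends the context window of large language models using only short training sequences. A two-segment input keeps the original short context intact and appends a brief terminal prompt whose positional indices sit near the target context length, so the model sees long-range relative distances without the cost of full-length training.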
Contribution
The paper presents EndPrompt, a novel approach that enables long-context extension through sparse positional supervision without full-length training.
Findings
Achieves state-of-the-art results on LongBench with fewer resources.
Effectively extends context from 8K to 64K tokens in LLaMA models.
Surpasses full-length fine-tuning in long-context tasks.
Abstract
Extending the context window of large language models typically requires training on sequences at the target length, incurring quadratic memory and computational costs that make long-context adaptation expensive and difficult to reproduce. We propose EndPrompt, a method that achieves effective context extension using only short training sequences. The core insight is that exposing a model to long-range relative positional distances does not require constructing full-length inputs: we preserve the original short context as an intact first segment and append a brief terminal prompt as a second segment, assigning it positional indices near the target context length. This two-segment construction introduces both local and long-range relative distances within a short physical sequence while maintaining the semantic continuity of the training text--a property absent in chunk-based simulation…
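The two-segment construction described in the abstract can be illustrated with a small sketch of the position indices alone. This is not the authors' implementation: the function name, its parameters, and the choice to place the terminal prompt on the very last indices before the target length are assumptions made for illustration, since the excerpt only says the indices are "near the target context length".

```python
def two_segment_position_ids(context_len, prompt_len, target_len):
    """Hypothetical helper: position ids for a short two-segment input.

    Segment 1 (the original short context) keeps positions 0..context_len-1.
    Segment 2 (the terminal prompt) is assumed to occupy the last
    prompt_len positions below target_len.
    """
    if context_len < 0 or prompt_len < 0:
        raise ValueError("segment lengths must be non-negative")
    if context_len + prompt_len > target_len:
        raise ValueError("segments do not fit inside the target length")

    first_segment = list(range(context_len))
    second_segment = list(range(target_len - prompt_len, target_len))
    return first_segment + second_segment


# A physical sequence of 6 tokens that spans a simulated 16-position window.
ids = two_segment_position_ids(context_len=4, prompt_len=2, target_len=16)
print(ids)               # [0, 1, 2, 3, 14, 15]
print(ids[-1] - ids[0])  # 15, the largest relative distance seen
```

Under these assumptions the physical sequence stays short (context plus prompt), while relative distances range from 1 inside each segment up to `target_len - 1` across segments, which is the mix of local and long-range distances the abstract refers to.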
