Long Context Alignment with Short Instructions and Synthesized Positions
Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li

TL;DR
This paper introduces SkipAlign, a technique that improves large language models' handling of long contexts by inserting skipped position indices into instruction-following samples to synthesize long-range dependencies, without requiring training samples longer than the original data.
Contribution
SkipAlign improves long-context understanding in LLMs by synthesizing long-range dependencies through position indices. It is applied during alignment and requires no additional data and no training beyond the original data length.
Findings
SkipAlign improves long-context task performance across various models.
A 6B-parameter model with SkipAlign performs comparably to GPT-3.5-Turbo-16K on LongBench.
Synthesizing long-range dependencies effectively enhances the long-context capabilities of LLMs.
Abstract
Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of LLMs in the alignment phase without the need for additional effort beyond training with the original data length. SkipAlign is developed on the premise that long-range dependencies are fundamental to enhancing an LLM's capacity for long context. Departing from merely expanding the length of input samples, SkipAlign synthesizes long-range dependencies from the aspect of position indices. This is achieved by the strategic insertion of skipped positions within instruction-following samples, which utilizes the semantic structure of the data to effectively expand…
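The idea of inserting skipped positions can be illustrated with a minimal sketch. This is not the authors' implementation: the helper name `skipped_position_ids`, the per-segment skip values, and the choice to place a skip before each segment are all assumptions made for illustration. It only shows how a short sample can be given position indices that span a much longer range.

```python
from typing import List


def skipped_position_ids(segment_lengths: List[int], skips: List[int]) -> List[int]:
    """Assign position ids to consecutive token segments.

    Hypothetical helper: before segment i, jump ahead by skips[i] positions,
    so the skipped indices are never assigned to any token.
    """
    if len(segment_lengths) != len(skips):
        raise ValueError("need exactly one skip value per segment")
    position_ids: List[int] = []
    next_pos = 0
    for length, skip in zip(segment_lengths, skips):
        next_pos += skip  # leave a gap in the position indices
        position_ids.extend(range(next_pos, next_pos + length))
        next_pos += length
    return position_ids


# A 3-token instruction followed by a 2-token response,
# with an assumed gap of 100 positions between them.
ids = skipped_position_ids([3, 2], [0, 100])
print(ids)  # [0, 1, 2, 103, 104]
```

The sample still contains only five tokens, but the largest relative distance between two tokens is now 104 rather than 4, which is the sense in which long-range dependencies are synthesized from position indices rather than from longer inputs.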
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Speech and Audio Processing
Methods: Attention Is All You Need · Weight Decay · Cosine Annealing · Attention Dropout · Dropout · Linear Warmup With Cosine Annealing · Residual Connection · Softmax
