Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective

Meizhi Zhong; Chen Zhang; Yikun Lei; Xikai Liu; Yan Gao; Yao Hu; Kehai Chen; Min Zhang

arXiv:2406.13282 · cs.CL · December 13, 2024 · 1 citation

TL;DR

This paper provides an in-depth analysis of RoPE extensions in long-context LLMs from an attention perspective, revealing key factors that influence extrapolation performance and offering insights for improving long-text handling.

Contribution

It offers a comprehensive understanding of RoPE extensions' inner workings and demonstrates how attention patterns and pretraining lengths affect extrapolation in long-context LLMs.

Findings

01

Keeping attention patterns close to those at the pretrained length improves extrapolation.

02

Large attention uncertainty causes retrieval errors.

03

Longer continual-pretraining lengths reduce attention uncertainty and improve extrapolation.
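Findings 02 and 03 concern attention uncertainty. A common way to quantify it, and the definition assumed here (this page does not state the paper's exact metric), is the Shannon entropy of a query's softmax attention distribution over keys. The minimal sketch below uses hypothetical helper names and plain Python lists in place of real attention tensors.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one query's attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(scores: List[float]) -> float:
    """Shannon entropy (nats) of the attention distribution; higher means more uncertain."""
    probs = softmax(scores)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical logits: one key clearly dominates vs. no key stands out.
peaked = [10.0] + [0.0] * 7
flat = [0.0] * 8

print(attention_entropy(peaked))  # close to 0: attention is confident
print(attention_entropy(flat))    # approximately ln(8) ~ 2.079: maximal for 8 keys

# With no dominant key, the maximum entropy ln(n) grows with the number of keys n,
# one possible intuition for why attention can disperse at longer context lengths.
for n in (4096, 8192, 16384):
    print(n, attention_entropy([0.0] * n))
```

A retrieval query whose attention row looks more like `flat` than `peaked` spreads its weight across many keys, which is the kind of high-uncertainty behavior finding 02 associates with retrieval errors.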

Abstract

Enabling LLMs to handle long contexts is currently an active research area. Most LLMs are built on rotary position embedding (RoPE), a popular position encoding method, so a prominent approach is to extrapolate a RoPE trained on comparatively short texts to far longer ones. Considerable effort has gone into improving this extrapolation by extending the RoPE formulation; however, few of these works have examined its inner workings comprehensively. In this paper, we offer a straightforward yet in-depth understanding of RoPE extensions from an attention perspective, evaluated on two benchmarking tasks. A broad array of experiments reveals several valuable findings: 1) keeping attention patterns close to those at the pretrained length improves extrapolation; 2) large attention uncertainty leads to retrieval errors; 3) using longer continual pretraining…
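The abstract describes extending RoPE's formulation so that a model trained on short texts can extrapolate to longer ones. The sketch below is a hedged illustration, not the paper's code: it implements standard RoPE over consecutive dimension pairs (base 10000 assumed) plus two widely used extensions, linear position interpolation and "NTK-aware" base scaling. This page does not list which extensions the paper evaluates, so treat these as representative examples; all function names and the toy vectors are hypothetical.

```python
import math
from typing import List


def rope_angles(pos: float, dim: int, base: float = 10000.0) -> List[float]:
    """Angle pos * theta_i for each 2-D pair i, with theta_i = base ** (-2i / dim)."""
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(x: List[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotate each consecutive pair (x[2i], x[2i+1]) by its RoPE angle at position pos."""
    out: List[float] = []
    for i, ang in enumerate(rope_angles(pos, len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(ang), math.sin(ang)
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def interpolated_position(pos: float, train_len: int, target_len: int) -> float:
    """Linear position interpolation: squeeze target-length positions into the trained range."""
    return pos * train_len / target_len


def ntk_scaled_base(base: float, dim: int, scale: float) -> float:
    """NTK-aware scaling: enlarge the base so the lowest frequency slows by `scale`."""
    return base * scale ** (dim / (dim - 2))


q = [0.3, -1.2, 0.8, 0.5, -0.4, 1.1, 0.9, -0.7]
k = [1.0, 0.2, -0.6, 0.4, 0.7, -0.3, 0.5, 0.8]

# RoPE's key property: the q.k score depends only on the relative offset m - n.
print(dot(apply_rope(q, 5), apply_rope(k, 2)))
print(dot(apply_rope(q, 105), apply_rope(k, 102)))  # same value up to float error

# Position interpolation maps the last position of an 8192-token input below 4096.
print(interpolated_position(8191, train_len=4096, target_len=8192))  # 4095.5

# NTK-aware scaling leaves the highest frequency unchanged and slows the lowest
# frequency by exactly `scale`.
dim, base, scale = 8, 10000.0, 2.0
new_base = ntk_scaled_base(base, dim, scale)
print(rope_angles(100, dim, new_base)[0], rope_angles(100, dim, base)[0])    # equal
print(rope_angles(200, dim, new_base)[-1], rope_angles(100, dim, base)[-1])  # equal up to float error
```

Interpolation compresses every frequency uniformly, whereas NTK-aware scaling mainly stretches the low frequencies. Both try to keep rotation angles at extended lengths within the range seen in pretraining, which loosely relates to finding 01 above.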

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Softmax · Attention Is All You Need