CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs
Haoran Li, Sucheng Ren, Alan Yuille, Feng Wang

TL;DR
CoPE introduces a soft clipping method for Rotary Positional Embedding that enhances long context handling in Large Language Models, outperforming previous approaches and enabling scalable context lengths up to 256k tokens.
Contribution
This work presents CoPE, a simple yet effective soft clipping technique for RoPE that unifies OOD mitigation and semantic modeling, improving long context performance in LLMs.
Findings
- Significant performance improvements with CoPE on long context tasks
- Scalability of RoPE up to 256k tokens demonstrated
- Theoretical analysis supports effectiveness of soft clipping
Abstract
Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). While various methods have been proposed to adapt RoPE to longer contexts, their guiding principles generally fall into two categories: (1) out-of-distribution (OOD) mitigation, which scales RoPE frequencies to accommodate unseen positions, and (2) semantic modeling, which posits that the attention scores computed with RoPE should always prioritize semantically similar tokens. In this work, we unify these seemingly distinct objectives through a minimalist intervention, namely CoPE: soft clipping of the low-frequency components of RoPE. CoPE not only eliminates OOD outliers and refines semantic signals, but also prevents the spectral leakage caused by hard clipping. Extensive experiments demonstrate that simply applying our soft clipping strategy to RoPE yields significant performance gains that…
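To make the contrast between hard and soft clipping concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the abstract does not give the clipping function, so the raised-cosine taper, the `width` parameter, the choice of cutoff (one full period over an assumed training length), and every function name below are illustrative assumptions. Only the standard RoPE frequency formula, theta_i = base^(-2i/d), is taken as given.

```python
import math

def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE angular frequencies: theta_i = base^(-2i/d), i = 0..d/2-1."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def hard_clip_weights(freqs, cutoff):
    """Hypothetical hard clip: weight 1 at or above `cutoff`, 0 below it."""
    return [1.0 if f >= cutoff else 0.0 for f in freqs]

def soft_clip_weights(freqs, cutoff, width=4.0):
    """Hypothetical soft clip: raised-cosine taper on a log-frequency axis.

    Weight is 1 at or above `cutoff`, 0 at or below `cutoff / width`,
    and decays smoothly in between. The taper shape is an assumption.
    """
    low = math.log(cutoff / width)
    span = math.log(width)
    weights = []
    for f in freqs:
        # t runs from 0 (at cutoff / width) to 1 (at cutoff), clamped outside
        t = min(1.0, max(0.0, (math.log(f) - low) / span))
        weights.append(0.5 - 0.5 * math.cos(math.pi * t))
    return weights

# Example values only (assumptions, not settings taken from the paper)
head_dim = 128
train_len = 8192
freqs = rope_frequencies(head_dim, base=500000.0)

# Components slower than this never complete one period within train_len
cutoff = 2.0 * math.pi / train_len

hard = hard_clip_weights(freqs, cutoff)
soft = soft_clip_weights(freqs, cutoff)

# Assumed way of applying the weights: attenuate each frequency toward zero
soft_clipped_freqs = [w * f for w, f in zip(soft, freqs)]

for i in range(30, 45):
    print(i, f"{freqs[i]:.3e}", hard[i], round(soft[i], 3))
```

In this sketch the hard mask jumps from 1 to 0 at the cutoff, while the soft mask passes through intermediate weights over a band of neighbouring components. That abrupt edge is the kind of discontinuity the abstract associates with spectral leakage; whether the paper applies its clipping to the frequencies, as assumed here, or in some other way cannot be determined from the truncated abstract.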
Code & Models
- 🤗 haoranli-ml/Llama-3-8B-CoPE-64k-Base (model)
- 🤗 haoranli-ml/Llama-3-8B-CoPE-64k-Instruct (model, 4 downloads)
- 🤗 haoranli-ml/Llama-3-8B-RoPE-64k-Base (model)
- 🤗 haoranli-ml/Llama-3-8B-RoPE-64k-Instruct (model, 5 downloads)
- 🤗 haoranli-ml/Llama-3-8B-HardClip-64k-Base (model)
- 🤗 haoranli-ml/Llama-3-8B-HardClip-64k-Instruct (model, 1 download)
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Advanced Graph Neural Networks
