Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

Yoav Gelberg; Koshi Eguchi; Takuya Akiba; Edoardo Cetin

arXiv:2512.12167·cs.CL·December 16, 2025

Open Access

TL;DR

This paper introduces DroPE, a simple method that removes a language model's positional embeddings after pretraining, followed by a short recalibration phase, so that the model can extend its context without expensive long-context finetuning.

Contribution

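The paper demonstrates that positional embeddings can be dropped after pretraining, following a short recalibration phase, allowing models to generalize to longer sequences without finetuning beyond the pretraining sequence length. This challenges the prior assumption that such finetuning is required.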

Findings

01

DroPE enables zero-shot extension of context length.

02

Models maintain performance without long-context finetuning.

03

DroPE outperforms previous methods for extending context length.

Abstract

So far, expensive finetuning beyond the pretraining sequence length has been a requirement for effectively extending the context of language models (LMs). In this work, we break this key bottleneck by Dropping the Positional Embeddings of LMs after training (DroPE). Our simple method is motivated by three key theoretical and empirical observations. First, positional embeddings (PEs) serve a crucial role during pretraining, providing an important inductive bias that significantly facilitates convergence. Second, over-reliance on this explicit positional information is also precisely what prevents test-time generalization to sequences of unseen length, even when using popular PE-scaling methods. Third, positional embeddings are not an inherent requirement of effective language modeling and can be safely removed after pretraining, following a short recalibration phase. Empirically, DroPE…
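To make the idea of "dropping" positional embeddings concrete, here is a minimal sketch of a single causal attention head with an on/off switch for positional information. It is an illustration under assumptions, not the authors' implementation: the abstract does not say which kind of positional embedding is used, so the sketch assumes a rotary-style (RoPE) rotation, and the names `rotate`, `causal_attention`, and `use_pe` are hypothetical. The short recalibration phase mentioned in the abstract is not shown.

```python
import math

def rotate(vec, pos, base=10000.0):
    """Rotary-style positional rotation (assumed PE type, for illustration).

    Each consecutive pair of dimensions is rotated by an angle that grows
    with the token position `pos`. `vec` must have even length.
    """
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out

def causal_attention(q, k, v, use_pe=True):
    """Single-head causal attention over lists of vectors.

    With use_pe=False the rotation is skipped entirely, so the only
    positional signal left is the causal mask (each token attends to
    itself and to earlier tokens).
    """
    n, d = len(q), len(q[0])
    if use_pe:
        q = [rotate(x, t) for t, x in enumerate(q)]
        k = [rotate(x, t) for t, x in enumerate(k)]
    out = []
    for t in range(n):
        # Scaled dot-product scores against tokens 0..t (causal mask).
        scores = [sum(a * b for a, b in zip(q[t], k[j])) / math.sqrt(d)
                  for j in range(t + 1)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(w)
        out.append([sum(w[j] / z * v[j][c] for j in range(t + 1))
                    for c in range(len(v[0]))])
    return out

if __name__ == "__main__":
    q = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.7, 0.1, 0.9], [0.2, 0.4, 0.6, 0.8]]
    k = [[0.5, 0.1, 0.3, 0.7], [0.9, 0.2, 0.4, 0.1], [0.6, 0.6, 0.2, 0.3]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(causal_attention(q, k, v, use_pe=True))
    print(causal_attention(q, k, v, use_pe=False))
```

In this sketch, `use_pe=False` means no computation depends on the absolute token index, so nothing in the attention scores is tied to positions seen during pretraining. Whether a real pretrained model performs well in that mode is the empirical question the paper addresses.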

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis