CRoPE: Efficient Parametrization of Rotary Positional Embedding

Beicheng Lou; Zifei Xu; Vivian W. H. Wong

arXiv:2601.02728·cs.LG·April 2, 2026

TL;DR

This paper proposes a more efficient parametrization of the Q/K/V projections used with rotary positional embeddings in transformers, cutting attention-block parameters by nearly 50% with negligible impact on model performance.

Contribution

It introduces a parametrization based on complex linear transformations, which simplifies implementation and improves the parameter efficiency of models that use rotary embeddings.

Findings

01

Reduces attention block parameters by nearly 50%.

02

Maintains model performance despite parameter reduction.

03

Provides a clearer interpretation of the representation space.

Abstract

Rotary positional embedding has become the state-of-the-art approach to encoding position information in transformer-based models. While it is often succinctly expressed in complex linear algebra, we note that the actual implementation of the $Q/K/V$ projections is not equivalent to a complex linear transformation. We argue that a complex linear transformation is a more natural parametrization and saves nearly 50\% of the parameters within the attention block. We show empirically that removing this redundancy has negligible impact on model performance. Our modification achieves more efficient parameter usage, as well as a cleaner interpretation of the representation space.
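
To make the parameter-count argument concrete, here is a minimal NumPy sketch, not the authors' implementation. It relies on a standard linear-algebra fact: a complex matrix W = A + iB acting on a vector in C^(d/2) is equivalent to a real d x d matrix with the constrained block form [[A, -B], [B, A]], which has d^2/2 free real parameters instead of the d^2 of an unconstrained real projection. The helper names (`complex_to_real_block`, `rotary_phases`), the dimension `d = 8`, the `[Re; Im]` concatenated layout, and the frequency base of 10000 are illustrative assumptions. How the paper applies this to each of the Q/K/V projections, and why the overall saving is "nearly" rather than exactly 50%, is not specified on this page.

```python
import numpy as np


def complex_to_real_block(W):
    """Real 2n x 2n matrix that acts on [Re(z); Im(z)] as W acts on z."""
    A, B = W.real, W.imag
    return np.block([[A, -B], [B, A]])


def rotary_phases(position, n, base=10000.0):
    """Unit-modulus phases exp(i * position * theta_k) for k = 0..n-1.

    Uses the common choice theta_k = base ** (-k / n); this is an
    assumption here, not something stated on this page.
    """
    theta = base ** (-np.arange(n) / n)
    return np.exp(1j * position * theta)


rng = np.random.default_rng(0)
d = 8        # real dimension (illustrative)
n = d // 2   # complex dimension

# A complex-linear projection and a complex input vector.
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Complex form: project, then rotate each coordinate by its position phase.
q_complex = rotary_phases(3, n) * (W @ z)

# The same projection written in real arithmetic on [Re(z); Im(z)].
x = np.concatenate([z.real, z.imag])
y = complex_to_real_block(W) @ x
projection_matches = np.allclose(y[:n] + 1j * y[n:], W @ z)

# Free real parameters: unconstrained real map vs. complex-linear map.
real_params_unconstrained = d * d
real_params_complex = 2 * n * n

print(projection_matches, real_params_unconstrained, real_params_complex)
```

In this sketch the rotary step is an elementwise multiplication by unit-modulus phases, so it leaves the norm of each complex coordinate unchanged; the only place parameters live is the projection matrix, which is where the halving occurs.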
