CRoPE: Efficient Parametrization of Rotary Positional Embedding
Beicheng Lou, Zifei Xu, Vivian W. H. Wong

TL;DR
This paper proposes a more efficient parametrization of rotary positional embeddings in transformers, cutting the parameters within the attention block by nearly 50% with negligible impact on model performance.
Contribution
It parametrizes the attention projections as complex linear transformations, which simplifies implementation and makes parameter usage more efficient in models with rotary embeddings.
Findings
Reduces attention block parameters by nearly 50%.
Shows empirically that removing this redundancy has negligible impact on model performance.
Provides a clearer interpretation of the representation space.
Abstract
Rotary positional embedding has become the state-of-the-art approach to encode position information in transformer-based models. While it is often succinctly expressed in complex linear algebra, we note that the actual implementation of Q/K/V-projections is not equivalent to a complex linear transformation. We argue that complex linear transformation is a more natural parametrization and saves nearly 50% of the parameters within the attention block. We show empirically that removing such redundancy has negligible impact on the model performance. Our modification achieves more efficient parameter usage, as well as a cleaner interpretation of the representation space.
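
The parameter saving described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation; the helper names (`complex_to_real_matrix`, `rope_rotate`, `score`), the toy dimension `n = 4`, and the frequency base of 10000 are assumptions made for illustration. The sketch checks two standard facts: a complex n x n linear map acts on the real and imaginary parts like a structured real 2n x 2n matrix with half as many free parameters as an unconstrained one, and rotary embedding written as a complex rotation gives attention scores that depend only on the relative position.

```python
import numpy as np

def complex_to_real_matrix(W):
    """Real 2n x 2n matrix that acts on [Re(z); Im(z)] as the complex matrix W acts on z."""
    A, B = W.real, W.imag
    return np.block([[A, -B], [B, A]])

def rope_rotate(z, pos, freqs):
    """Rotary embedding in complex form: rotate coordinate j by the angle pos * freqs[j]."""
    return z * np.exp(1j * pos * freqs)

rng = np.random.default_rng(0)
n = 4  # complex dimension (assumed toy size); the real head dimension is 2 * n

# A random complex projection and a random complex input vector.
W = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
z = rng.normal(size=n) + 1j * rng.normal(size=n)

# 1) The complex map and its structured real counterpart give the same result.
M = complex_to_real_matrix(W)
x = np.concatenate([z.real, z.imag])
y = W @ z
assert np.allclose(M @ x, np.concatenate([y.real, y.imag]))

# 2) Free real parameters: 2 * n^2 for the complex map,
#    (2n)^2 for an unconstrained real map on the same space.
complex_params = 2 * n * n
real_params = (2 * n) ** 2
print(complex_params / real_params)  # 0.5

# 3) With rotary embedding as a complex rotation, the score between a query at
#    position m and a key at position p depends only on the offset m - p.
freqs = 10000.0 ** (-np.arange(n) / n)  # assumed frequency schedule
q = rng.normal(size=n) + 1j * rng.normal(size=n)
k = rng.normal(size=n) + 1j * rng.normal(size=n)

def score(m, p):
    return np.real(np.sum(rope_rotate(q, m, freqs) * np.conj(rope_rotate(k, p, freqs))))

assert np.isclose(score(5, 2), score(13, 10))  # both pairs have offset 3
```

The ratio of exactly one half applies only to a single projection under the assumptions of this toy setup; the abstract's figure of nearly 50% refers to the attention block as a whole.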
