Efficient Matrix Implementation for Rotary Position Embedding

Chen Minqi; Zhongqi Yue; Shihao Zhang; Yun Xu; Peng Wu; Kaixiang Xu; Zeyi Huang; Hanwang Zhang

arXiv:2604.09742·cs.LG·April 14, 2026

TL;DR

RoME (Rotary Matrix position Embedding) is a mathematically equivalent reformulation of Rotary Position Embedding (RoPE) that replaces vector-level split and merge operations with unified matrix transformations, reducing the computational overhead of RoPE in Transformer models across language, vision, and 3D domains.

Contribution

The paper introduces RoME, a matrix-based formulation of RoPE that removes dimension-specific vector operations, reduces overhead, and improves hardware utilization by enabling fused parallel execution across Cube and Vector units on modern NPUs.

Findings

1. RoME achieves substantial acceleration over existing RoPE implementations.
2. The method simplifies implementation and improves hardware utilization.
3. Experiments demonstrate improved performance at both operator and full-model levels.

Abstract

Rotary Position Embedding (RoPE) has become a core component of modern Transformer architectures across language, vision, and 3D domains. However, existing implementations rely on vector-level split and merge operations that introduce non-negligible computational overhead, often overlooked in attention optimization. The problem is further amplified in multi-dimensional settings (e.g., 2D and 3D RoPE), where additional vector operations and uneven feature partitions degrade hardware utilization. To overcome these limitations, we propose RoME (Rotary Matrix position Embedding), a mathematically equivalent yet computationally efficient reformulation of RoPE that replaces vector operations with unified matrix transformations. RoME eliminates dimension-specific operations, simplifies implementation, and enables fused parallel execution across Cube and Vector units on modern NPUs. Experiments…
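The equivalence the abstract describes can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the common "rotate-half" RoPE layout and the standard frequency base of 10000, and the function names (`rope_split_merge`, `rotate_half_matrix`, `rope_matrix`) are hypothetical. The idea shown is that the split, negate, and concatenate step can be written as multiplication by a fixed signed permutation matrix, leaving one matrix product plus elementwise work.

```python
import math

def rope_angles(position, dim, base=10000.0):
    """Rotation angles for one token position (standard RoPE frequencies, assumed base)."""
    half = dim // 2
    return [position * base ** (-2.0 * i / dim) for i in range(half)]

def rope_split_merge(x, position):
    """Baseline: split the vector into halves, rotate each pair, then concatenate."""
    dim = len(x)
    half = dim // 2
    theta = rope_angles(position, dim)
    x1, x2 = x[:half], x[half:]  # vector-level split
    out1 = [x1[i] * math.cos(theta[i]) - x2[i] * math.sin(theta[i]) for i in range(half)]
    out2 = [x2[i] * math.cos(theta[i]) + x1[i] * math.sin(theta[i]) for i in range(half)]
    return out1 + out2           # vector-level merge

def rotate_half_matrix(dim):
    """Fixed signed permutation P such that x @ P equals concat(-x2, x1)."""
    half = dim // 2
    P = [[0.0] * dim for _ in range(dim)]
    for i in range(half):
        P[i + half][i] = -1.0    # output column i receives -x[i + half]
        P[i][i + half] = 1.0     # output column i + half receives x[i]
    return P

def rope_matrix(x, position):
    """Matrix form: x * cos + (x @ P) * sin, with no explicit split or merge."""
    dim = len(x)
    theta = rope_angles(position, dim)
    cos = [math.cos(t) for t in theta] * 2   # angles repeated for both halves
    sin = [math.sin(t) for t in theta] * 2
    P = rotate_half_matrix(dim)
    xp = [sum(x[k] * P[k][j] for k in range(dim)) for j in range(dim)]  # x @ P
    return [x[j] * cos[j] + xp[j] * sin[j] for j in range(dim)]

x = [0.5, -1.0, 2.0, 3.0, 0.25, -0.75, 1.5, -2.5]
baseline = rope_split_merge(x, position=7)
matrix_form = rope_matrix(x, position=7)
max_diff = max(abs(u - v) for u, v in zip(baseline, matrix_form))
```

Here `max_diff` should be at floating-point rounding level, since both functions compute the same rotation. In this sketch the product `x @ P` is the part that would map to a matrix (Cube) unit and the multiply-adds to a Vector unit; how RoME actually constructs and fuses its matrices is described in the paper and the linked repository, not here.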

Code & Models

Repositories

https://gitcode.com/cann/ops-transformer/blob/master/experimental/posembedding/rope_matrix/README.md
