TL;DR
RoME is a mathematically equivalent reformulation of Rotary Position Embedding (RoPE) that replaces vector-level split and merge operations with unified matrix transformations, reducing computational overhead in Transformer models across language, vision, and 3D domains.
Contribution
The paper introduces RoME, a matrix-based approach to RoPE that reduces overhead and improves hardware utilization, enabling faster Transformer implementations.
Findings
RoME achieves substantial acceleration over existing RoPE implementations.
The method simplifies implementation and improves hardware utilization.
Experiments demonstrate improved performance at both operator and full-model levels.
Abstract
Rotary Position Embedding (RoPE) has become a core component of modern Transformer architectures across language, vision, and 3D domains. However, existing implementations rely on vector-level split and merge operations that introduce non-negligible computational overhead, often overlooked in attention optimization. The problem is further amplified in multi-dimensional settings (e.g., 2D and 3D RoPE), where additional vector operations and uneven feature partitions degrade hardware utilization. To overcome these limitations, we propose RoME (Rotary Matrix position Embedding), a mathematically equivalent yet computationally efficient reformulation of RoPE that replaces vector operations with unified matrix transformations. RoME eliminates dimension-specific operations, simplifies implementation, and enables fused parallel execution across Cube and Vector units on modern NPUs. Experiments…
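The equivalence the abstract relies on can be illustrated with a minimal sketch. It assumes the common "rotate-half" RoPE layout and a frequency base of 10000; neither is stated in the abstract. The function names `rope_split_merge`, `rotation_matrix`, and `apply_matrix` are hypothetical. The dense matrix built here is not RoME's actual formulation or kernel; it only shows that the split-and-merge computation can be written as a single matrix product.

```python
import math


def rope_split_merge(x, pos, base=10000.0):
    """RoPE on one vector via split and merge (rotate-half layout, assumed)."""
    d = len(x)
    half = d // 2
    x1, x2 = x[:half], x[half:]  # split the feature vector into two halves
    out1, out2 = [], []
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        out1.append(x1[i] * c - x2[i] * s)
        out2.append(x2[i] * c + x1[i] * s)
    return out1 + out2  # merge the halves back together


def rotation_matrix(d, pos, base=10000.0):
    """Dense d x d matrix R with x @ R equal to rope_split_merge(x, pos)."""
    half = d // 2
    R = [[0.0] * d for _ in range(d)]
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        R[i][i] = c
        R[i][i + half] = s
        R[i + half][i] = -s
        R[i + half][i + half] = c
    return R


def apply_matrix(x, R):
    """Row vector times matrix: out[j] = sum_k x[k] * R[k][j]."""
    d = len(x)
    return [sum(x[k] * R[k][j] for k in range(d)) for j in range(d)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    a = rope_split_merge(x, pos=5)
    b = apply_matrix(x, rotation_matrix(len(x), pos=5))
    # The two paths should agree up to floating-point rounding.
    print(max(abs(p - q) for p, q in zip(a, b)))
```

In this sketch the matrix path needs no split, negate, or concatenate step, so the same code shape applies regardless of how the feature dimension is partitioned. How RoME structures its matrices and schedules them across Cube and Vector units is described in the paper itself and is not reproduced here.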
