TL;DR
This paper introduces Selective RoPE, an input-dependent rotary position embedding mechanism that generalizes existing methods and improves language modeling and sequence task performance.
Contribution
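The paper proposes Selective RoPE, which makes rotation angles input-dependent and so enables arbitrary-angle rotations in both linear and softmax transformers. It also shows that softmax attention implicitly rotates query-key pairs, exposing a hidden positional structure, and reports gains on copying, state tracking, and retrieval tasks.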
Findings
Selective RoPE improves language modeling accuracy.
It enhances performance on copying, state tracking, and retrieval tasks.
Softmax attention implicitly performs rotations on query-key pairs.
Abstract
Position information is essential for language modeling. In softmax transformers, Rotary Position Embeddings (RoPE) encode positions through fixed-angle rotations, while in linear transformers, order is handled via input-dependent (selective) gating that decays past key-value associations. Selectivity has generally been shown to improve language-related tasks. Inspired by this, we introduce Selective RoPE, an input-dependent rotary embedding mechanism, that generalizes RoPE, and enables rotation in arbitrary angles for both linear and softmax transformers. We show that softmax attention already performs a hidden form of these rotations on query-key pairs, uncovering an implicit positional structure. We further show that in state-space models and gated linear transformers, the real part manages forgetting while the imaginary part…
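The contrast between fixed-angle and input-dependent rotations can be illustrated with a minimal sketch. It assumes each pair of feature dimensions is treated as one complex number, so a rotation is a multiplication by a unit complex number. The function names, the base of 10000 (a common RoPE default), and the cumulative-sum form of the input-dependent angles are assumptions made for illustration; the excerpt does not give the paper's exact parameterization, and in a real model the per-token increments would be computed from the input rather than passed in directly.

```python
import cmath


def rope_angles(num_positions, num_pairs, base=10000.0):
    """Fixed-angle RoPE: the angle for position t and pair j is t * base**(-j / num_pairs)."""
    freqs = [base ** (-j / num_pairs) for j in range(num_pairs)]
    return [[t * f for f in freqs] for t in range(num_positions)]


def selective_angles(increments):
    """Hypothetical input-dependent variant: angles are a running sum of
    per-token increments (one list of increments per token)."""
    running = [0.0] * len(increments[0])
    angles = []
    for step in increments:
        running = [r + s for r, s in zip(running, step)]
        angles.append(list(running))
    return angles


def rotate(vec, angles):
    """Rotate each complex pair in vec by its corresponding angle."""
    return [z * cmath.exp(1j * a) for z, a in zip(vec, angles)]


def score(q, k):
    """Real part of the Hermitian inner product, which equals the dot
    product of the underlying real-valued vectors."""
    return sum((qz * kz.conjugate()).real for qz, kz in zip(q, k))
```

In this sketch the attention score between a rotated query and a rotated key depends only on the difference of their angles. With fixed angles that difference is a function of the relative position alone; with constant increments the hypothetical selective variant reduces to the fixed case, which is the sense in which an input-dependent scheme can generalize RoPE.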