TL;DR
GRAPE introduces a unified group-action framework for positional encoding that subsumes and extends existing methods such as RoPE and ALiBi, with the aim of improving long-context modeling.
Contribution
It unifies multiplicative and additive positional encoding mechanisms under a group action framework, providing a principled design space for long-context models.
Findings
GRAPE recovers RoPE and ALiBi as special cases.
Learned commuting subspaces and compact non-commuting mixtures strictly extend the RoPE geometry to capture feature coupling.
Project page is available at https://github.com/model-architectures/GRAPE.
Abstract
We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE unifies two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n) = \exp(n\,\omega\,\mathbf{L})$ with a rank-2 skew-symmetric generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the planes correspond to canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to…
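The multiplicative construction described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation. It assumes a generator of the form L = b aᵀ − a bᵀ for an orthonormal pair (a, b), for which exp(θL) = I + sin(θ)L + (1 − cos(θ))L² holds in closed form. The function names, the sign convention, the interleaved pair layout, and the base of 10000 for the log-uniform spectrum are all assumptions made for illustration.

```python
import numpy as np


def rank2_generator(a, b):
    """Rank-2 skew-symmetric generator L = b a^T - a b^T.

    Assumes a and b are orthonormal, so L rotates a toward b and L^3 = -L.
    The sign convention is an illustrative choice, not taken from the paper.
    """
    return np.outer(b, a) - np.outer(a, b)


def grape_rotation(L, n, omega):
    """Closed form of exp(n * omega * L) for a generator with L^3 = -L."""
    theta = n * omega
    identity = np.eye(L.shape[0])
    return identity + np.sin(theta) * L + (1.0 - np.cos(theta)) * (L @ L)


def rope_as_grape(d, n, base=10000.0):
    """Compose d/2 commuting plane rotations on canonical coordinate pairs.

    Frequencies follow a log-uniform spectrum omega_i = base^(-2i/d);
    the base value and the (2i, 2i+1) pairing are assumptions.
    """
    basis = np.eye(d)
    G = np.eye(d)
    for i in range(d // 2):
        omega = base ** (-2.0 * i / d)
        L = rank2_generator(basis[2 * i], basis[2 * i + 1])
        G = G @ grape_rotation(L, n, omega)
    return G


def rope_reference(x, n, base=10000.0):
    """Conventional interleaved RoPE applied to one vector, for comparison."""
    d = x.shape[0]
    out = np.empty_like(x)
    for i in range(d // 2):
        theta = n * base ** (-2.0 * i / d)
        c, s = np.cos(theta), np.sin(theta)
        out[2 * i] = c * x[2 * i] - s * x[2 * i + 1]
        out[2 * i + 1] = s * x[2 * i] + c * x[2 * i + 1]
    return out


# Usage: a query at position m and a key at position n interact only
# through the offset n - m, because G(m)^T G(n) = G(n - m).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))  # orthonormal pair (a, b)
L = rank2_generator(Q[:, 0], Q[:, 1])
q, k = rng.normal(size=6), rng.normal(size=6)
score_abs = (grape_rotation(L, 3, 0.1) @ q) @ (grape_rotation(L, 7, 0.1) @ k)
score_rel = q @ (grape_rotation(L, 4, 0.1) @ k)
```

Under these assumptions, `score_abs` and `score_rel` agree up to floating-point error, and `rope_as_grape(d, n) @ x` matches `rope_reference(x, n)`, which is the sense in which the canonical-pair, log-uniform setting reduces to RoPE.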
