Spiral RoPE: Rotate Your Rotary Positional Embeddings in the 2D Plane
Haoyu Liu, Sucheng Ren, Tingyu Zhu, Peng Wang, Cihang Xie, Alan Yuille, Zeyu Zheng, Feng Wang

TL;DR
Spiral RoPE extends rotary positional embeddings to encode multi-directional spatial relationships in vision transformers, overcoming the axis-aligned limitation of standard axial RoPE and improving performance across vision tasks.
Contribution
The paper introduces Spiral RoPE, a multi-directional positional encoding method that improves the modeling of oblique spatial relationships in vision transformers.
Findings
Improves performance in classification, segmentation, and generation tasks.
Produces more concentrated attention maps on relevant objects.
Better respects local object boundaries in visual data.
Abstract
Rotary Position Embedding (RoPE) is the de facto positional encoding in large language models due to its ability to encode relative positions and support length extrapolation. When adapted to vision transformers, the standard axial formulation decomposes two-dimensional spatial positions into horizontal and vertical components, implicitly restricting positional encoding to axis-aligned directions. We identify this directional constraint as a fundamental limitation of the standard axial 2D RoPE, which hinders the modeling of oblique spatial relationships that occur in natural images. To overcome this limitation, we propose Spiral RoPE, a simple yet effective extension that enables multi-directional positional encoding by partitioning embedding channels into multiple groups associated with uniformly distributed directions. Each group is rotated according to the projection of the…
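The channel-grouping idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract is cut off before it states what is projected, so the sketch assumes each group rotates by the projection of a patch's 2D grid position onto that group's direction. The function names (`spiral_rope_angles`, `apply_rope`), the direction range [0, π), the frequency schedule, and the default `num_directions=4` are likewise illustrative assumptions.

```python
import numpy as np

def spiral_rope_angles(height, width, head_dim, num_directions=4, base=10000.0):
    """Return rotation angles of shape (height * width, head_dim // 2).

    Hypothetical helper: channel pairs are split into `num_directions` groups,
    and each group uses the patch position projected onto its own direction.
    """
    assert head_dim % 2 == 0
    num_pairs = head_dim // 2
    assert num_pairs % num_directions == 0
    pairs_per_group = num_pairs // num_directions

    # ASSUMPTION: directions are spread uniformly over [0, pi); a direction and
    # its opposite would only flip the sign of the projection.
    thetas = np.arange(num_directions) * np.pi / num_directions
    dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=-1)        # (K, 2)

    # Patch coordinates in row-major order: index = y * width + x.
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    pos = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)  # (N, 2)

    # ASSUMPTION: the rotated quantity is the projection of the 2D position
    # onto each group's direction.
    proj = pos @ dirs.T                                               # (N, K)

    # ASSUMPTION: every group reuses the same RoPE-style frequency schedule.
    freqs = base ** (-np.arange(pairs_per_group) / pairs_per_group)   # (P,)
    angles = proj[:, :, None] * freqs[None, None, :]                  # (N, K, P)
    return angles.reshape(pos.shape[0], num_pairs)

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x (N, head_dim) by angles (N, head_dim // 2)."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Under these assumptions, `num_directions=2` gives directions 0 and π/2, which matches the axial case of separate horizontal and vertical groups. Larger values add oblique directions. Because every angle is linear in the patch position, the attention score between a rotated query and key depends only on their 2D offset.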
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices
