GeoPE: A Unified Geometric Positional Embedding for Structured Tensors
Yupu Yao, Bowen Yang

TL;DR
GeoPE is a quaternion-based 3D rotational positional embedding that preserves the 2D spatial topology of image patches in vision transformers and improves performance on image classification, object detection, and segmentation.
Contribution
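The paper proposes a unified rotational embedding framework that extends rotary rotations to 3D Euclidean space using quaternions. To handle the non-commutativity of 3D rotations and keep the encoding symmetric, it builds a single rotation operator by computing the geometric mean in the Lie algebra, which couples the spatial axes instead of treating them independently.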
Findings
Outperforms existing 2D RoPE variants in experiments
Enhances shape bias in vision models
Improves accuracy in image classification, detection, and segmentation
Abstract
Standard Vision Transformers flatten 2D images into 1D sequences, disrupting the natural spatial topology. While Rotary Positional Embedding (RoPE) excels in 1D, it inherits this limitation, often treating spatially distant patches (e.g., at row edges) as sequence neighbors. Existing 2D approaches typically treat spatial axes independently, failing to decouple this false sequential proximity from true spatial distance. To restore the 2D spatial manifold, we introduce Geometric Positional Embedding (GeoPE), a framework that extends rotations to 3D Euclidean space using quaternions. To overcome non-commutativity and ensure symmetry, GeoPE constructs a unified rotational operator by computing the geometric mean in the Lie algebra. This creates a geometrically coupled encoding that effectively separates spatial dimensions. Extensive experiments on image classification, object detection, and…
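The abstract's central idea (non-commuting quaternion rotations merged into one symmetric operator by averaging in the Lie algebra) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the axis assignment (row angle about the x-axis, column angle about the y-axis), the plain arithmetic mean of the two logarithms, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])

def quat_exp(v):
    """Exponential map: pure-imaginary 3-vector v -> unit quaternion."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(norm)], np.sin(norm) * v / norm))

def axis_log(axis, angle):
    """Lie-algebra element (half angle times unit axis) of an axis-angle rotation."""
    return 0.5 * angle * np.asarray(axis, dtype=float)

def log_mean_rotor(l1, l2):
    """Assumed coupling rule: average two logarithms, then exponentiate."""
    return quat_exp(0.5 * (np.asarray(l1) + np.asarray(l2)))

# Hypothetical per-axis angles for one patch position (row, column).
theta_row, theta_col = 0.3, 0.7
l_row = axis_log([1.0, 0.0, 0.0], theta_row)  # assumed: rows rotate about x
l_col = axis_log([0.0, 1.0, 0.0], theta_col)  # assumed: columns rotate about y

q_row, q_col = quat_exp(l_row), quat_exp(l_col)

# Composing the per-axis rotations directly depends on the order.
row_then_col = quat_mul(q_col, q_row)
col_then_row = quat_mul(q_row, q_col)
order_matters = not np.allclose(row_then_col, col_then_row)

# Averaging in the Lie algebra gives one rotor that is independent of order.
coupled = log_mean_rotor(l_row, l_col)
coupled_swapped = log_mean_rotor(l_col, l_row)
```

Because addition in the Lie algebra commutes, `coupled` and `coupled_swapped` are identical, while the two direct products differ in the sign of their z component. How GeoPE then applies the resulting rotor to query and key features is not described in this summary, so that step is left out of the sketch.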
Taxonomy
Topics: 3D Shape Modeling and Analysis · Digital Image Processing Techniques · Medical Image Segmentation Techniques
