Attention on the Sphere
Boris Bonev, Max Rietmann, Andrea Paris, Alberto Carpentieri, Thorsten Kurth

TL;DR
This paper introduces a spherical attention mechanism for Transformer models that preserves spherical symmetries and improves performance on geophysical and vision tasks involving data on the sphere.
Contribution
It presents a geometrically faithful spherical attention mechanism with neighborhood confinement, provides optimized implementations, and demonstrates better results than Cartesian approaches across multiple spherical data tasks.
Findings
Outperforms planar Transformers on spherical tasks
Provides an approximately rotationally equivariant attention mechanism
Enables scalable, locality-aware spherical modeling
Abstract
We introduce a generalized attention mechanism for spherical domains, enabling Transformer architectures to natively process data defined on the two-dimensional sphere - a critical need in fields such as atmospheric physics, cosmology, and robotics, where preserving spherical symmetries and topology is essential for physical accuracy. By integrating numerical quadrature weights into the attention mechanism, we obtain a geometrically faithful spherical attention that is approximately rotationally equivariant, providing strong inductive biases and leading to better performance than Cartesian approaches. To further enhance both scalability and model performance, we propose neighborhood attention on the sphere, which confines interactions to geodesic neighborhoods. This approach reduces computational complexity and introduces the additional inductive bias for locality, while retaining the…
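The two ideas in the abstract, quadrature-weighted attention and geodesic neighborhoods, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a cell-centred equiangular latitude-longitude grid with midpoint-rule weights proportional to sin(θ), and one natural discretization in which each softmax term is multiplied by the quadrature weight of its key point before normalization. The names `latlon_grid`, `spherical_attention`, and `cutoff` are hypothetical and chosen for illustration only.

```python
import math

def latlon_grid(nlat, nlon):
    """Unit vectors and quadrature weights on a cell-centred lat-lon grid (assumed grid)."""
    dtheta, dphi = math.pi / nlat, 2.0 * math.pi / nlon
    points, weights = [], []
    for i in range(nlat):
        theta = (i + 0.5) * dtheta  # colatitude; cell centres avoid the poles
        for j in range(nlon):
            phi = j * dphi
            points.append((math.sin(theta) * math.cos(phi),
                           math.sin(theta) * math.sin(phi),
                           math.cos(theta)))
            # Midpoint-rule area element sin(theta) dtheta dphi (an assumption).
            weights.append(math.sin(theta) * dtheta * dphi)
    return points, weights

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_attention(q, k, v, weights, points=None, cutoff=None):
    """Quadrature-weighted softmax attention with an optional geodesic cutoff (radians)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Keep only keys whose great-circle distance to query point i is <= cutoff.
        idx = [j for j in range(len(k))
               if cutoff is None
               or math.acos(max(-1.0, min(1.0, dot(points[i], points[j])))) <= cutoff]
        logits = [scale * dot(qi, k[j]) for j in idx]
        m = max(logits)  # subtract the row maximum for numerical stability
        # Each exponential is scaled by the quadrature weight of its key point.
        a = [weights[j] * math.exp(l - m) for j, l in zip(idx, logits)]
        z = sum(a)
        out.append([sum(aj * v[j][c] for aj, j in zip(a, idx)) / z
                    for c in range(len(v[0]))])
    return out
```

Calling `points, weights = latlon_grid(8, 16)` and then `spherical_attention(q, k, v, weights, points, cutoff=0.5)` restricts each query to keys within 0.5 radians of great-circle distance; omitting `cutoff` gives global attention. The dense double loop is only for readability; a practical version would precompute the neighborhoods rather than test every pair of points.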
Taxonomy
Topics: 3D Shape Modeling and Analysis · Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Softmax · Position-Wise Feed-Forward Layer
