SLAY: Geometry-Aware Spherical Linearized Attention with Yat-Kernel
Jose Miguel Luna, Taha Bouhsine, Krzysztof Choromanski

TL;DR
SLAY introduces a geometry-aware, linear-time attention mechanism based on spherical Yat-kernels. It achieves near-softmax performance while scaling linearly with sequence length, and it outperforms prior linear attention methods.
Contribution
The paper presents a novel spherical Yat-kernel-based attention mechanism that is geometry-aware, positive definite, and computationally efficient, and that closely approximates softmax attention.
Findings
SLAY achieves near-softmax performance.
SLAY outperforms prior linear attention methods.
SLAY maintains linear time and memory complexity.
Abstract
We propose a new class of linear-time attention mechanisms based on a relaxed and computationally efficient formulation of the recently introduced E-Product, often referred to as the Yat-kernel (Bouhsine, 2025). The resulting interactions are geometry-aware and inspired by inverse-square interactions in physics. Our method, Spherical Linearized Attention with Yat Kernels (SLAY), constrains queries and keys to the unit sphere so that attention depends only on angular alignment. Using Bernstein's theorem, we express the spherical Yat-kernel as a nonnegative mixture of polynomial-exponential product kernels and derive a strictly positive random-feature approximation enabling linear-time O(L) attention. We establish positive definiteness and boundedness on the sphere and show that the estimator yields well-defined, nonnegative attention scores. Empirically, SLAY achieves performance that is…
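The linear-time claim in the abstract rests on a general property of attention built from nonnegative feature maps: once the score between a query and a key factors as an inner product of features, the key-value summary can be computed once and reused for every query. The sketch below illustrates only that reordering. It is not SLAY's estimator: the feature map `positive_features` is a stand-in (Performer-style positive random features for an exponential kernel), and all function names, shapes, and the random projection `w` are assumptions made for illustration. Queries and keys are projected onto the unit sphere first, as the abstract describes.

```python
import numpy as np


def normalize_rows(x, eps=1e-12):
    """Project each row onto the unit sphere."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def positive_features(x, w):
    """Stand-in strictly positive random features (NOT the SLAY feature map).

    phi(x) = exp(w @ x - ||x||^2 / 2) / sqrt(m), which is > 0 elementwise.
    """
    m = w.shape[0]
    half_sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ w.T - half_sq_norm) / np.sqrt(m)


def linear_attention(q, k, v, w):
    """Non-causal attention in O(L) time and memory with respect to length L."""
    phi_q = positive_features(normalize_rows(q), w)  # (L, m)
    phi_k = positive_features(normalize_rows(k), w)  # (L, m)
    kv = phi_k.T @ v                                 # (m, d_v), built once
    z = phi_k.sum(axis=0)                            # (m,), normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]


def quadratic_attention(q, k, v, w):
    """Same scores, but materializing the full (L, L) matrix for comparison."""
    phi_q = positive_features(normalize_rows(q), w)
    phi_k = positive_features(normalize_rows(k), w)
    scores = phi_q @ phi_k.T                         # (L, L), all entries > 0
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
L, d, d_v, m = 16, 8, 4, 32                          # hypothetical sizes
q, k = rng.normal(size=(L, d)), rng.normal(size=(L, d))
v = rng.normal(size=(L, d_v))
w = rng.normal(size=(m, d))                          # random projection directions

out_linear = linear_attention(q, k, v, w)
out_quadratic = quadratic_attention(q, k, v, w)
```

Both functions compute the same output up to floating-point error; the linear version simply never forms the L × L score matrix. Because every feature is strictly positive, the normalizer `phi_q @ z` is positive and the implied attention weights are nonnegative, which is the property the abstract attributes to its own random-feature estimator.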
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis
