SLAY: Geometry-Aware Spherical Linearized Attention with Yat-Kernel

Jose Miguel Luna; Taha Bouhsine; Krzysztof Choromanski

arXiv:2602.04915 · cs.LG · February 10, 2026

TL;DR

SLAY is a geometry-aware, linear-time attention mechanism built on spherical Yat-kernels. It performs close to softmax attention, scales better with sequence length, and outperforms prior linear attention methods.

Contribution

The paper presents a novel attention mechanism based on the spherical Yat-kernel that is geometry-aware, positive definite, and computationally efficient, and that closely approximates softmax attention.

Findings

01  SLAY achieves near-softmax performance.

02  SLAY outperforms prior linear attention methods.

03  SLAY maintains linear time and memory complexity.

Abstract

We propose a new class of linear-time attention mechanisms based on a relaxed and computationally efficient formulation of the recently introduced E-Product, often referred to as the Yat-kernel (Bouhsine, 2025). The resulting interactions are geometry-aware and inspired by inverse-square interactions in physics. Our method, Spherical Linearized Attention with Yat Kernels (SLAY), constrains queries and keys to the unit sphere so that attention depends only on angular alignment. Using Bernstein's theorem, we express the spherical Yat-kernel as a nonnegative mixture of polynomial-exponential product kernels and derive a strictly positive random-feature approximation enabling linear-time O(L) attention. We establish positive definiteness and boundedness on the sphere and show that the estimator yields well-defined, nonnegative attention scores. Empirically, SLAY achieves performance that is…
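The linear-time claim in the abstract rests on a general reordering trick: once attention scores are written as inner products of strictly positive feature vectors, the sums over keys can be accumulated once and reused for every query. The sketch below illustrates only that trick. The abstract does not give SLAY's feature map for the spherical Yat-kernel, so the code substitutes the well-known positive random features for the exponential kernel exp(q·k); the function names, sizes, and the non-causal setting are all illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Project a vector onto the unit sphere, as the abstract requires for queries and keys."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def positive_features(x, omegas):
    """Stand-in feature map (NOT the SLAY map): positive random features whose
    inner product is an unbiased estimate of exp(q . k). Every entry is > 0."""
    m = len(omegas)
    half_sq_norm = dot(x, x) / 2.0
    return [math.exp(dot(w, x) - half_sq_norm) / math.sqrt(m) for w in omegas]


def quadratic_attention(Q, K, V, omegas):
    """Reference: form every query-key score explicitly, O(L^2) in sequence length."""
    fq = [positive_features(q, omegas) for q in Q]
    fk = [positive_features(k, omegas) for k in K]
    out = []
    for a in fq:
        scores = [dot(a, b) for b in fk]  # all positive
        total = sum(scores)
        out.append([sum(s * v[c] for s, v in zip(scores, V)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V, omegas):
    """Same result in O(L): accumulate key statistics once, then reuse them per query."""
    m, d_v = len(omegas), len(V[0])
    S = [[0.0] * d_v for _ in range(m)]  # sum over keys of phi(k) v^T
    z = [0.0] * m                        # sum over keys of phi(k)
    for k, v in zip(K, V):
        f = positive_features(k, omegas)
        for r in range(m):
            z[r] += f[r]
            for c in range(d_v):
                S[r][c] += f[r] * v[c]
    out = []
    for q in Q:
        f = positive_features(q, omegas)
        denom = dot(f, z)  # strictly positive, so the division is well defined
        out.append([sum(f[r] * S[r][c] for r in range(m)) / denom
                    for c in range(d_v)])
    return out


# Hypothetical toy sizes: sequence length, head dimension, value dimension, feature count.
random.seed(0)
L, d, d_v, m = 6, 4, 3, 8
Q = [normalize([random.gauss(0, 1) for _ in range(d)]) for _ in range(L)]
K = [normalize([random.gauss(0, 1) for _ in range(d)]) for _ in range(L)]
V = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(L)]
omegas = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]

out_quad = quadratic_attention(Q, K, V, omegas)
out_lin = linear_attention(Q, K, V, omegas)
```

Under these assumptions the two functions compute the same quantity in different orders, so `out_quad` and `out_lin` should agree up to floating-point rounding. Because every feature is strictly positive, the normalizing denominator cannot vanish, which is the property the abstract refers to as well-defined, nonnegative attention scores.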

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis