Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention

Giorgio Roffo; Hazem Abdelkawy; Nilli Lavie; Luke Palmer

arXiv:2603.00175·cs.CV·March 31, 2026

TL;DR

This paper introduces Infinite Self-Attention (InfSA), a spectral reformulation of self-attention, together with a linear-time variant, Linear-InfSA, that scales transformers to high-resolution vision tasks; the reported results include 84.7% top-1 accuracy on ImageNet-1K and a 13x speedup over an equal-depth ViT.

Contribution

The authors propose a novel spectral reformulation called Infinite Self-Attention and a linear-time variant, Linear-InfSA, improving scalability and interpretability of transformers for high-resolution vision tasks.

Findings

01

Linear-InfSA achieves 84.7% top-1 accuracy on ImageNet-1K.

02

Linear-InfSA supports inference on 9216 by 9216 tokens, outperforming previous models.

03

Linear-InfSA runs 13x faster than an equal-depth ViT and is more energy-efficient.

Abstract

The quadratic cost of softmax attention limits Transformer scalability in high-resolution vision. We introduce Infinite Self-Attention (InfSA), a spectral reformulation that treats each attention layer as a diffusion step on a content-adaptive token graph, accumulating multi-hop interactions through a discounted Neumann series over attention matrices. This links self-attention to classical graph centrality (Katz, PageRank, eigenvector centrality) for interpretable token weighting. We also show the Neumann kernel equals the fundamental matrix of an absorbing Markov chain, so a token's centrality is its expected number of random-walk visits before absorption. We then propose Linear-InfSA, a linear-time variant that approximates the principal eigenvector of the implicit attention operator without forming the full attention matrix. It keeps an auxiliary state of fixed size proportional to…
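The discounted Neumann series and its absorbing-Markov-chain reading described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single, fixed row-stochastic attention matrix `A` (the abstract speaks of attention matrices in the plural), a hypothetical discount factor `gamma`, and a uniform starting distribution for the random walk. Under those assumptions, the truncated sum of `gamma**k * A**k` converges to `inv(I - gamma * A)`, which is the fundamental matrix of a chain that is absorbed with probability `1 - gamma` at every step.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def neumann_kernel(A, gamma, hops):
    """Truncated discounted Neumann series: sum over k = 0..hops of (gamma * A)^k."""
    n = A.shape[0]
    total = np.eye(n)   # k = 0 term
    power = np.eye(n)   # running value of (gamma * A)^k
    for _ in range(hops):
        power = gamma * (power @ A)
        total += power
    return total


# Toy token set; sizes and gamma are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(0)
n_tokens, dim = 6, 4
Q = rng.standard_normal((n_tokens, dim))
K = rng.standard_normal((n_tokens, dim))

# Standard softmax attention; each row sums to 1, so A is a transition matrix.
A = softmax_rows(Q @ K.T / np.sqrt(dim))

gamma = 0.5
truncated = neumann_kernel(A, gamma, hops=60)

# Closed form: fundamental matrix of the absorbing chain with transient block gamma * A.
closed_form = np.linalg.inv(np.eye(n_tokens) - gamma * A)

# Entry (i, j) is the expected number of visits to token j when starting at token i.
# Averaging over a uniform start gives one Katz-style centrality score per token.
centrality = closed_form.mean(axis=0)

print("max |truncated - closed form|:", np.abs(truncated - closed_form).max())
print("token centrality:", np.round(centrality, 3))
```

Because every row of `A` sums to 1, every row of the kernel sums to `1 / (1 - gamma)`, the expected lifetime of the walk before absorption. That gives a quick sanity check on any implementation of the series.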
