WildCat: Near-Linear Attention in Theory and Practice

Tobias Schröder; Lester Mackey

arXiv:2602.10056·cs.LG·February 11, 2026

Open Access

TL;DR

WildCat is a spectrally accurate, near-linear-time approximation to attention. It avoids the quadratic cost of exact attention while keeping provable error guarantees, and is demonstrated on image and language tasks.

Contribution

WildCat compresses attention to a small weighted coreset selected by randomly pivoted Cholesky, achieving near-linear $O(n^{1 + o(1)})$ runtime with provable error bounds and a practical GPU implementation.

Findings

01 Achieves super-polynomial $O(n^{-\log(\log(n))})$ error decay given bounded inputs

02 Runs in near-linear $O(n^{1 + o(1)})$ time, avoiding the quadratic cost of exact attention

03 Effective across image and language tasks

Abstract

We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast but spectrally-accurate subsampling algorithm -- randomly pivoted Cholesky -- and weight the elements optimally to minimise reconstruction error. Remarkably, given bounded inputs, WildCat approximates exact attention with super-polynomial $O(n^{-\log(\log(n))})$ error decay while running in near-linear $O(n^{1 + o(1)})$ time. In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high…
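The coreset selection step named in the abstract, randomly pivoted Cholesky, can be sketched in a few lines of NumPy. This is a minimal illustration of the general algorithm, not the authors' implementation: the kernel `exp(<x_i, x_j>)`, the function name `rp_cholesky_pivots`, and the `seed` parameter are assumptions made for the example, and the optimal reweighting of the coreset described in the abstract is not shown.

```python
import numpy as np

def rp_cholesky_pivots(X, k, seed=0):
    """Randomly pivoted Cholesky on the kernel A[i, j] = exp(<x_i, x_j>).

    Hypothetical helper for illustration. Only the diagonal of A and the
    selected columns are evaluated, so the full n x n matrix is never formed.
    Returns (F, pivots) with F @ F.T a low-rank approximation of A.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    F = np.zeros((n, k))
    d = np.exp(np.sum(X * X, axis=1))        # residual diagonal, starts as diag(A)
    pivots = []
    for i in range(k):
        total = d.sum()
        if total <= 0.0:                     # residual exhausted: stop early
            F = F[:, :i]
            break
        # Sample a pivot with probability proportional to the residual diagonal.
        p = int(rng.choice(n, p=d / total))
        col = np.exp(X @ X[p])               # column p of A
        g = col - F[:, :i] @ F[p, :i]        # column p of the current residual
        F[:, i] = g / np.sqrt(g[p])
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
        d[p] = 0.0                           # a chosen pivot cannot be drawn again
        pivots.append(p)
    return F, pivots
```

With `n` rows of dimension `m` and `k` pivots, the loop costs on the order of `n * k * (k + m)` operations, which is linear in `n` for fixed `k`. The returned `pivots` index the rows that would form the coreset.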

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Parallel Computing and Optimization Techniques