WildCat: Near-Linear Attention in Theory and Practice
Tobias Schröder, Lester Mackey

TL;DR
WildCat is a spectrally accurate, near-linear-time approximation of attention that cuts computational cost while maintaining high accuracy on image and language tasks.
Contribution
WildCat presents a novel spectral subsampling approach for attention, achieving near-linear complexity with provable error bounds and practical GPU implementation.
Findings
Achieves super-polynomial error decay with bounded inputs
Runs in near-linear time, significantly faster than quadratic methods
Effective across image and language tasks
Abstract
We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast but spectrally-accurate subsampling algorithm -- randomly pivoted Cholesky -- and weight the elements optimally to minimise reconstruction error. Remarkably, given bounded inputs, WildCat approximates exact attention with super-polynomial error decay while running in near-linear time. In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high…
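To make the coreset-selection step more concrete, here is a minimal NumPy sketch of generic randomly pivoted Cholesky applied to a toy kernel matrix. It is not the authors' implementation: the function name `rp_cholesky`, the exponential kernel over random bounded "keys", and the sizes (200 points, 30 pivots) are assumptions chosen for illustration, and the sketch forms the full kernel matrix for simplicity, which a near-linear-time method would avoid. It also omits the optimal reweighting step mentioned in the abstract.

```python
import numpy as np

def rp_cholesky(A, k, rng):
    """Rank-k randomly pivoted Cholesky of a PSD matrix A (illustrative sketch).

    Returns F (n x k) with A ~= F @ F.T and the list of selected pivot indices.
    """
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float).copy()  # diagonal of the current residual
    pivots = []
    for i in range(k):
        # Sample a pivot with probability proportional to the residual diagonal.
        p = int(rng.choice(n, p=d / d.sum()))
        # Residual column at the pivot: A[:, p] minus what F already explains.
        g = A[:, p] - F[:, :i] @ F[p, :i]
        F[:, i] = g / np.sqrt(g[p])
        # Update the residual diagonal; clip tiny negatives caused by round-off.
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
        pivots.append(p)
    return F, pivots

# Toy setup (assumed for illustration): bounded random keys, exponential kernel.
rng = np.random.default_rng(0)
keys = rng.uniform(-1.0, 1.0, size=(200, 4))
A = np.exp(keys @ keys.T / np.sqrt(4))         # PSD kernel matrix
F, pivots = rp_cholesky(A, k=30, rng=rng)
rel_err = np.trace(A - F @ F.T) / np.trace(A)  # relative trace error
print(len(set(pivots)), rel_err)
```

The selected `pivots` play the role of a coreset: each step only reads one column of the kernel matrix, so a practical variant can evaluate those columns on demand instead of materialising `A`.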
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Parallel Computing and Optimization Techniques
