Attention in Constant Time: Vashista Sparse Attention for Long-Context Decoding with Exponential Guarantees
Vashista Nobaub

TL;DR
This paper introduces Vashista Sparse Attention, a theoretically grounded method that enables constant-time long-context decoding by focusing attention on a small, stable subset of tokens, leading to speedups with minimal quality loss.
Contribution
The paper provides a formal analysis of entropic attention concentration and proposes Vashista Sparse Attention, a practical, efficient mechanism for long-context decoding in which the attention mass on tokens outside the selected subset is exponentially small.
Findings
Stable constant-size support observed in long-context tasks
Significant speedups with minimal quality degradation
Theoretical support-gap criteria for safe sparse decoding
Abstract
Large language models spend most of their inference cost on attention over long contexts, yet empirical behavior suggests that only a small subset of tokens meaningfully contributes to each query. We formalize this phenomenon by modeling attention as a projection onto the convex hull of key vectors and analyzing its entropic (softmax-like) relaxation. Our main theoretical contribution is a face-stability theorem showing that, under a strict complementarity margin (a support gap \(\Delta\) certified by KKT multipliers), entropic attention concentrates on a constant-size active face: the total mass assigned to inactive tokens decays exponentially as \(\exp(-\Omega(\Delta/\varepsilon))\), while the error on the active face scales linearly in the temperature/regularization parameter \(\varepsilon\). This yields a practical criterion for when sparse long-context decoding is safe and provides a…
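The exponential decay described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's construction: it assumes plain softmax over scalar scores at temperature \(\varepsilon\), with k "active" tokens tied at the top score and all other tokens exactly \(\Delta\) below it. The convex-hull projection and the KKT-certified gap are not modeled, and the names `softmax`, `inactive_mass`, and `make_scores` are hypothetical. Under these assumptions the inactive mass is at most \(\frac{n-k}{k}\exp(-\Delta/\varepsilon)\), which the code checks numerically.

```python
import math

def softmax(scores, eps):
    """Softmax at temperature eps, shifted by the max for numerical stability."""
    m = max(scores)
    weights = [math.exp((s - m) / eps) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def inactive_mass(scores, active, eps):
    """Total softmax probability assigned to tokens outside the active set."""
    probs = softmax(scores, eps)
    return sum(p for i, p in enumerate(probs) if i not in active)

def make_scores(n, k, gap):
    """Toy scores (an assumption): k active tokens at 0, n - k inactive at -gap."""
    return [0.0] * k + [-gap] * (n - k)

n, k, gap = 4096, 4, 1.0            # context length, active-set size, support gap
scores = make_scores(n, k, gap)
active = set(range(k))

for eps in (0.5, 0.2, 0.1, 0.05):
    mass = inactive_mass(scores, active, eps)
    bound = (n - k) / k * math.exp(-gap / eps)   # elementary bound for this toy setup
    print(f"eps={eps:<5} inactive mass={mass:.3e}  bound={bound:.3e}")
```

In this toy setting the bound is vacuous at large \(\varepsilon\) (it exceeds 1) and only becomes informative once \(\Delta/\varepsilon\) outgrows \(\log((n-k)/k)\), which matches the intuition that sparse decoding is only safe when the gap is large relative to the temperature.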
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Generative Adversarial Networks and Image Synthesis · Big Data and Digital Economy
