Attention in Constant Time: Vashista Sparse Attention for Long-Context Decoding with Exponential Guarantees

Vashista Nobaub

arXiv:2602.13804 · cs.AI · February 17, 2026

TL;DR

This paper introduces Vashista Sparse Attention, a theoretically grounded method that enables constant-time long-context decoding by focusing attention on a small, stable subset of tokens, leading to speedups with minimal quality loss.

Contribution

The paper provides a formal analysis of entropic attention concentration and proposes Vashista Sparse Attention, a practical, efficient mechanism for long-context decoding with exponential guarantees.

Findings

1. Stable constant-size support observed in long-context tasks
2. Significant speedups with minimal quality degradation
3. Theoretical support-gap criteria for safe sparse decoding

Abstract

Large language models spend most of their inference cost on attention over long contexts, yet empirical behavior suggests that only a small subset of tokens meaningfully contributes to each query. We formalize this phenomenon by modeling attention as a projection onto the convex hull of key vectors and analyzing its entropic (softmax-like) relaxation. Our main theoretical contribution is a face-stability theorem showing that, under a strict complementarity margin (a support gap \(\Delta\) certified by KKT multipliers), entropic attention concentrates on a constant-size active face: the total mass assigned to inactive tokens decays exponentially as \(\exp(-\Omega(\Delta/\varepsilon))\), while the error on the active face scales linearly in the temperature/regularization parameter \(\varepsilon\). This yields a practical criterion for when sparse long-context decoding is safe and provides a…
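
The exponential decay of inactive mass described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's construction: it assumes a simplified setting in which a handful of "active" tokens share the top score and every other token sits exactly one gap below them, then applies an ordinary softmax at temperature `eps`. The function names (`inactive_mass`, `toy_bound`) and all parameter values are hypothetical, chosen only for illustration.

```python
import math


def softmax(scores, eps):
    """Numerically stable softmax of scores / eps."""
    m = max(scores)
    weights = [math.exp((s - m) / eps) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def inactive_mass(n_tokens, n_active, gap, eps):
    """Softmax mass on tokens whose score is `gap` below the active ones.

    Toy assumption: active tokens all score 0, inactive tokens all score -gap.
    """
    scores = [0.0] * n_active + [-gap] * (n_tokens - n_active)
    probs = softmax(scores, eps)
    return sum(probs[n_active:])


def toy_bound(n_tokens, n_active, gap, eps):
    """Upper bound for this toy model only: ((n - k) / k) * exp(-gap / eps)."""
    return (n_tokens - n_active) / n_active * math.exp(-gap / eps)


if __name__ == "__main__":
    n, k, gap = 4096, 8, 1.0  # hypothetical context length, support size, gap
    for eps in (0.5, 0.2, 0.1, 0.05):
        mass = inactive_mass(n, k, gap, eps)
        bound = toy_bound(n, k, gap, eps)
        print(f"eps={eps:<5} inactive mass={mass:.3e}  toy bound={bound:.3e}")
```

In this toy model the mass outside the active set shrinks rapidly as `gap / eps` grows, which is the qualitative behavior the abstract attributes to the support gap. The prefactor here still depends on the number of inactive tokens; how the paper's theorem treats that dependence is not stated in the excerpt above.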

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Generative Adversarial Networks and Image Synthesis · Big Data and Digital Economy